By John M. Levesque
Contents: Introduction; Supercomputer architecture; Fortran; Vectorization of Fortran programs; Index. This book explains in detail both the underlying architecture of modern supercomputers and the way in which a compiler maps Fortran code onto that architecture. Most important, the constructs preventing full optimization are described, and specific techniques for restructuring a program are provided.
Read or Download A Guidebook to Fortran on Supercomputers PDF
Best software books
This volume constitutes the refereed proceedings of the 18th EuroSPI conference, held in Roskilde, Denmark, in June 2011. The 18 revised full papers presented together with 9 keynotes were carefully reviewed and selected. They are organized in topical sections on SPI and assessments; SPI and implementation; SPI and improvement methods; SPI organization; SPI people/teams; SPI and reuse; selected keynotes for SPI implementation.
These proceedings comprise tutorials and papers presented at the Sixth CSR Conference, concerned with large software systems. The aim of the conference was to identify solutions to the problems of developing and maintaining large software systems, based on approaches currently being undertaken by software practitioners.
Extra resources for A Guidebook to Fortran on Supercomputers
        ... EPSLON) THEN
          SDOT = SDOT + B(I) * C(I)
        ENDIF
 2070 CONTINUE

Ignoring numerical considerations as to the order in which computation is performed, we could imagine that each of four processors could be assigned to compute the dot product in the index ranges 1-250, 251-500, 501-750, and 751-1000. But notice that each processor would be asynchronously updating the variable SDOT. Conceptually, two processors could fetch the same value of SDOT, add their terms to it, and store it back. The first value stored would be overwritten by the second, and some terms in the sum would be lost.
A good rule of thumb on such machines is to run a loop in scalar mode if the number of indirect memory references exceeds the number of arithmetic operations. Now consider a loop that has only one indirect index used to subscript all of the loop arrays, a fairly common occurrence in sparse matrix calculations:

      DO 2190 I = 1, 64
        A(JJ(I)) = B(JJ(I)) + C(JJ(I))
 2190 CONTINUE

Once the vector JJ(1:64) has been fetched to a vector register, it can be reused again and again to indirectly fetch and store the arrays of the computation; it need not be refetched for each of the other arrays, as was the case with several different indirect indexes.
• Fetch the next 64 elements of Y to a vector register (V4).
• Generate register V5 by choosing elements from V3 where the corresponding bit of VM is 1 and by choosing elements from V4 where the bits are zero.
• Store V5 into the 64 elements of the array Y.

Or, to state it more simply: perform all the computation in vector mode for all elements, then store only those elements for which the condition is true. Although all computation of all elements is performed (in the example, a condition of the form "... .GE. 0" is true for all odd values of I), vector-mask computation is so fast that it will outperform scalar mode any time the condition is true a significant percentage of the time.