Hi,

In light of some of the questions about a suitable computer to run SIESTA on, I have put together some benchmarks.

Serial SIESTA:

Different computers handle each routine within SIESTA with different efficiency. It is therefore very difficult to say how fast a computer will be without first deciding on the type of atomic system one intends to run. To illustrate this, I have run a series of GaAs cells of different sizes on Itanium2 (1.4GHz, 3MB L3, Altix 350) and Xeon (2.8GHz, 512KB) processors. Here "ratio" is t(xeon)/t(Altix), and %diagon is the percentage of the time spent in the diagon (diagonalisation) routine:

n atoms    t(Altix)     t(xeon)    ratio   %diagon
           seconds      seconds
    4         45.52        45.28    0.99      2.06
    8         52.24        58.68    1.12      6.11
   16         79.88       139.4     1.75     18.58
   24        138.02       314.85    2.28     31.48
   36        342.99       911.43    2.66     58.88
   54       1112.14      3378.53    3.04     79.66
   72       2613.03      7943.96    3.04     88.32
   96       4188.65     13227.00    3.16     89.79

For further serial benchmarks, I have chosen a GaMnAs cell with 32 atoms, an 8*8*8 k-grid, a 350 Ry cutoff, a DZ basis for the s and p orbitals of each atom and a TZ basis for the d orbitals of Mn. This cell was run for 40 SCF iterations on different platforms:

   Computer                                      job time (hours)
1) Pentium III, 1.0 GHz, Cache L2 256KB               94.5
2) Xeon 3.2GHz 512KB Cache                            18.0
3) SGI Altix 350 1.4GHz Itanium2 1.5MB L3              7.12
4) SGI Altix 350 1.4GHz Itanium2 3.0MB L3              4.12
5) IBM power4                                         16.8
6) Sun Fire AMD Opteron 2.2 GHz 1MB L2                21.13

I can't guarantee that I have fully optimised the executables, particularly for the AMD Opteron, for which I was only able to make a 32-bit binary. This benchmark flatters the Itanium2 because of its large cache; for smaller atomic systems there is less of a difference between the Itanium2 and Xeon processors.

Parallel SIESTA:

The speed of inter-node communication needed to run in parallel again depends largely on the atomic system. For the GaMnAs system to run efficiently using the "parallel over orbitals" method requires very fast communications, i.e. Myrinet or faster. Generally the CPU time falls off roughly as a power of the number of processors, so I give the gradient of the log-log plot as a measure of communication performance: d log(time) / d log(number of processors).
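
As an illustration of the gradient defined above, here is a minimal Python sketch that fits the slope of log(time) against log(number of processors) by least squares. The function name loglog_gradient and the timings are hypothetical, made up for illustration; they are not the measurements behind this post.

```python
import math


def loglog_gradient(n_procs, times):
    """Least-squares slope of log(time) against log(number of processors)."""
    xs = [math.log(n) for n in n_procs]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Standard simple linear regression slope: cov(x, y) / var(x)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Hypothetical timings (NOT measured data): a job that scales ideally,
# i.e. the run time halves every time the processor count doubles.
procs = [1, 2, 4, 8]
ideal_times = [1000.0 / n for n in procs]

# For ideal scaling the gradient comes out close to -1.
print(loglog_gradient(procs, ideal_times))
```

A gradient of -1 corresponds to ideal scaling, 0 to no speed-up at all, and a positive value means the job gets slower as processors are added.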

computer                         gradient of logs
Gigabit & Xeon 3.2GHz               +0.58 (note +ve)
Myrinet & Pentium III               -0.44
IBM power 3 & Shared memory         -0.66
NUMA & Itanium2                     -0.43

The GaMnAs system does not perform well when parallelised over orbitals; for the majority of other atomic systems better scaling with the number of processors is observed. The GaMnAs cell is (I think) a worst-case scenario for parallelisation, and would do much better parallelised over k-points.

I hope this helps.

Tom

==============================================================
Thomas Archer
Computational Spintronics Group
Physics Department
Trinity College
Dublin 2
Ireland
Phone:  +353 1 6083262
Mobile: +353 85 7157772
Fax:    +353 1 6711759
Email:  [EMAIL PROTECTED]
==============================================================