Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-18 Thread Peter Blaha
As was mentioned before, such a big case needs mpi in order to run efficiently. As a "quick" small improvement, set the OMP_NUM_THREADS variable to 2 or 4. This should give a speedup of about 2, and in the dayfile you should see that not 905% of the cpu was used, but 180% or so. On 10/18

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-18 Thread Yundi Quan
First, thanks, Peter. I should have described my problem thoroughly.
:RKM : MATRIX SIZE 9190LOs:1944 RKM= 4.88 WEIGHT= 2.00 PGR
The reduced RKM is 4.88. The reduced matrix size is 9190, which is about 2/5 of the full matrix. So that explains a lot. I'm using P1 symmetry. Therefore, the complex

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Peter Blaha
You still did not tell us the matrix size for the truncated RKmax, but yes, the scaling is probably ok. (scaling goes with n^3; i.e. in case of matrix size 12000 and 24000 we expect almost a factor of 8 !!! in cpu time. It also explains the memory You also did not tell us if you have i
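The n^3 estimate in this message can be checked with a few lines of arithmetic. This is only a sketch: it assumes CPU time scales exactly with the cube of the matrix size, which is an idealization, and the function name `cubic_time_ratio` is hypothetical.

```python
def cubic_time_ratio(n_small: int, n_large: int) -> float:
    """Expected CPU-time ratio if cost scales as n**3 (an idealized assumption)."""
    return (n_large / n_small) ** 3


# Matrix sizes quoted in the message: 12000 and 24000.
ratio = cubic_time_ratio(12000, 24000)
print(f"expected slowdown: {ratio:.1f}x")  # expected slowdown: 8.0x
```

Doubling the matrix size gives the factor of 8 mentioned above; real timings will deviate because other parts of the calculation scale differently.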

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Yundi Quan
Thanks a lot. On cluster A, RKM was automatically reduced to 4.88 while on cluster B RKM was kept at 7. I didn't expect this, though I was aware that WIEN2k would automatically reduce RKM in some cases. But is it reasonable for an iteration to run for eight hours with the following parameters? Mini

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Peter Blaha
The Xeon X5550 processor is a 4 core processor and your cluster may have combined a few of them on one node (2-4 ?) Anyway, 14 cores are not really possible ?? Have you done more than just looking at the total time ? Is the machines file the same on both clusters ? Such a machines file does N

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Laurence Marks
Something is not right. I think I misread your dayfile and in fact mkl threading is not active. Try something like env | grep -e MKL . I suspect that your job is just running on a single core. On Thu, Oct 17, 2013 at 10:13 AM, Yundi Quan wrote: > Sorry that I didn't make it clear. The dayfile wa
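The same check can be done from inside a job script. The sketch below is roughly what `env | grep -e MKL` reports, with two differences that are assumptions of this example: it matches only variable names that start with a prefix, and it also looks at `OMP` variables. The helper name `threading_vars` is hypothetical.

```python
import os


def threading_vars(environ=None, prefixes=("MKL", "OMP")):
    """Return environment variables whose names start with one of the prefixes."""
    env = os.environ if environ is None else environ
    return {name: value for name, value in sorted(env.items())
            if name.startswith(prefixes)}


if __name__ == "__main__":
    found = threading_vars()
    if not found:
        print("no MKL/OMP variables are set in this environment")
    for name, value in found.items():
        print(f"{name}={value}")
```

An empty result only shows that no such variable is set in this environment; it does not by itself prove how many threads the library uses.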

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Yundi Quan
Sorry that I didn't make it clear. The dayfile was for cluster B. As I said before, I always request one core per node and 8 nodes per job (number of k points). I have 72 crystallographically non-equivalent atoms. On cluster B, I used the following R_LIB (LAPACK+BLAS) option to compile WIEN2k. -l

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Laurence Marks
I assume the dayfile was for cluster A, as wall is about 8x cpu, which is about right for mkl multithreading, which you are presumably using. You are not using mpi. You may want to compare the wall time to using, on cluster A,
1:node1:8
Depending upon many factors it may be faster or slower. This is

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Yundi Quan
Thanks for your reply.
a). both machines are set up in a way that once a node is assigned to a job, it cannot be assigned to another.
b). The .machines file looks like this
1:node1
1:node2
1:node3
1:node4
1:node5
1:node6
1:node7
1:node8
granularity:1
extrafine:1
lapw2_vector_split:1
I've been tryi
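A .machines file with this layout can be generated from a node list rather than typed by hand. This is a minimal sketch, not a WIEN2k tool: the function name `machines_file` is hypothetical, the node names are the ones quoted in the message, and the three trailing options are copied from it unchanged.

```python
def machines_file(nodes, weight=1):
    """Build .machines text with one k-point-parallel line per node."""
    lines = [f"{weight}:{node}" for node in nodes]
    # Trailing options exactly as quoted in the message.
    lines += ["granularity:1", "extrafine:1", "lapw2_vector_split:1"]
    return "\n".join(lines) + "\n"


# Eight nodes, one line each, as in the quoted file.
print(machines_file([f"node{i}" for i in range(1, 9)]), end="")
```

In a real job the node names would come from the queuing system's host list, which is not shown in the thread.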

Re: [Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Laurence Marks
There are so many possibilities, a few: a) If you only request 1 core/node most queuing systems (qsub/msub etc) will allocate the other cores to other jobs. You are then going to be very dependent upon what those other jobs are doing. Normal is to use all the cores on a given node. b) When you ru

[Wien] Intel(R) Xeon(R) CPU X5550 @ 2.67GHz vs Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

2013-10-17 Thread Yundi Quan
Hi, I have access to two clusters as a low-level user. One cluster (cluster A) consists of nodes with 8 cores and 8 GB of memory per node. The other cluster (cluster B) has 24 GB of memory per node and each node has 14 cores or more. The cores on cluster A are Xeon CPU E5620@2.40GHz, while the cores on cluster B a