As was mentioned before, such a big case needs mpi in order to run
efficiently.
As a "quick" small improvement, set the OMP_NUM_THREADS variable to 2 or
4. This should give a speedup of about 2, and in the dayfile you should
see that not 905% of the cpu was used, but 180% or so.
On 10/18
First, thanks, Peter. I should have described my problem more thoroughly.
:RKM : MATRIX SIZE 9190LOs:1944 RKM= 4.88 WEIGHT= 2.00 PGR
The reduced RKM is 4.88. The reduced matrix size is 9190 which is about 2/5 of
the full matrix. So that explains a lot. I'm using P1 symmetry. Therefore, the
complex
You still did not tell us the matrix size for the truncated RKmax, but yes,
the scaling is probably ok (scaling goes with n^3; i.e. for matrix sizes
of 12000 and 24000 we expect almost a factor of 8 !!! in cpu time).
It also explains the memory
You also did not tell us if you have i
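The n^3 argument above can be checked with a few lines of Python. This is only a sketch: the function name cubic_cost_ratio is made up for illustration, and it assumes pure cubic scaling of the diagonalization cost, ignoring memory effects, threading and everything else that influences the real timing. The second call uses the "about 2/5 of the full matrix" figure quoted in the thread, so the full size used there is an estimate, not a measured value.

```python
def cubic_cost_ratio(n_small: float, n_large: float) -> float:
    """Cost ratio between two matrix sizes, assuming cost grows as n^3."""
    return (n_large / n_small) ** 3


# The example from the thread: doubling the matrix size.
print(cubic_cost_ratio(12000, 24000))  # 8.0

# Reduced matrix of 9190 taken as 2/5 of the full one (estimate only).
n_reduced = 9190
n_full_estimate = n_reduced * 5 / 2
print(f"{cubic_cost_ratio(n_reduced, n_full_estimate):.1f}")  # 15.6
```

Under these assumptions, the run with the reduced RKM would be expected to be roughly an order of magnitude cheaper per diagonalization than the one with the full matrix.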
Thanks a lot.
On cluster A, RKM was automatically reduced to 4.88 while on cluster B RKM
was kept at 7. I didn't expect this, though I was aware that WIEN2k would
automatically reduce RKM in some cases. But is it reasonable for an
iteration to run for eight hours with the following parameters?
Mini
The Xeon X5550 processor is a 4-core processor and your cluster may have
combined a few of them on one node (2-4 ?). Anyway, 14 cores are not
really possible ??
Have you done more than just looking at the total time?
Is the machines file the same on both clusters ? Such a machines file
does N
Something is not right. I think I misread your dayfile and in fact mkl
threading is not active. Try something like env | grep -e MKL . I
suspect that your job is just running on a single core.
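The same check can be done from inside a job script with a small Python sketch. It is roughly equivalent to the grep above, but it also looks at OMP variables and reports how many cores the process is actually allowed to use. The helper names threading_env and available_cores are made up for this illustration; os.sched_getaffinity exists only on Linux, so the code falls back to os.cpu_count() elsewhere.

```python
import os


def threading_env(environ=None):
    """Return the environment variables whose names contain MKL or OMP."""
    env = os.environ if environ is None else environ
    return {k: v for k, v in env.items() if "MKL" in k or "OMP" in k}


def available_cores():
    """Number of cores this process may run on (Linux affinity if available)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    for name, value in sorted(threading_env().items()):
        print(f"{name}={value}")
    print("cores available to this process:", available_cores())
```

If nothing is printed for MKL or OMP and only one core is reported, that would be consistent with the job running on a single core, although it does not prove it.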
On Thu, Oct 17, 2013 at 10:13 AM, Yundi Quan wrote:
> Sorry that I didn't make it clear. The dayfile wa
Sorry that I didn't make it clear. The dayfile was for cluster B. As I said
before, I always request one core per node and 8 nodes per job (number of k
points). I have 72 crystallographically non-equivalent atoms.
On cluster B, I used the following R_LIB (LAPACK+BLAS) option to compile
WIEN2k. -l
I assume the dayfile was for cluster A, as wall is about 8x cpu which
is about right for mkl multithreading which you are presumably using.
You are not using mpi. You may want to compare the wall time to using
on cluster A
1:node1:8
depending upon many factors it may be faster, or slower. This is
Thanks for your reply.
a). both machines are set up in a way that once a node is assigned to a
job, it cannot be assigned to another.
b). The .machines file looks like this
1:node1
1:node2
1:node3
1:node4
1:node5
1:node6
1:node7
1:node8
granularity:1
extrafine:1
lapw2_vector_split:1
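For comparing the two layouts discussed in this thread (one "1:nodeN" line per node versus the "1:node1:8" form), a file like the one above can be generated rather than typed by hand. This is a minimal sketch: the helper build_machines and its parameter names are made up for illustration, the node names are the placeholders used above, and the trailing options are simply copied from the file shown. What the ":8" suffix does in detail should be checked in the WIEN2k user guide.

```python
def build_machines(nodes, procs_per_node=1):
    """Build the text of a .machines file in the style shown in the thread.

    With procs_per_node == 1 each line is "1:<node>"; otherwise the
    "1:<node>:<count>" form is written.
    """
    lines = []
    for node in nodes:
        if procs_per_node == 1:
            lines.append(f"1:{node}")
        else:
            lines.append(f"1:{node}:{procs_per_node}")
    # Trailing options copied verbatim from the file quoted above.
    lines += ["granularity:1", "extrafine:1", "lapw2_vector_split:1"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the 8-node layout quoted above.
    print(build_machines([f"node{i}" for i in range(1, 9)]), end="")
```

Calling build_machines(["node1"], procs_per_node=8) gives the "1:node1:8" line suggested earlier, so the wall times of both variants can be compared from the same script.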
I've been tryi
There are so many possibilities, a few:
a) If you only request 1 core/node most queuing systems (qsub/msub
etc) will allocate the other cores to other jobs. You are then going
to be very dependent upon what those other jobs are doing. Normal is
to use all the cores on a given node.
b) When you ru
Hi,
I have access to two clusters as a low-level user. One cluster (cluster A)
consists of nodes with 8 cores and 8 GB of memory per node. The other cluster
(cluster B) has 24 GB of memory per node and each node has 14 cores or more. The
cores on cluster A are Xeon CPU E5620@2.40GHz, while the cores on cluster B
a