Re: [OMPI users] memory per core/process

Duke Nguyen Tue, 2 Apr 2013 11:40:26 -0400

On 3/30/13 8:46 PM, Patrick Bégou wrote:

Ok, so your problem is identified as a stack size problem. I ran into these limitations using Intel Fortran compilers on large data problems.
First, it seems you can increase your stack size, since "ulimit -s unlimited" works (you didn't enforce a system hard limit). The best way is to put this setting in your .bashrc file so it works on every node. But setting it to unlimited may not be really safe. E.g., if you end up in a badly coded recursive function calling itself without a stop condition, you can request all the system memory and crash the node. So set a large but limited value; it's safer.

Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now go down easily (completely) and need a hard reset!!! (whereas we never had any node go down like that before, even with killed or badly coded jobs).


Looking for a safer number of ulimit -s other than "unlimited" now... :(
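
One way to pick such a number, as a minimal sketch: choose a finite soft limit and cap it at the hard limit the shell reports. The helper name bounded_stack_kb and the 1 GiB starting value are assumptions made for illustration, not values from this thread; the right figure depends on the code and on the node's memory.

```shell
# Hypothetical helper: return a finite stack size in kB that does not
# exceed the hard limit reported by the shell.
bounded_stack_kb() {
    want_kb=$1
    hard_kb=$(ulimit -H -s)          # a number in kB, or "unlimited"
    if [ "$hard_kb" != "unlimited" ] && [ "$want_kb" -gt "$hard_kb" ]; then
        want_kb=$hard_kb
    fi
    echo "$want_kb"
}

# 1048576 kB (1 GiB) is an assumed starting point, not a recommendation.
STACK_KB=$(bounded_stack_kb 1048576)
ulimit -S -s "$STACK_KB"             # change only the soft limit
ulimit -S -s                         # print the soft limit now in effect
```

Only the soft limit is touched, so the value can still be adjusted later in the same shell, up to the hard limit.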

I'm managing a cluster and I always set a maximum value for the stack size. I also limit the memory available to each core for system stability. If a user requests only one of the 12 cores of a node, he can only access 1/12 of the node's memory. If he needs more memory, he has to request 2 cores, even if he runs a sequential code. This avoids crashing other users' jobs on the same node that have their own memory requirements. But this is not configured on your node.
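
The per-core policy described above comes down to simple arithmetic. A rough sketch follows; the function name job_mem_limit_kb is hypothetical, and the 24 GiB figure for a 12-core node is an assumed example. How the limit is actually enforced (batch scheduler, cpusets, and so on) is site-specific and not shown here.

```shell
# Hypothetical helper: memory (kB) a job may use when each core is
# entitled to an equal share of the node's memory.
# Arguments: total node memory in kB, cores per node, cores requested.
job_mem_limit_kb() {
    echo $(( $1 / $2 * $3 ))
}

# Assumed example: a 12-core node with 24 GiB (25165824 kB) of memory.
job_mem_limit_kb 25165824 12 1    # one core requested
job_mem_limit_kb 25165824 12 2    # two cores requested
```

With these assumed numbers, a one-core job is limited to 2 GiB and a two-core job to 4 GiB.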
Duke Nguyen wrote:
On 3/30/13 3:13 PM, Patrick Bégou wrote:
I do not know about your code but:
1) did you check stack limitations? Typically, Intel Fortran codes need a large amount of stack when the problem size increases.
Check ulimit -a
First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127368
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
So the stack size is 10MB??? Does this create a problem? How do I change it?
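
As a quick check of that reading, a small sketch: ulimit -s reports kilobytes, so 10240 is indeed 10 MB. The soft limit is the value in effect, and the hard limit is the ceiling up to which an unprivileged user can raise it. The helper name kb_to_mb is made up for this example.

```shell
# Hypothetical helper: convert a ulimit value in kB to whole MB.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

kb_to_mb 10240                             # the value from the listing above

echo "soft stack limit: $(ulimit -S -s)"   # limit currently in effect
echo "hard stack limit: $(ulimit -H -s)"   # ceiling for the soft limit
```

Within that ceiling, running ulimit -s with a value in kB in the shell that launches the job changes the soft limit for that shell and its child processes.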
2) does your node use cpuset and memory limitation, like fake NUMA, to set the maximum amount of memory available for a job?
I do not really understand (it is also the first time I have heard of fake NUMA), but I am pretty sure we do not have such things. The server I tried was a dedicated server with 2 x5420 and 16GB physical memory.
Patrick

Duke Nguyen wrote:
Hi folks,
I am sorry if this question has been asked before, but after ten days of searching/working on the system, I surrender :(. We are trying to use mpirun to run abinit (abinit.org), which in turn reads an input file to run a simulation. The command to run is pretty simple
$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
We ran this command on a server with two quad-core x5420 and 16GB of memory. I called only 4 cores, and I guess in theory each of the cores should be able to take up to 2GB.
In the output of the log, there is something about memory:
P This job should need less than 717.175 Mbytes of memory.
  Rough estimation (10% accuracy) of disk space for files :
WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
So basically it reported that the above job should not take more than 718MB per core.
But I still have the Segmentation Fault error:
mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault).
The system limits are already set to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.
If we adjust the parameters of the input.files so that the reported memory per core is less than 512MB, then the job runs fine.
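
Note that ulimit -l only changes the max locked memory limit, not the stack size reported by ulimit -s. One possible way to try a different stack limit for every rank, as an alternative to editing .bashrc, is a small wrapper that sets the limit and then execs the real program. This is only a sketch: the wrapper name with_stack.sh, the STACK_KB variable, and the 1 GiB default are assumptions, not something from this thread.

```shell
# Write a hypothetical wrapper script; STACK_KB (in kB) is an assumed knob.
WRAPPER="${TMPDIR:-/tmp}/with_stack.sh"
cat > "$WRAPPER" <<'EOF'
#!/bin/sh
# Set a finite soft stack limit, then replace this shell with the program.
ulimit -S -s "${STACK_KB:-1048576}" || exit 1
exec "$@"
EOF
chmod +x "$WRAPPER"

# Quick check without MPI: the child process reports its soft stack limit.
STACK_KB=4096 sh "$WRAPPER" sh -c 'ulimit -S -s'

# Intended use (commented out so the block runs without MPI installed):
# mpirun -np 4 "$WRAPPER" /opt/apps/abinit/bin/abinit < input.files >& output.log
```

Because the wrapper ends with exec, the program replaces the wrapper's shell and inherits its limits; on a multi-node run the wrapper would have to live on a path visible to every node.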
Please help,

Thanks,

D.


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users