Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
I'm in an airport right now and can't easily check, but instead of using mmap memory (which treats shared memory as a file), you could tell Open MPI to use SYSV shared memory. IIRC that isn't treated like a file. Look for a selection mechanism via an MCA param in the sm or vader btls- run stuff

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Saurabh T
> For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It shows up as a "file", but it's really shared memory. You can disable sm and/or vader, but your on-server message passing performance will be significantly lower. Is

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
Wouldn't be a bad idea to fail a little better, ya. Perhaps a good show-help message. Sent from my phone. No type good. On Nov 20, 2015, at 5:52 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote: Jeff, should we check ulimit in vader/sm btl and disable them with a warning i

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Gilles Gouaillardet
Jeff, should we check ulimit in the vader/sm btl and disable them with a warning if the value is too low? Cheers, Gilles On Friday, November 20, 2015, Jeff Squyres (jsquyres) wrote: > For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It sh

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-20 Thread Jeff Squyres (jsquyres)
For what it's worth, that's Open MPI creating a chunk of shared memory for use with on-server communication. It shows up as a "file", but it's really shared memory. You can disable sm and/or vader, but your on-server message passing performance will be significantly lower. Is there a reason yo

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-19 Thread Saurabh T
I apologize, I have the wrong lines from strace for the initial file there (of course). The file with fd = 11 which causes the problem is called shared_mem_pool.[host] and ftruncate(11, 134217736) is called on it. (This is exactly 1024 times the ulimit of 131072 which makes sense as the ulimit is
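
The arithmetic behind the numbers quoted from strace can be checked with a short sketch. It assumes the file-size ulimit is counted in 1024-byte blocks (the bash default for `ulimit -f`; other shells may use 512-byte blocks), and the helper names are hypothetical, chosen only for illustration. Taking the two quoted values at face value, the ftruncate size comes out 8 bytes above 1024 times the limit, which would be enough to trip the file-size limit.

```python
# Assumption: `ulimit -f` is expressed in 1024-byte blocks (bash default).
BLOCK_BYTES = 1024


def fsize_limit_bytes(ulimit_blocks, block_bytes=BLOCK_BYTES):
    """Convert a file-size ulimit given in blocks to bytes."""
    return ulimit_blocks * block_bytes


def exceeds_limit(requested_bytes, ulimit_blocks):
    """True if a file of requested_bytes would exceed the ulimit."""
    return requested_bytes > fsize_limit_bytes(ulimit_blocks)


# Values quoted in the message: ulimit 131072, ftruncate(11, 134217736).
limit = fsize_limit_bytes(131072)
requested = 134217736

print("limit in bytes:    ", limit)
print("requested in bytes:", requested)
print("bytes over limit:  ", requested - limit)
print("exceeds limit:     ", exceeds_limit(requested, 131072))
```

Under the same block-size assumption, any limit of 131072 or lower would be below the requested size, which matches the "<= 131072" in the subject line.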

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-19 Thread Saurabh T
> Could you please provide a little more info regarding the environment you are running under (which resource mgr or not, etc), how many nodes you had in the allocation, etc? There is no reason why something should behave that way. So it would help if we could understand the setup.

Re: [OMPI users] Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072

2015-11-19 Thread Saurabh T
An "strace" showed something related to shared memory use was causing the signal. Sticking btl = ^sm into the openmpi-mca-params.conf file fixed this issue. saurabh From: saur...@hotmail.com To: us...@open-mpi.org Subject: Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072 List-Post: us