I'm in an airport right now and can't easily check, but instead of using mmap
memory (which treats shared memory as a file), you could tell Open MPI to use
SYSV shared memory. IIRC that isn't treated like a file.
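Here is a rough sketch of the OS-level difference being recalled here. It is standalone C, not Open MPI code. The helper names and the sizes are made up for illustration.

A segment backed by a regular file has to be grown with ftruncate(), and ftruncate() is subject to RLIMIT_FSIZE (the `ulimit -f` limit). On Linux, as far as I know, a System V segment created with shmget() is sized by the kernel without that check.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/resource.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGXFSZ, so an oversized request fails with EFBIG instead of
 * killing the process, then lower the soft file-size limit (in bytes). */
static int set_fsize_soft_limit(rlim_t bytes)
{
    struct rlimit rl;

    signal(SIGXFSZ, SIG_IGN);
    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;            /* soft limit may not exceed hard */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_FSIZE, &rl);
}

/* File-backed segment (the mmap approach): create a scratch file and grow
 * it with ftruncate(). Returns 0 on success, otherwise the errno value. */
static int try_file_backed(size_t size)
{
    char path[] = "/tmp/shm_demo_XXXXXX";
    int fd = mkstemp(path);
    int rc;

    if (fd < 0)
        return errno;
    unlink(path);                       /* storage is freed on close() */
    rc = (ftruncate(fd, (off_t)size) == 0) ? 0 : errno;
    close(fd);
    return rc;
}

/* System V segment of the same size: create, attach, touch every byte,
 * detach and mark for removal. Returns 0 on success, otherwise errno. */
static int try_sysv(size_t size)
{
    int id, rc = 0;
    void *p;

    assert(size > 0);
    id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return errno;
    p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        rc = errno;
    } else {
        memset(p, 0, size);
        shmdt(p);
    }
    shmctl(id, IPC_RMID, NULL);
    return rc;
}
```

Suppose you call set_fsize_soft_limit(64 * 1024) and then request 1 MiB. try_file_backed() should return EFBIG. try_sysv() should succeed, assuming SysV IPC is enabled on the host.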
Look for a selection mechanism via an MCA param in the sm or Vader BTLs. Run
stuff
> For what it's worth, that's Open MPI creating a chunk of shared memory for
> use with on-server communication. It shows up as a "file", but it's really
> shared memory.
> You can disable sm and/or Vader, but your on-server message passing
> performance will be significantly lower.
> Is
Wouldn't be a bad idea to fail a little better, ya. Perhaps a good show-help
message.
Sent from my phone. No type good.
On Nov 20, 2015, at 5:52 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Jeff,
should we check ulimit in vader/sm btl and disable them with a warning if
the value is too low?
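Here is one possible shape for that check, as a hedged sketch only. Before creating its backing file, a component compares the size it needs against the soft RLIMIT_FSIZE. If the file would not fit, it prints a warning and declines to run instead of being killed by SIGXFSZ.

`backing_file_fits()` and the warning text are hypothetical, not existing Open MPI code. A real version would presumably go through Open MPI's show-help machinery instead of fprintf().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical pre-flight check: true if a backing file of `needed` bytes
 * fits under the soft RLIMIT_FSIZE; otherwise warn and return false so the
 * caller can disable itself instead of dying with SIGXFSZ. */
static bool backing_file_fits(const char *component, unsigned long long needed)
{
    struct rlimit rl;

    assert(component != NULL);
    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return true;                    /* cannot tell, so do not block */
    if (rl.rlim_cur == RLIM_INFINITY ||
        needed <= (unsigned long long)rl.rlim_cur)
        return true;
    fprintf(stderr,
            "WARNING: the %s BTL needs a %llu-byte shared-memory backing file,\n"
            "but the file size limit (ulimit -f) is %llu bytes; disabling %s.\n",
            component, needed, (unsigned long long)rl.rlim_cur, component);
    return false;
}
```

Take a soft limit of 131072 * 1024 bytes as an example. A request for 134217736 bytes would fail the check and print the warning, while a request for exactly 134217728 bytes would pass.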
Cheers,
Gilles
On Friday, November 20, 2015, Jeff Squyres (jsquyres) wrote:
For what it's worth, that's Open MPI creating a chunk of shared memory for use
with on-server communication. It shows up as a "file", but it's really shared
memory.
You can disable sm and/or Vader, but your on-server message passing performance
will be significantly lower.
Is there a reason yo
I apologize, I had the wrong lines from strace for the initial file there (of
course). The file with fd = 11 that causes the problem is called
shared_mem_pool.[host], and ftruncate(11, 134217736) is called on it. (This is
just over 1024 times the ulimit of 131072, i.e. 8 bytes more than
1024 * 131072 = 134217728, which makes sense as the ulimit is
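The numbers can be checked with a small standalone reproducer. This is a sketch that assumes Linux and assumes the limit was set with bash's `ulimit -f`, which counts 1024-byte blocks outside POSIX mode. The function and macro names are illustrative.

131072 blocks is 134217728 bytes, so ftruncate() to 134217736 bytes is 8 bytes over the limit. With the default disposition, the process is then killed by SIGXFSZ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ULIMIT_F_BLOCKS 131072ULL       /* `ulimit -f` value from the report */
#define BLOCK_BYTES     1024ULL         /* bash counts ulimit -f in 1 KiB blocks */
#define FSIZE_LIMIT     (ULIMIT_F_BLOCKS * BLOCK_BYTES)   /* 134217728 bytes */
#define REQUEST_BYTES   134217736ULL    /* size passed to ftruncate() in strace */

/* Fork a child that lowers its soft RLIMIT_FSIZE to FSIZE_LIMIT and then
 * ftruncate()s a scratch file to `size` bytes with SIGXFSZ at its default
 * action. Returns the signal that killed the child, 0 if the child exited
 * successfully, or -1 on any other outcome. */
static int ftruncate_in_child(off_t size)
{
    int status;
    pid_t pid;

    assert(size >= 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;
        struct rlimit nocore = { .rlim_cur = 0, .rlim_max = 0 };
        char path[] = "/tmp/fsize_demo_XXXXXX";
        int fd;

        signal(SIGXFSZ, SIG_DFL);       /* default action terminates */
        setrlimit(RLIMIT_CORE, &nocore); /* avoid leaving a core file */
        if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
            _exit(2);
        rl.rlim_cur = FSIZE_LIMIT;
        if (setrlimit(RLIMIT_FSIZE, &rl) != 0)
            _exit(2);
        fd = mkstemp(path);
        if (fd < 0)
            _exit(2);
        unlink(path);
        if (ftruncate(fd, size) != 0)   /* over the limit: SIGXFSZ is sent */
            _exit(1);
        _exit(0);                       /* stayed within the limit */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Truncating to REQUEST_BYTES should report SIGXFSZ, the signal from the subject line. Truncating to exactly FSIZE_LIMIT should succeed, because only sizes strictly greater than the soft limit are rejected.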
> Could you please provide a little more info regarding the environment you
> are running under (which resource mgr or not, etc), how many nodes you had
> in the allocation, etc?
> There is no reason why something should behave that way. So it would help
> if we could understand the setup.
An "strace" showed something related to shared memory use was causing the
signal. Sticking
btl = ^sm
into the openmpi-mca-params.conf file fixed this issue.
saurabh
From: saur...@hotmail.com
To: us...@open-mpi.org
Subject: Openmpi 1.10.1 fails with SIGXFSZ on file limit <= 131072