On Nov 3, 2011, at 1:36 PM, Blosch, Edwin L wrote:

> Yes it sucks, so that's what led me to post my original question: If /dev/shm
> isn't the right place to put the session file, and /tmp is NFS-mounted, then
> what IS the "right" way to set up a diskless cluster? I don't think the idea
> of tempfs sounds very appealing, after reading the discussion in FAQ #8 about
> shared-memory usage. We definitely have a job-queueing system and jobs are
> very often killed using qdel, and writing a post-script handler is way beyond
> the level of involvement or expertise we can expect from our sys admins.

In the upcoming OMPI v1.7, we revamped the shared memory setup code so that it actually uses /dev/shm properly, or uses some mechanism other than an mmap file backed by a real filesystem. So the issue goes away. But it doesn't help you yet. :-\

> Surely there's some reasonable guidance that can be offered to work around an
> issue that is so disabling.

Other than the shared memory file, the session directory shouldn't be large. So keeping it in a tmpfs should be ok. It's just that putting the shared memory file in a tmpfs has the potential to cost you "twice": once for the actual shared memory itself, and again for the space it takes up in tmpfs (although I have not verified this myself -- perhaps Linux is smart enough to not do this?).

Is there *no* local disk on the machines at all?

> A related question would be: How is it that HP-MPI works just fine on this
> cluster as it is configured now? Are they doing something different for
> shared memory communications?

They're probably either not warning you about the issue or not using mmapped files that are backed by a filesystem (warning you about the issue is actually a relatively new feature in OMPI, IIRC -- since 1.0, IIRC, OMPI has used mmap files in a filesystem).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
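
For anyone trying to work out where a session directory would actually land on a node, here is a minimal sketch (not part of Open MPI, and not from the original thread) that looks up the filesystem type behind a path from a Linux /proc/mounts-style table. The function names, the sets of filesystem types, and the example paths are all assumptions made for illustration; the lists of network and memory filesystems are certainly incomplete.

```python
import posixpath

# Assumed, incomplete lists of filesystem types; extend for your site.
NETWORK_FS = {"nfs", "nfs4", "lustre", "gpfs", "panfs", "cifs"}
MEMORY_FS = {"tmpfs", "ramfs"}


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, fs type, options, ...
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in mounts:
        point = point.rstrip("/") or "/"
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win.
            if len(point) >= len(best_point):
                best_point, best_type = point, fs_type
    return best_type


def classify(path, mounts):
    """Roughly classify the storage behind path."""
    fs_type = fs_type_for(path, mounts)
    if fs_type in NETWORK_FS:
        return "network"
    if fs_type in MEMORY_FS:
        return "memory"
    return "local-or-unknown"


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/mounts exists.
    with open("/proc/mounts") as handle:
        table = parse_mounts(handle.read())
    for candidate in ("/tmp", "/dev/shm"):
        print(candidate, fs_type_for(candidate, table), classify(candidate, table))
```

Run on a compute node, this prints the filesystem type behind /tmp and /dev/shm. A "network" result for the session directory location is the case the warning is about; "memory" means the mmap backing file would live in RAM-backed storage.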