Can anyone guess what the problem is here? I was under the impression that OpenMPI (1.4.4) would look for /tmp and create its shared-memory backing file there, as long as orte_tmpdir_base is not set to anything.
Well, there IS a /tmp, and yet it appears that OpenMPI has chosen to use /dev/shm. Why? And, next question, why doesn't it work?

Here are the oddities of this cluster:
- the cluster is 'diskless'
- /tmp is an NFS mount
- /dev/shm is 12 GB and has 755 permissions

Filesystem  Size  Used  Avail  Use%  Mounted on
tmpfs        12G  164K    12G    1%  /dev/shm

% ls -l output:
drwxr-xr-x 2 root root 40 Oct 28 09:14 shm

The error message below suggests that OpenMPI (1.4.4) has somehow auto-magically decided to use /dev/shm and is failing to use it for some reason.

Thanks for whatever help you can offer,

Ed

[e8315:02942] opal_os_dirpath_create: Error: Unable to create the sub-directory (/dev/shm/openmpi-sessions-estenfte@e8315_0) of (/dev/shm/openmpi-sessions-estenfte@e8315_0/8474/0/1), mkdir failed [1]
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 106
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file util/session_dir.c at line 399
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file base/ess_base_std_orted.c at line 206
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.
This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file ess_env_module.c at line 136
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[e8315:02942] [[8474,0],1] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 325
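
A directory that is drwxr-xr-x and owned by root, as /dev/shm is listed above, only lets root create entries in it, which would be consistent with the mkdir failure in the log. A minimal Python sketch for checking this on a compute node follows. The helper names describe_dir and can_create_subdir are made up for illustration, and printing TMPDIR, TMP and TEMP rests on the unverified assumption that Open MPI 1.4.4 consults them when picking its session directory base.

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Return (mode string, owner uid, writable-by-current-user) for a directory."""
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)  # e.g. 'drwxr-xr-x'
    writable = os.access(path, os.W_OK | os.X_OK)
    return mode, st.st_uid, writable


def can_create_subdir(path):
    """Attempt the kind of operation that failed in the log: mkdir under path."""
    try:
        probe = tempfile.mkdtemp(prefix="probe-", dir=path)
    except OSError:
        return False
    os.rmdir(probe)  # clean up the probe directory
    return True


if __name__ == "__main__":
    # Candidate locations taken from the post.
    for candidate in ("/tmp", "/dev/shm"):
        if os.path.isdir(candidate):
            print(candidate, describe_dir(candidate), can_create_subdir(candidate))
    # Assumption: these variables may influence the temp directory choice.
    for var in ("TMPDIR", "TMP", "TEMP"):
        print(var, "=", os.environ.get(var))
```

Run it as the same user that launches the job, on the node named in the log (e8315), since permissions and mounts can differ between nodes.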