Dear Support,

We are having an issue with our Open MPI runs. Jobs on 550 or fewer machines (550 x 16 cores) run without any problem. As soon as we run them on 600 or more machines, we get the following error:

"plm:tm: failed to spawn daemon, error code = 17000"
We are using:

OpenMPI ver: 1.6.4 (Compiled with GCC v4.4.6)
Torque ver: 2.5.12

The output of ompi_info is attached. The environment settings are pasted below. Please assist.

env envsubst
[ocfacc@cyan01 fullrun]$ env
MODULE_VERSION_STACK=3.2.10
OMPI_MCA_mtl=^psm
MANPATH=/local/software/openmpi/1.6.4/gcc/share/man:/local/software/moab/6.1.10/man:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man:/local/Modules/default/share/man
HOSTNAME=cyan01
SHELL=/bin/bash
TERM=xterm
HISTSIZE=1000
QTDIR=/usr/lib64/qt-3.3
OLDPWD=/home/ocfacc/hpl/fullrun/results
QTINC=/usr/lib64/qt-3.3/include
LC_ALL=POSIX
USER=ocfacc
LD_LIBRARY_PATH=/local/software/openmpi/1.6.4/gcc/lib:/local/software/torque/default/lib
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
MPIROOT=/local/software/openmpi/1.6.4/gcc
MODULE_VERSION=3.2.10
MAIL=/var/spool/mail/ocfacc
PATH=/local/software/openmpi/1.6.4/gcc/bin:/usr/lib64/qt-3.3/bin:/local/software/moab/6.1.10/sbin:/local/software/moab/6.1.10/bin:/local/software/torque/default/sbin:/local/software/torque/default/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/lpp/mmfs/bin:/home/ocfacc/bin:/local/bin:.
PWD=/home/ocfacc/hpl/fullrun
_LMFILES_=/local/Modules/3.2.10/modulefiles/schedulers/torque/2.5.12:/local/Modules/3.2.10/modulefiles/schedulers/moab/6.1.10:/local/Modules/3.2.10/modulefiles/misc/null:/local/Modules/3.2.10/modulefiles/mpi/openmpi/1.6.4/gcc
LANG=en_US.UTF-8
KDE_IS_PRELINKED=1
MOABHOMEDIR=/local/moab/6.1.10
MODULEPATH=/local/Modules/versions:/local/Modules/modulefiles:/local/Modules/3.2.10/modulefiles/misc:/local/Modules/3.2.10/modulefiles/mpi:/local/Modules/3.2.10/modulefiles/libs:/local/Modules/3.2.10/modulefiles/compilers:/local/Modules/3.2.10/modulefiles/apps:/local/Modules/3.2.10/modulefiles/schedulers
LOADEDMODULES=torque/2.5.12:moab/6.1.10:null:openmpi/1.6.4/gcc
KDEDIRS=/usr
PBS_SERVER=blue101,blue102
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/ocfacc
LOGNAME=ocfacc
QTLIB=/usr/lib64/qt-3.3/lib
CVS_RSH=ssh
LC_CTYPE=POSIX
MODULESHOME=/local/Modules/3.2.10
LESSOPEN=|/usr/bin/lesspipe.sh %s
G_BROKEN_FILENAMES=1
module=() { eval `/local/Modules/$MODULE_VERSION/bin/modulecmd bash $*` }
_=/bin/env

--
Qamar Nazir

Best Regards,

*Qamar Nazir*
HPC Software Engineer
OCF plc

*Tel:* 0114 257 2200    Twitter <http://twitter.com/ocfplc>
*Fax:* 0114 257 0022    Blog <http://blog.ocf.co.uk/>
*Mob:* 07508 033895     Web <http://www.ocf.co.uk/>

OCF plc is a company registered in England and Wales. Registered number 4132533. Registered office address: OCF plc, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG
ompi_info.txt.bz2
Description: application/bzip