Hi Moe,

> On Sep 21, 2015, at 10:02 PM, Moe Jette <[email protected]> wrote:
>
>
> What version of Slurm?
We're currently running 14.11.7

> How many tasks/ranks in your job?

I've been trying 500 nodes with 12 tasks per node, for a total of 6000.
After this failed I started fiddling with fewer (100 nodes, then ramping
up: 200, 300, 400, ...). It seems anything over 300 nodes is touch and go.

> Can you run a non-MPI job of the same size (i.e. srun hostname)?

Not reliably.

$ cat hostname.sh
#!/bin/bash
#
#SBATCH --job-name=OSU_Int
#SBATCH --qos=admin
#SBATCH --time=00:15:00
#SBATCH --nodes=500
#SBATCH --ntasks-per-node=12
#SBATCH --account=crcbenchmark
#SBATCH --output=/lustre/janus_scratch/tibr1099/hostname_%A.txt

srun hostname

$ sbatch hostname.sh
Submitted batch job 976034

$ wc -l hostname_976034.txt
5992 hostname_976034.txt

$ grep -v ^node hostname_976034.txt
srun: error: Task launch for 976034.0 failed on node node0453: Socket timed out on send/recv operation
srun: error: Application launch failed: Socket timed out on send/recv operation
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete

Any thoughts?

Thanks
Timothy

>
> Quoting Ralph Castain <[email protected]>:
>> This sounds like something in Slurm - I don’t know how srun would know to
>> emit a message if the app was failing to open a socket between its own procs.
>>
>> Try starting the OMPI job with “mpirun” instead of srun and see if it has
>> the same issue. If not, then that’s pretty convincing that it’s Slurm.
>>
>>> On Sep 21, 2015, at 7:26 PM, Timothy Brown <[email protected]> wrote:
>>>
>>> Hi Chris,
>>>
>>>> On Sep 21, 2015, at 7:36 PM, Christopher Samuel <[email protected]> wrote:
>>>>
>>>> On 22/09/15 07:17, Timothy Brown wrote:
>>>>
>>>>> This is using mpiexec.hydra with slurm as the bootstrap.
>>>>
>>>> Have you tried Intel MPI's native PMI start up mode?
>>>>
>>>> You just need to set the environment variable I_MPI_PMI_LIBRARY to the
>>>> path to the Slurm libpmi.so file and then you should be able to use srun
>>>> to launch your job instead.
>>>
>>> Yeap, to the same effect. Here's what it gives:
>>>
>>> srun --mpi=pmi2 /lustre/janus_scratch/tibr1099/osu_impi/libexec/osu-micro-benchmarks//mpi/collective/osu_alltoall
>>> srun: error: Task launch for 973564.0 failed on node node0453: Socket timed out on send/recv operation
>>> srun: error: Application launch failed: Socket timed out on send/recv operation
>>>
>>>> More here:
>>>>
>>>> http://slurm.schedmd.com/mpi_guide.html#intel_srun
>>>>
>>>>> If I switch to OpenMPI the error is:
>>>>
>>>> Which version, and was it built with --with-slurm and (if your
>>>> version is not too ancient) --with-pmi=/path/to/slurm/install?
>>>
>>> Yeap, 1.8.5 (for 1.10 we're going to try and move everything to EasyBuild).
>>> Yes, we included PMI and the Slurm option. Our configure statement was:
>>>
>>> module purge
>>> module load slurm/slurm
>>> module load gcc/5.1.0
>>> ./configure \
>>>     --prefix=/curc/tools/x86_64/rh6/software/openmpi/1.8.5/gcc/5.1.0 \
>>>     --with-threads=posix \
>>>     --enable-mpi-thread-multiple \
>>>     --with-slurm \
>>>     --with-pmi=/curc/slurm/slurm/current/ \
>>>     --enable-static \
>>>     --enable-wrapper-rpath \
>>>     --enable-sensors \
>>>     --enable-mpi-ext=all \
>>>     --with-verbs
>>>
>>> It's got me scratching my head, as I started off thinking it was an MPI
>>> issue and spent a while getting Intel's hydra and OpenMPI's oob to go over
>>> IB instead of gig-E. This increased the success rate, but we were still
>>> failing.
>>>
>>> I tried out a pure PMI (version 1) code (init, rank, size, fini), which
>>> worked a lot of the time, which made me think it was MPI again! However,
>>> it fails often enough to say it's not MPI.
>>> The PMI v2 code I wrote gives the wrong results for rank and world size,
>>> so I'm sweeping that under the rug until I understand it!
>>>
>>> Just wondering if anybody has seen anything like this. I'm happy to share
>>> our conf file if that helps.
>>>
>>> The only other thing I could possibly point a finger at (but don't
>>> believe is the cause) is that the Slurm masters (slurmctld) are only
>>> on gig-E.
>>>
>>> I'm half thinking of opening a TT, but was hoping to get more information
>>> first (and possibly avoid increasing Slurm's logging, which is my only
>>> next idea).
>>>
>>> Thanks for your thoughts, Chris.
>>>
>>> Timothy
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
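P.S. For anyone following along, the native-PMI launch Chris suggested boils down to something like the sketch below. The libpmi.so path and binary name here are illustrative only; check where your site's Slurm install actually puts its PMI library.

```shell
# Sketch of launching Intel MPI via Slurm's native PMI (no mpiexec.hydra).
# NOTE: both paths below are illustrative; adjust to your site's
# Slurm install and benchmark location.
export I_MPI_PMI_LIBRARY=/curc/slurm/slurm/current/lib64/libpmi.so

# srun performs the process wire-up itself; --mpi=pmi2 selects
# Slurm's PMI-2 plugin for the job step.
srun --mpi=pmi2 ./osu_alltoall
```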
