Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Juergen Salk
* Loris Bennett [190917 07:46]: >> But I still don't get the point. Why should I favour `srun ./my_mpi_program´ over `mpirun ./my_mpi_program´? For me, both seem to do exactly the same thing. No? Did I miss something? >> Best regards >> Jürgen > Running a single job

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Stijn De Weirdt
hi Jürgen, > For our next cluster we will switch from Moab/Torque to Slurm and have to adapt the documentation and example batch scripts for the users. heh, we did that a year ago, and we made (well, fixed the slurm one) a qsub wrapper to avoid having to document this and retrain our users. (

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Philip Kovacs
> For our next cluster we will switch from Moab/Torque to Slurm and have to adapt the documentation and example batch scripts for the users. Therefore, I wonder if and why we should recommend (or maybe even urge) our users to use srun instead of mpirun/mpiexec in their batch scripts for MPI j

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Marcus Wagner
Hi Jürgen, we set the variables $MPIEXEC and $FLAGS_MPI_BATCH in our modules and documented these. This way, changing the workload management system or the MPI (or whatsoever) does not change the documentation (at least on that point ;) ) Best Marcus On 9/17/19 9:02 AM, Juergen Salk wrote:
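For illustration, a minimal sketch of a batch script written against that convention, assuming the loaded module exports $MPIEXEC and $FLAGS_MPI_BATCH as described above (the module and program names are placeholders):

    #!/bin/bash
    #SBATCH --ntasks=48
    # The module is expected to define MPIEXEC and FLAGS_MPI_BATCH, so the
    # launch line stays the same when the site swaps its MPI or scheduler.
    module load mpi                      # placeholder module name
    $MPIEXEC $FLAGS_MPI_BATCH ./my_mpi_program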

[slurm-users] Slurm: Insane length message

2019-09-17 Thread BADREDDINE Alaa
I have the latest version of Slurm (`slurm 20.02.0-0pre1`) installed on 4 mini Debian test servers: `test1` = `slurm master`, `test2` = `slurm backup`, `test3` = `slurmdbd`, `test4` = `slurm slave`. I am trying to make it work and I set up all my configuration files. The logs don't seem to menti

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Paul Edmon
For my two cents, I would recommend using srun. While mpirun "works", I've seen strange behavior, especially if you are using task affinity and core binding. It is even weirder with hybrid codes that use threads and MPI. Using srun resolves these issues as it integrates more tightly with the scheduler
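As an illustration of that point, a minimal hybrid MPI+OpenMP batch script launched with srun; the program name and job sizes are placeholders, and --cpu-bind=cores is just one common binding choice:

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=8
    # srun takes task placement, core binding and rank wire-up directly
    # from the allocation, so no host file or -np argument is needed.
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun --cpu-bind=cores ./my_hybrid_program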

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Mark Hahn
> over `mpirun ./my_mpi_program´? For me, both seem to do exactly the same thing. No? Did I miss something? No, the issue is whether your mpirun is slurm-aware or not. You can get exactly the same behavior if you link with slurm hooks. The main thing is that slurm communicates the resources for
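To sketch that point: assuming an Open MPI build with Slurm/PMIx support, both launch lines below, used inside an sbatch allocation, should start the same set of ranks:

    # a slurm-aware mpirun queries the allocation itself (no -np, no hostfile)
    mpirun ./my_mpi_program
    # srun launches the ranks through Slurm directly; --mpi=pmix is only
    # needed if pmix is not already the cluster's MpiDefault
    srun --mpi=pmix ./my_mpi_program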

[slurm-users] Maxjobs not being enforced

2019-09-17 Thread Tina Fora
Hello Slurm users, We have 'AccountingStorageEnforce=limits,qos' set in our slurm.conf. I've added maxjobs=100 for a particular user who is causing havoc on our shared storage. This setting is still not being enforced and the user is able to launch 1000s of jobs. I also ran 'scontrol reconfig' and even r

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Juergen Salk
* Philip Kovacs [190917 07:43]: >> I suspect the question, which I also have, is more like: "What difference does it make whether I use 'srun' or 'mpirun' within a batch file started with 'sbatch'." > One big thing would be that using srun gives you resource tracking an
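A small illustration of the tracking point (job ID 12345 is made up): ranks started with srun form their own job step, so their usage can be inspected per step with sstat/sacct, whereas mpirun-launched ranks usually end up lumped into the batch step:

    # while the step is still running
    sstat -j 12345.0 --format=JobID,AveCPU,AveRSS,MaxRSS
    # after the job has finished
    sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State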

Re: [slurm-users] Maxjobs not being enforced

2019-09-17 Thread David Rhey
Hi, Tina, Could you send the command you ran? David On Tue, Sep 17, 2019 at 2:06 PM Tina Fora wrote: > Hello Slurm users, > We have 'AccountingStorageEnforce=limits,qos' set in our slurm.conf. I've added maxjobs=100 for a particular user causing havoc on our shared storage. This setting i

Re: [slurm-users] Maxjobs not being enforced

2019-09-17 Thread Tina Fora
# sacctmgr modify user lif6 set maxjobs=100
# sacctmgr list assoc user=lif6 format=user,maxjobs,maxsubmit,maxtresmins
      User MaxJobs MaxSubmit MaxTRESMins
---------- ------- --------- -----------
      lif6     100
> Hi, Tina,
> Could you send the command you ran?
> David
> On Tue, S
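For what it's worth, two checks that are sometimes useful when a limit is set but does not seem to bite (output is site-specific, so this is not a diagnosis of the case above):

    # confirm the running controller really has limits enforcement enabled
    scontrol show config | grep -i AccountingStorageEnforce
    # show the association(s) the limit was applied to, per cluster/account
    sacctmgr show assoc where user=lif6 format=cluster,account,user,maxjobs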