Unless I'm misunderstanding something, it sounds like you should be using jobs (sbatch) and job steps (srun) instead of job arrays (sbatch -a). The way I think of it, srun is like subletting a property that you're renting. Job arrays are for launching homogeneous work that differs only in an index number, though there are some other creative things you can do with them.
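For example, a bare-bones sketch of the jobs-plus-steps approach (the program name and inputs below are just placeholders, not anything from your setup):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=4

# Each srun launches a job step inside the allocation that sbatch
# already reserved; the steps run concurrently and share it.
srun --ntasks=2 ./my_program input1 &
srun --ntasks=2 ./my_program input2 &
wait

Each step gets whatever slice of the allocation you ask for, which is the "subletting" part.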

Ryan

On 01/27/2016 06:47 PM, Andrus, Brian Contractor wrote:

I ended up just doing ‘scancel’ on all the jobs and resubmitting them.

I seem to be making progress.

Now I am having trouble figuring out the --distribution option.

I want to have it such that each node runs 1 of each array job, but shares the remaining resources for other jobs.

Here is what is in my script:

#SBATCH --nodes=1
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=5
#SBATCH --threads-per-core=2
#SBATCH --distribution=cyclic:block,NoPack

So I am getting 10 threads on a box. This is to run an MPI program.

I submit it with sbatch:

sbatch --array=1-100%2 slurm_array.sh

I would expect one array task to be running on node1 and one on node2, but both start on node1.
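A quick way to watch where the array tasks actually land (the format string here is just an example):

squeue -u $USER -o "%.18i %.9P %.8T %R"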

*From:* John Desantis [mailto:desan...@mail.usf.edu]
*Sent:* Wednesday, January 27, 2016 7:37 AM
*To:* slurm-dev <slurm-dev@schedmd.com>
*Subject:* [slurm-dev] Re: Update job and partition for shared jobs

Brian,

I haven't run into that message with SLURM yet.

Have you tried releasing the jobs with scontrol, e.g. "scontrol release ID" where "ID" is the job number?
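If there are a lot of them, a loop along these lines should do it (an untested sketch; "someuser" is a placeholder, and held array elements may be easier to release via the base job ID):

# release every pending/held job belonging to a user
squeue -u someuser -h -t PD -o %i | xargs -n 1 scontrol release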

We do not automatically requeue jobs due to a bug (fixed!) which caused the controller to crash because of an empty task_id_bitmap.

John DeSantis

2016-01-26 20:05 GMT-05:00 Andrus, Brian Contractor <bdand...@nps.edu>:

    John,

    Thanks. That seemed to help; a job started on a node, but only
    once the job that had been on it (‘using’ all the memory)
    completed.

    But now none of my jobs will start; they all have a status of
    ‘JobHoldMaxRequeue’.

    From the docs, it seems that is because MAX_BATCH_REQUEUE is too
    low, but I don’t see where to change that.

    Even worse, I cannot seem to scancel any of those jobs just to
    clean things up and test stuff.

    Anyone know how to get rid of jobs with a status of
    ‘JobHoldMaxRequeue’?
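    (What I have been trying is along these lines, with no luck:)

    # cancel the whole array by its base job ID
    scancel 416

    # or cancel everything of mine that is still pending
    scancel --state=PENDING --user=$USER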

    Brian Andrus

    *From:* John Desantis [mailto:desan...@mail.usf.edu]
    *Sent:* Tuesday, January 26, 2016 12:37 PM
    *To:* slurm-dev <slurm-dev@schedmd.com>
    *Subject:* [slurm-dev] Re: Update job and partition for shared jobs

    Brian,

    Try setting a default memory per CPU in the partition definition.
    Later versions of SLURM (>= 14.11.6?) require this value to be set;
    otherwise each job is scheduled with all of the node's memory.
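    For example, something along these lines in the slurm.conf partition
    definition, followed by an "scontrol reconfigure" (the 2048 MB figure
    is just a placeholder; size it for your nodes):

    # slurm.conf: give jobs that do not request memory a per-CPU default
    # instead of the whole node
    PartitionName=debug Nodes=compute[45-49] Default=YES Shared=FORCE:4 DefMemPerCPU=2048 State=UP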

    HTH,

    John DeSantis

    2016-01-26 15:20 GMT-05:00 Andrus, Brian Contractor <bdand...@nps.edu>:

        All,

        I am in the process of transitioning from Torque to Slurm.

        So far it is doing very well, especially handling arrays.

        Now I have one array job that is running across several nodes,
        but only using some of the node resources. I would like to
        have slurm start sharing the nodes so some of the array jobs
        will start where there are unused resources.

        I ran an scontrol update to force sharing, and I can see the
        partition did change:

        #scontrol show partitions
        PartitionName=debug
           AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
           AllocNodes=ALL Default=YES QoS=N/A
           DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
           MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
           Nodes=compute[45-49]
           Priority=1 RootOnly=NO ReqResv=NO Shared=FORCE:4 PreemptMode=OFF
           State=UP TotalCPUs=280 TotalNodes=5 SelectTypeParameters=N/A
           DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
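        (The update itself was just a one-liner along the lines of

        scontrol update PartitionName=debug Shared=FORCE:4

        in case that matters.)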

        But it is not starting job 416_37 on any node as I would expect.

        #squeue
        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
        416_[37-1013%6]     debug slurm_ar  user1 PD       0:00      1 (Resources)
        416_36     debug slurm_ar  user1  R      35:46      1 compute49
        416_35     debug slurm_ar  user1  R    1:47:25      1 compute46
        416_33     debug slurm_ar  user1  R    7:30:50      1 compute45
        416_32     debug slurm_ar  user1  R    7:38:39      1 compute47
        416_31     debug slurm_ar  user1  R    8:53:26      1 compute48

        In my config, I have:

        SelectType           = select/cons_res
        SelectTypeParameters = CR_CORE_MEMORY

        What am I missing to get more than one job to run on a node?
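        (For what it's worth, squeue's %m field shows how much memory
        each task requested, e.g.

        squeue -j 416 -o "%.16i %.8T %.10m %R"

        if that helps diagnose it.)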

        Thanks in advance,

        Brian Andrus


--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
