You appear to have allocated all of the node's memory to the first job. All jobs need to fit into memory at the same time before they can be time-sliced. I will try to clarify this in the documentation.
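To make the arithmetic concrete: with DefMemPerCPU=5300 and a 24-CPU job, the allocation is 24 x 5300 = 127200 MB, essentially all of RealMemory=128000, so a second whole-node job cannot fit and stays pending with reason (Resources). As a sketch (the 1300 MB figure is only an illustration, and mercan89 is just the node from your example), resubmitting the test jobs with an explicit per-CPU memory limit would let up to four of them share the node under Shared=FORCE:4:

    # 24 CPUs x 1300 MB = 31200 MB per job, so four concurrent jobs
    # (124800 MB total) still fit within RealMemory=128000 and can be
    # time-sliced by the gang scheduler.
    sbatch -N1 -n24 --mem-per-cpu=1300 -w mercan89 --wrap "sleep 3600"
    sbatch -N1 -n24 --mem-per-cpu=1300 -w mercan89 --wrap "sleep 3600"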
Quoting Sefa Arslan <sefa.ars...@tubitak.gov.tr>:

> Yes I have:
>
> NodeName=mercan[5-192] Procs=24 Sockets=2 CoresPerSocket=12
> RealMemory=128000 State=UNKNOWN
> PartitionName=test Nodes=mercan[7-192] Default=YES MaxTime=INFINITE
> State=UP DefMemPerCPU=5300 Shared=FORCE:4
>
> scontrol show config:
> Configuration data as of 2012-06-29T09:57:56
> AccountingStorageBackupHost = (null)
> AccountingStorageEnforce = associations,limits
> AccountingStorageHost = mercan5
> AccountingStorageLoc = N/A
> AccountingStoragePort = 6819
> AccountingStorageType = accounting_storage/slurmdbd
> AccountingStorageUser = N/A
> AccountingStoreJobComment = YES
> AuthType = auth/munge
> BackupAddr = (null)
> BackupController = (null)
> BatchStartTimeout = 10 sec
> BOOT_TIME = 2012-06-29T09:12:09
> CacheGroups = 1
> CheckpointType = checkpoint/none
> ClusterName = linux
> CompleteWait = 0 sec
> ControlAddr = mercan5
> ControlMachine = mercan5
> CryptoType = crypto/munge
> DebugFlags = (null)
> DefMemPerCPU = 5300
> DisableRootJobs = NO
> EnforcePartLimits = YES
> Epilog = (null)
> EpilogMsgTime = 2000 usec
> EpilogSlurmctld = (null)
> FastSchedule = 1
> FirstJobId = 100000
> GetEnvTimeout = 2 sec
> GresTypes = (null)
> GroupUpdateForce = 0
> GroupUpdateTime = 600 sec
> HASH_VAL = Match
> HealthCheckInterval = 0 sec
> HealthCheckProgram = (null)
> InactiveLimit = 0 sec
> JobAcctGatherFrequency = 30 sec
> JobAcctGatherType = jobacct_gather/linux
> JobCheckpointDir = /var/slurm/checkpoint
> JobCompHost = localhost
> JobCompLoc = /var/log/slurm/job_completions
> JobCompPort = 0
> JobCompType = jobcomp/filetxt
> JobCompUser = root
> JobCredentialPrivateKey = (null)
> JobCredentialPublicCertificate = (null)
> JobFileAppend = 0
> JobRequeue = 1
> JobSubmitPlugins = (null)
> KillOnBadExit = 0
> KillWait = 30 sec
> Licenses = (null)
> MailProg = /bin/mail
> MaxJobCount = 1000000
> MaxJobId = 4294901760
> MaxMemPerNode = UNLIMITED
> MaxStepCount = 40000
> MaxTasksPerNode = 128
> MessageTimeout = 10 sec
> MinJobAge = 300 sec
> MpiDefault = none
> MpiParams = (null)
> NEXT_JOB_ID = 106036
> OverTimeLimit = 0 min
> PluginDir = /usr/lib64/slurm
> PlugStackConfig = /etc/slurm/plugstack.conf
> PreemptMode = GANG,SUSPEND
> PreemptType = preempt/partition_prio
> PriorityDecayHalfLife = 00:00:00
> PriorityCalcPeriod = 00:05:00
> PriorityFavorSmall = 0
> PriorityMaxAge = 14-00:00:00
> PriorityUsageResetPeriod = NONE
> PriorityType = priority/multifactor
> PriorityWeightAge = 1000
> PriorityWeightFairShare = 10000
> PriorityWeightJobSize = 1000
> PriorityWeightPartition = 1000
> PriorityWeightQOS = 0
> PrivateData = none
> ProctrackType = proctrack/cgroup
> Prolog = (null)
> PrologSlurmctld = (null)
> PropagatePrioProcess = 0
> PropagateResourceLimits = (null)
> PropagateResourceLimitsExcept = MEMLOCK
> RebootProgram = (null)
> ReconfigFlags = (null)
> ResumeProgram = (null)
> ResumeRate = 300 nodes/min
> ResumeTimeout = 60 sec
> ResvOverRun = 0 min
> ReturnToService = 0
> SallocDefaultCommand = (null)
> SchedulerParameters = (null)
> SchedulerPort = 7321
> SchedulerRootFilter = 1
> SchedulerTimeSlice = 30 sec
> SchedulerType = sched/builtin
> SelectType = select/cons_res
> SelectTypeParameters = CR_CPU_MEMORY
> SlurmUser = root(0)
> SlurmctldDebug = debug3
> SlurmctldLogFile = /var/log/slurm/slurmctld.log
> SlurmSchedLogFile = (null)
> SlurmctldPort = 6817
> SlurmctldTimeout = 300 sec
> SlurmdDebug = debug3
> SlurmdLogFile = /var/log/slurm/slurmd.log
> SlurmdPidFile = /var/run/slurmd.pid
> SlurmdPort = 6818
> SlurmdSpoolDir = /tmp/slurmd
> SlurmdTimeout = 300 sec
> SlurmdUser = root(0)
> SlurmSchedLogLevel = 0
> SlurmctldPidFile = /var/run/slurmctld.pid
> SLURM_CONF = /etc/slurm/slurm.conf
> SLURM_VERSION = 2.4.0-pre4
> SrunEpilog = (null)
> SrunProlog = (null)
> StateSaveLocation = /tmp
> SuspendExcNodes = (null)
> SuspendExcParts = (null)
> SuspendProgram = (null)
> SuspendRate = 60 nodes/min
> SuspendTime = NONE
> SuspendTimeout = 30 sec
> SwitchType = switch/none
> TaskEpilog = (null)
> TaskPlugin = task/cgroup
> TaskPluginParam = (null type)
> TaskProlog = (null)
> TmpFS = /tmp
> TopologyPlugin = topology/none
> TrackWCKey = 0
> TreeWidth = 50
> UsePam = 1
> UnkillableStepProgram = (null)
> UnkillableStepTimeout = 60 sec
> VSizeFactor = 0 percent
> WaitTime = 0 sec
>
> On 06/28/2012 05:53 PM, Moe Jette wrote:
>>
>> Do you have the "Shared" option configured for the partitions as
>> instructed here:
>> http://www.schedmd.com/slurmdocs/gang_scheduling.html
>>
>> Quoting Sefa Arslan <sefa.ars...@tubitak.gov.tr>:
>>
>>> I have the following configuration for gang scheduling:
>>>
>>> SchedulerTimeSlice = 30 sec
>>> SchedulerType = sched/builtin
>>> SelectType = select/cons_res
>>> SelectTypeParameters = CR_CORE_MEMORY
>>> PreemptMode = GANG,SUSPEND
>>> PreemptType = preempt/partition_prio
>>>
>>> I submitted two jobs to the same node, each requesting the full
>>> number of cores. The first job started running immediately; the
>>> other is in the pending state:
>>>
>>> 105986 test sleep sefa PD 0:00 1 (Resources)
>>> 105985 test sleep sefa R 5:05 1 mercan89
>>>
>>> Then I set the priority of the second job to a much higher value
>>> (scontrol update JobID=105986 Priority=100000000). I was expecting
>>> the second job to start, but job 105986 is still pending.
>>>
>>> How should I configure Slurm?
>>>
>>> One more question: can any user (or the admin) migrate his/her own
>>> running job from one node to another? How?