SLURM schedules only the resources that you have configured on the node (i.e., one task per CPU). If other programs are running on that node (e.g., your "top" program and the system daemons, which also need some CPU time), then Linux will time-share the CPUs among all of the runnable processes.
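
To put rough numbers on the time-sharing, here is a minimal sketch. It is an idealized model, not what the kernel scheduler actually computes: it assumes every runnable process is CPU-bound and has equal priority. The function name and the CPU and process counts are hypothetical, chosen only for illustration.

```python
def cpu_share_per_process(n_cpus: int, n_runnable: int) -> float:
    """Idealized fraction of one CPU that each runnable process receives.

    Assumes all processes are CPU-bound and equally weighted; a real
    Linux scheduler also accounts for priorities, I/O waits, and so on.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    if n_runnable < 1:
        raise ValueError("n_runnable must be at least 1")
    # A process can never use more than one full CPU at a time.
    return min(1.0, n_cpus / n_runnable)


# Hypothetical node: 4 CPUs, 10 job tasks plus 2 other busy processes.
share = cpu_share_per_process(n_cpus=4, n_runnable=10 + 2)
print(f"each process gets about {share:.0%} of a CPU")
```

Under this model, tasks only get a full CPU each when the number of runnable processes does not exceed the number of CPUs.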
Quoting jaiber john <jaiber.j...@gmail.com>:
>
> Hello list
>
> I'm a newbie user of SLURM, we've configured a 1 node SLURM setup. We
> need to run multiple jobs in parallel that will push the system to the
> limits.
> What we find is that the jobs get started but they go to Deferred mode
> (as seen in Linux top command), we dont want this to happen. It is OK
> if CPU and/or memory is used 100%.
>
> What are the necessary configurations to achieve this? We use the
> default backfill scheduler. We need to run about 10 jobs
> simultaneously on one node without getting suspended. The slurm config
> is as follows:
>
> jjjohn@localhost:~$ scontrol show config
> Configuration data as of 2012-03-22T21:55:59
> AccountingStorageBackupHost = (null)
> AccountingStorageEnforce = none
> AccountingStorageHost = localhost
> AccountingStorageLoc = /tmp/slurm
> AccountingStoragePort = 0
> AccountingStorageType = accounting_storage/filetxt
> AccountingStorageUser = root
> AuthType = auth/none
> BackupAddr = (null)
> BackupController = (null)
> BatchStartTimeout = 10 sec
> BOOT_TIME = 2012-03-22T17:52:41
> CacheGroups = 0
> CheckpointType = checkpoint/none
> ClusterName = cluster
> CompleteWait = 0 sec
> ControlAddr = 10.22x.xx.xx
> ControlMachine = localhost
> CryptoType = crypto/munge
> DebugFlags = (null)
> DefMemPerCPU = UNLIMITED
> DisableRootJobs = NO
> EnforcePartLimits = NO
> Epilog = (null)
> EpilogMsgTime = 2000 usec
> EpilogSlurmctld = (null)
> FastSchedule = 1
> FirstJobId = 1
> GetEnvTimeout = 2 sec
> GresTypes = (null)
> GroupUpdateForce = 0
> GroupUpdateTime = 600 sec
> HashVal = Match
> HealthCheckInterval = 0 sec
> HealthCheckProgram = (null)
> InactiveLimit = 0 sec
> JobAcctGatherFrequency = 30 sec
> JobAcctGatherType = jobacct_gather/none
> JobCheckpointDir = /var/slurm/checkpoint
> JobCompHost = localhost
> JobCompLoc = /tmp/slurmCompLog
> JobCompPort = 0
> JobCompType = jobcomp/none
> JobCompUser = root
> JobCredentialPrivateKey = (null)
> JobCredentialPublicCertificate = (null)
> JobFileAppend = 0
> JobRequeue = 1
> JobSubmitPlugins = (null)
> KillOnBadExit = 0
> KillWait = 30 sec
> Licenses = (null)
> MailProg = /usr/bin/mail
> MaxJobCount = 10000
> MaxMemPerCPU = UNLIMITED
> MaxTasksPerNode = 128
> MessageTimeout = 10 sec
> MinJobAge = 300 sec
> MpiDefault = none
> MpiParams = (null)
> NEXT_JOB_ID = 309
> OverTimeLimit = 0 min
> PluginDir = /usr/lib/slurm
> PlugStackConfig = /etc/slurm-llnl/plugstack.conf
> PreemptMode = OFF
> PreemptType = preempt/none
> PriorityType = priority/basic
> PrivateData = none
> ProctrackType = proctrack/pgid
> Prolog = (null)
> PrologSlurmctld = (null)
> PropagatePrioProcess = 0
> PropagateResourceLimits = ALL
> PropagateResourceLimitsExcept = (null)
> ResumeProgram = (null)
> ResumeRate = 300 nodes/min
> ResumeTimeout = 60 sec
> ResvOverRun = 0 min
> ReturnToService = 1
> SallocDefaultCommand = (null)
> SchedulerParameters = (null)
> SchedulerPort = 7321
> SchedulerRootFilter = 1
> SchedulerTimeSlice = 30 sec
> SchedulerType = sched/backfill
> SelectType = select/linear
> SlurmUser = slurm(64030)
> SlurmctldDebug = 3
> SlurmctldLogFile = (null)
> SlurmSchedLogFile = (null)
> SlurmctldPort = 6817
> SlurmctldTimeout = 120 sec
> SlurmdDebug = 3
> SlurmdLogFile = (null)
> SlurmdPidFile = /var/run/slurmd.pid
> SlurmdPort = 6818
> SlurmdSpoolDir = /tmp/slurmd
> SlurmdTimeout = 300 sec
> SlurmdUser = root(0)
> SlurmSchedLogLevel = 0
> SlurmctldPidFile = /var/run/slurmctld.pid
> SLURM_CONF = /etc/slurm-llnl/slurm.conf
> SLURM_VERSION = 2.2.7
> SrunEpilog = (null)
> SrunProlog = (null)
> StateSaveLocation = /tmp
> SuspendExcNodes = (null)
> SuspendExcParts = (null)
> SuspendProgram = (null)
> SuspendRate = 60 nodes/min
> SuspendTime = NONE
> SuspendTimeout = 30 sec
> SwitchType = switch/none
> TaskEpilog = (null)
> TaskPlugin = task/none
> TaskPluginParam = (null type)
> TaskProlog = (null)
> TmpFS = /tmp
> TopologyPlugin = topology/none
> TrackWCKey = 0
> TreeWidth = 50
> UsePam = 0
> UnkillableStepProgram = (null)
> UnkillableStepTimeout = 60 sec
> VSizeFactor = 0 percent
> WaitTime = 0 sec
>
> Slurmctld(primary/backup) at localhost/(NULL) are UP/DOWN
>
>
> --
> <Jaiber John>