Martin,
it's an allocation issue. With CR_CPU I can run 32 different tasks on a
16-core system with hyperthreading. With CR_Core I can run 16 tasks,
since each task allocates 2 CPUs instead of 1. This particular
application does not care if it shares a core, but it does care about how
many sin
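(For concreteness, a minimal slurm.conf sketch of the two modes being compared; the node line is invented, 2 sockets x 8 cores x 2 threads, purely to illustrate the rounding:)
SelectType=select/cons_res
# CR_CPU: each hardware thread is an allocatable CPU, so 32 one-CPU tasks fit on this node
SelectTypeParameters=CR_CPU
# CR_Core: cores are the allocation unit, so a one-CPU task is charged a whole core (both threads)
#SelectTypeParameters=CR_Core
NodeName=gpu-1-9 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2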
Eva,
Does it matter? CR_Core and CR_Core_Memory prevent multiple jobs from
being allocated CPUs on the same core. So whether Slurm allocates one or
both CPUs on a core to your job shouldn't make any difference for other
jobs. Or is this just an accounting issue?
Martin
From: Eva Hocks
To
I am struggling to get Slurm to allocate the correct CPUs for a job
requesting 1 task.
With hyperthreading enabled and the CR_Core setting, Slurm allocates 2
CPUs per job requesting 1 CPU:
srun -n 1 --pty --ntasks-per-node=1
shows:
NodeList=gpu-1-9
NumNodes=2 NumCPUs=2 CPUs/Task=1
The n
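(To double-check where the extra CPU comes from, the node's thread layout is visible in scontrol; the numbers below are only an example and the exact field set varies a bit by version:)
scontrol show node gpu-1-9
# look for something like: Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CPUTot=32
# with CR_Core the 1-CPU request is rounded up to a full core, hence NumCPUs=2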
I believe you have exceeded the MaxJobCount specified in your slurm.conf,
or have reached the default of 10000 jobs.
MaxJobCount
The maximum number of jobs SLURM can have in its active
database at one time. Set the values of MaxJobCount and MinJobAge
to insure t
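(For reference, a minimal slurm.conf sketch of the two parameters mentioned above; the values are only examples, not recommendations:)
# maximum number of job records slurmctld keeps in its active database
MaxJobCount=20000
# seconds a completed job's record is retained before being purged
MinJobAge=300
I don't recall offhand whether MaxJobCount is picked up by "scontrol reconfigure" or only on a slurmctld restart, so check the slurm.conf man page for your version.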
Hi,
lately we've started to see this:
[2013-08-09T18:57:12+03:00] error: create_job_record: job_count exceeds limit
[2013-08-09T18:57:13+03:00] error: create_job_record: job_count exceeds limit
[2013-08-09T18:57:16+03:00] error: create_job_record: job_count exceeds limit
and I can't quite under
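(A quick sanity check, assuming your setup behaves the way I'd expect: completed jobs are kept for MinJobAge seconds and still count against MaxJobCount, so the number of records slurmctld holds can be somewhat higher than this count:)
# rough count of jobs currently known to the controller (header suppressed)
squeue -h | wc -l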
I misspoke. The JobAcctGatherType=jobacct_gather/cgroup plugin is
experimental and not ready for use. Your configuration should work.
Quoting Moe Jette :
Your explanation seems likely. You probably want to change your
configuration to:
JobAcctGatherType=jobacct_gather/cgroup
Quoting Andy
Your explanation seems likely. You probably want to change your
configuration to:
JobAcctGatherType=jobacct_gather/cgroup
Quoting Andy Wettstein :
I understand this problem more fully now.
Certain jobs that our users run fork processes in a way that the parent
PID gets set to 1. The _get
I understand this problem more fully now.
Certain jobs that our users run fork processes in a way that the parent
PID gets set to 1. The _get_offspring_data function in
jobacct_gather/linux ignores these when adding up memory usage.
It seems like if proctrack/cgroup is enabled, the jobacct_gat
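(To illustrate the reparenting Andy describes, with a made-up job-script fragment: when a backgrounded process's immediate parent exits, the orphan is adopted by init (PID 1), so a walk that follows parent PIDs from the job step never reaches it, while a cgroup still contains it:)
# inside a batch script: the subshell exits at once, so sleep is reparented to PID 1
( sleep 600 & )
# a PPID-based accounting walk from the job step no longer sees this process,
# but it remains inside the job's cgroup if proctrack/cgroup is in use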
Hi,
Finally the problem is solved. Rebooting all the free worker nodes, that's all.
Bye.
Date: Wed, 7 Aug 2013 11:44:46 -0700
From: jml...@hotmail.com
To: slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Jobs not queued in SLURM 2.3
Hi Carles,
Thanks for your reply.
I don't see any error in the lo