[slurm-dev] Re: CR_Core_Memory and CR_Core versus CR_CPU_Memory

2013-08-09 Thread Eva Hocks
Martin, it's an allocation issue. With CR_CPU I can run 32 different tasks on a 16-core system with hyperthreading. With CR_Core I can run 16 tasks, since each task allocates 2 CPUs instead of 1. This particular application does not care if it shares a core, but it does care about how many sin
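For context, on a 16-core node with hyperthreading (32 logical CPUs) the two allocation modes being compared would be set in slurm.conf roughly as follows. This is an illustrative sketch: the node name gpu-1-9 is taken from the original message below, and the socket/core counts and RealMemory figure are assumptions, not from the thread.

    SelectType=select/cons_res
    # Treat each hyperthread as a schedulable CPU: 32 one-CPU tasks fit
    SelectTypeParameters=CR_CPU_Memory
    # Alternative: allocate whole cores, so a 1-CPU task consumes both
    # hyperthreads of a core and only 16 tasks fit
    #SelectTypeParameters=CR_Core_Memory
    NodeName=gpu-1-9 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000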

[slurm-dev] Re: CR_Core_Memory and CR_Core versus CR_CPU_Memory

2013-08-09 Thread Martin . Perry
Eva, Does it matter? CR_Core and CR_Core_Memory prevent multiple jobs from being allocated CPUs on the same core. So whether Slurm allocates one or both CPUs on a core to your job shouldn't make any difference for other jobs. Or is this just an accounting issue? Martin

[slurm-dev] CR_Core_Memory and CR_Core versus CR_CPU_Memory

2013-08-09 Thread Eva Hocks
I am struggling to configure slurm to allocate the correct CPUs for a job requesting 1 task. With hyperthreading enabled and the CR_Core setting, slurm allocates 2 CPUs per job requesting 1 CPU: srun -n 1 --pty --ntasks-per-node=1 shows: NodeList=gpu-1-9 NumNodes=2 NumCPUs=2 CPUs/Task=1 The n
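The behavior described can be confirmed by inspecting the job record; a minimal sketch based on the output quoted above:

    # Under CR_Core with ThreadsPerCore=2, a request for one task is rounded
    # up to a whole core, so the job record reports two logical CPUs:
    srun -n 1 --ntasks-per-node=1 --pty /bin/bash
    scontrol show job $SLURM_JOB_ID | grep -o 'NumCPUs=[0-9]*'
    # NumCPUs=2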

[slurm-dev] Re: Job count exceeds limit

2013-08-09 Thread Eckert, Phil
I believe you have exceeded the MaxJobCount specified in your slurm.conf, or have reached the default of 10,000 jobs. MaxJobCount: The maximum number of jobs SLURM can have in its active database at one time. Set the values of MaxJobCount and MinJobAge to insure t
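The relevant slurm.conf lines would look something like this; the raised value is illustrative, and per the slurm.conf man page the MaxJobCount default is 10,000 and a change to it takes effect only on a slurmctld restart, not via scontrol reconfigure:

    MaxJobCount=20000   # maximum jobs in slurmctld's active database (default 10000)
    MinJobAge=300       # seconds a completed job record is retained (default 300)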

[slurm-dev] Job count exceeds limit

2013-08-09 Thread Mario Kadastik
Hi, lately we've started to see this:
[2013-08-09T18:57:12+03:00] error: create_job_record: job_count exceeds limit
[2013-08-09T18:57:13+03:00] error: create_job_record: job_count exceeds limit
[2013-08-09T18:57:16+03:00] error: create_job_record: job_count exceeds limit
and I can't quite under
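A quick way to check the hypothesis in the reply above is to compare the current queue depth against the configured ceiling:

    scontrol show config | grep -E 'MaxJobCount|MinJobAge'
    squeue -h | wc -l   # -h drops the header so the count is exact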

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
I misspoke. The JobAcctGatherType=jobacct_gather/cgroup plugin is experimental and not ready for use. Your configuration should work. Quoting Moe Jette: Your explanation seems likely. You probably want to change your configuration to: JobAcctGatherType=jobacct_gather/cgroup Quoting Andy

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
Your explanation seems likely. You probably want to change your configuration to: JobAcctGatherType=jobacct_gather/cgroup Quoting Andy Wettstein: I understand this problem more fully now. Certain jobs that our users run fork processes in a way that the parent PID gets set to 1. The _get
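The suggested change (retracted in the follow-up above, since the cgroup gather plugin was still experimental at the time) would have been a one-line edit in slurm.conf:

    # Proposed: account per-cgroup so reparented processes are still counted
    JobAcctGatherType=jobacct_gather/cgroup
    # Existing setting, which walks the /proc parent/child chain:
    #JobAcctGatherType=jobacct_gather/linux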

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Andy Wettstein
I understand this problem more fully now. Certain jobs that our users run fork processes in a way that the parent PID gets set to 1. The _get_offspring_data function in jobacct_gather/linux ignores these when adding up memory usage. It seems like if proctrack/cgroup is enabled, the jobacct_gat
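A minimal shell sketch of the reparenting pattern described: after a double fork the worker's parent exits, so the worker is typically reparented to PID 1 and accounting that walks parent/child links from the job step loses it, while the step's cgroup still contains it:

    # The subshell starts sleep in the background and exits immediately,
    # so sleep is reparented to PID 1 (init):
    ( sleep 300 & )
    ps -o pid,ppid,comm -C sleep    # PPID column shows 1
    # Under proctrack/cgroup the reparented process remains in the step's
    # cgroup, so cgroup-based accounting can still find and charge it.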

[slurm-dev] Re: Jobs not queued in SLURM 2.3

2013-08-09 Thread José Manuel Molero
Hi, finally the problem is solved: rebooting all the free worker nodes. That's all. Bye. Date: Wed, 7 Aug 2013 11:44:46 -0700 From: jml...@hotmail.com To: slurm-dev@schedmd.com Subject: [slurm-dev] Re: Jobs not queued in SLURM 2.3 Hi Carles, Thanks for your reply. I don't see any error in the lo