[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Andy Wettstein
Ok. When you say it should work, do you mean there is a bug in Slurm that is causing this problem? I can send a fairly trivial example that can bypass any memory limits if you need it. On Fri, Aug 09, 2013 at 09:07:53AM -0700, Moe Jette wrote: I misspoke. The

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Ryan Cox
Moe, In what way is it experimental? Is it possibly unstable or just not feature-complete? We're writing a script to independently gather statistics for our own database and would like to use the cpuacct cgroup, thus the interest in the jobacct_gather/cgroup plugin. Ryan On 08/09/2013

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Danny Auble
Experimental meaning it doesn't work as correctly as the linux plugin does. I know when we last worked on it the cgroup plugin did not do memory accounting correctly. I also know there is quite a bit of functionality missing (profiling and such). Basically it is half-baked at this

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Andy Wettstein
I understand this problem more fully now. Certain jobs that our users run fork processes in such a way that the parent PID gets set to 1. The _get_offspring_data function in jobacct_gather/linux ignores these when adding up memory usage. It seems like if proctrack/cgroup is enabled, the
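
To illustrate the accounting gap described in this message, here is a minimal sketch in Python. It is not Slurm's code (the real _get_offspring_data is C inside the jobacct_gather/linux plugin); the process table, the PIDs, the memory figures, and the function names offspring_rss and cgroup_rss are all hypothetical. It only models the idea that walking parent-to-child links misses a process whose parent PID has become 1, while summing over an explicit set of tracked PIDs (as a cgroup would provide) does not.

```python
from collections import defaultdict

# Hypothetical process table: pid -> (parent pid, resident memory in KB).
# PID 102 was started by the job but has been reparented to PID 1.
PROCS = {
    100: (50, 10),     # job step's top-level process
    101: (100, 200),   # ordinary child of 100
    102: (1, 4000),    # orphaned process, parent PID is now 1
}

def offspring_rss(procs, root_pid):
    """Sum memory of root_pid and all descendants found via parent links."""
    children = defaultdict(list)
    for pid, (ppid, _rss) in procs.items():
        children[ppid].append(pid)
    total = 0
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        total += procs[pid][1]
        stack.extend(children[pid])
    return total

def cgroup_rss(procs, tracked_pids):
    """Sum memory of every tracked PID, regardless of its parent."""
    return sum(procs[pid][1] for pid in tracked_pids)

if __name__ == "__main__":
    print(offspring_rss(PROCS, 100))            # tree walk misses PID 102
    print(cgroup_rss(PROCS, {100, 101, 102}))   # membership-based sum includes it
```

In this toy table the tree walk never reaches PID 102, so its memory is left out of the total, which is the kind of undercount that would let a job exceed a memory limit unnoticed.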

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
Your explanation seems likely. You probably want to change your configuration to: JobAcctGatherType=jobacct_gather/cgroup Quoting Andy Wettstein wettst...@uchicago.edu: I understand this problem more fully now. Certain jobs that our users run fork processes in a way that the parent PID

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
I misspoke. The JobAcctGatherType=jobacct_gather/cgroup plugin is experimental and not ready for use. Your configuration should work. Quoting Moe Jette je...@schedmd.com: Your explanation seems likely. You probably want to change your configuration to: