[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-14 Thread Andy Wettstein
On Tue, Aug 13, 2013 at 06:07:53PM -0700, Christopher Samuel wrote:
>
> On 14/08/13 02:59, Andy Wettstein wrote:
>
> > If proctrack/cgroup is not being used, I don't think it is possible
> > to properly track a process that does this. Since proctrack/cgroup
> > can reliably track all PIDs for a

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-13 Thread Christopher Samuel
On 14/08/13 02:59, Andy Wettstein wrote:
> If proctrack/cgroup is not being used, I don't think it is possible
> to properly track a process that does this. Since proctrack/cgroup
> can reliably track all PIDs for a step, then I think it should be
>

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-13 Thread Andy Wettstein
Here is my config and an example Perl script that will go over the memory limit without being identified. It just creates an array and fills it up, so the array size may need to be adjusted depending on the memory configuration. If the call to daemonize() is commented out, the job should get killed due

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Danny Auble
Experimental meaning it doesn't work as correctly as the linux plugin does. I know that when we last worked on it the cgroup plugin did not do memory accounting correctly. I also know there is quite a bit of functionality missing as well (profiling and such). Basically it is half-baked at this

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Ryan Cox
Moe, In what way is it experimental? Is it possibly unstable or just not feature-complete? We're writing a script to independently gather statistics for our own database and would like to use the cpuacct cgroup, thus the interest in the jobacct_gather/cgroup plugin. Ryan On 08/09/2013 1

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-12 Thread Andy Wettstein
Ok. When you say it should work, do you mean there is a bug in slurm that is causing this problem? I can send a fairly trivial example that can bypass any memory limits if you need it.

On Fri, Aug 09, 2013 at 09:07:53AM -0700, Moe Jette wrote:
>
> I misspoke. The JobAcctGatherType=jobacct_gather

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
I misspoke. The JobAcctGatherType=jobacct_gather/cgroup plugin is experimental and not ready for use. Your configuration should work.

Quoting Moe Jette:
Your explanation seems likely. You probably want to change your configuration to: JobAcctGatherType=jobacct_gather/cgroup

Quoting Andy

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Moe Jette
Your explanation seems likely. You probably want to change your configuration to: JobAcctGatherType=jobacct_gather/cgroup

Quoting Andy Wettstein:
I understand this problem more fully now. Certain jobs that our users run fork processes in a way that the parent PID gets set to 1. The _get

[slurm-dev] Re: job steps not properly identified for jobs using step_batch cgroups

2013-08-09 Thread Andy Wettstein
I understand this problem more fully now. Certain jobs that our users run fork processes in a way that the parent PID gets set to 1. The _get_offspring_data function in jobacct_gather/linux ignores these when adding up memory usage. It seems like if proctrack/cgroup is enabled, the jobacct_gat
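
To make the failure mode described above concrete, here is a minimal Python sketch of why accounting by parent links can miss a process that has been reparented to PID 1, while accounting by cgroup membership still sees it. This is a toy model and not Slurm's code: the Proc record, both functions, the PIDs, the RSS figures and the cgroup path are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One row of a hypothetical process snapshot (not a Slurm structure)."""
    pid: int
    ppid: int
    rss_kb: int
    cgroup: str


def rss_by_ancestry(procs, root_pid):
    """Sum RSS of root_pid plus everything reachable by following parent links."""
    children = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)
    by_pid = {p.pid: p for p in procs}

    total = 0
    seen = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        if pid in by_pid:
            total += by_pid[pid].rss_kb
        stack.extend(children.get(pid, []))
    return total


def rss_by_cgroup(procs, cgroup):
    """Sum RSS of every process whose cgroup matches, whatever its parent is."""
    return sum(p.rss_kb for p in procs if p.cgroup == cgroup)


# Hypothetical cgroup path for the batch step; the real layout may differ.
STEP = "/slurm/uid_1000/job_42/step_batch"

SNAPSHOT = [
    Proc(pid=50, ppid=1, rss_kb=3000, cgroup="/"),      # step daemon, outside the step cgroup
    Proc(pid=100, ppid=50, rss_kb=2000, cgroup=STEP),   # batch script
    Proc(pid=101, ppid=100, rss_kb=5000, cgroup=STEP),  # ordinary child
    Proc(pid=102, ppid=1, rss_kb=900000, cgroup=STEP),  # daemonized, reparented to PID 1
]

if __name__ == "__main__":
    print("by ancestry:", rss_by_ancestry(SNAPSHOT, 100), "kB")
    print("by cgroup:  ", rss_by_cgroup(SNAPSHOT, STEP), "kB")
```

In this snapshot the daemonized process (PID 102) holds almost all of the memory, but since its parent is PID 1 the ancestry walk from the batch script never reaches it. The cgroup-based sum includes it because membership does not depend on parent links.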