Ok. When you say it should work, do you mean there is a bug in slurm that
is causing this problem?

I can send a fairly trivial example that can bypass any memory limits if
you need it.

On Fri, Aug 09, 2013 at 09:07:53AM -0700, Moe Jette wrote:
> 
> I misspoke. The JobAcctGatherType=jobacct_gather/cgroup plugin is
> experimental and not ready for use. Your configuration should work.
> 
> Quoting Moe Jette <je...@schedmd.com>:
> 
> >Your explanation seems likely. You probably want to change your
> >configuration to:
> >JobAcctGatherType=jobacct_gather/cgroup
> >
> >Quoting Andy Wettstein <wettst...@uchicago.edu>:
> >
> >>
> >>I understand this problem more fully now.
> >>
> >>Certain jobs that our users run fork processes in such a way that the
> >>parent PID gets set to 1. The _get_offspring_data function in
> >>jobacct_gather/linux ignores these processes when adding up memory usage.
> >>
> >>It seems like, if proctrack/cgroup is enabled, the jobacct_gather/linux
> >>plugin should rely on the cgroup.procs file to identify the PIDs instead
> >>of inferring them from the parent PID. Is something like that
> >>reasonable?
> >>
> >>Andy
> >>
> >>On Tue, Jul 30, 2013 at 10:59:56AM -0700, Andy Wettstein wrote:
> >>>
> >>>Hi,
> >>>
> >>>I have the following set:
> >>>
> >>>ProctrackType           = proctrack/cgroup
> >>>TaskPlugin              = task/cgroup
> >>>JobAcctGatherType       = jobacct_gather/linux
> >>>
> >>>This is on slurm 2.5.7.
> >>>
> >>>When I use sstat on all running jobs, a large number of them report
> >>>that they have no steps running (for example: sstat: error: couldn't
> >>>get steps for job 4783548).
> >>>
> >>>This seems to be the case for all steps that use the step_batch cgroup.
> >>>If the step gets created in something like step_0, everything seems to
> >>>be reported ok. In both instances, the PIDs are actually listed in the
> >>>right cgroup.procs file.
> >>>
> >>>I noticed this because several jobs that should have been killed for
> >>>exceeding their memory limits were not. The jobacct_gather plugin
> >>>doesn't know about the processes in the step_batch cgroup, so it doesn't
> >>>count their memory usage.
> >>>
> >>>
> >>>Andy
> >>>
> >>>
> >>>
> >>>
> >>>--
> >>>andy wettstein
> >>>hpc system administrator
> >>>research computing center
> >>>university of chicago
> >>>773.702.1104
> >>

-- 
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104
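
A small sketch of the accounting difference described in this thread, assuming a simplified model rather than Slurm's actual code. `walk_descendants` (hypothetical) stands in for a parent-PID walk like the one in `_get_offspring_data`, and `cgroup_members` (hypothetical) parses the contents of a step's cgroup.procs file, as proposed for the proctrack/cgroup case. The PIDs and RSS values are made up for illustration. When a process's parent exits, the process is re-parented to PID 1, so a walk starting from the batch script never reaches it, while cgroup membership still includes it.

```python
from collections import defaultdict


def walk_descendants(root_pid, ppid_of):
    """Return root_pid plus every PID reachable via parent->child links.

    Hypothetical stand-in for a parent-PID walk (not Slurm's code):
    a process whose parent exited and that was re-parented to PID 1
    is no longer reachable from root_pid, so it is silently skipped.
    """
    children = defaultdict(list)
    for pid, ppid in ppid_of.items():
        children[ppid].append(pid)
    found = set()
    stack = [root_pid]
    while stack:
        pid = stack.pop()
        if pid in found:
            continue
        found.add(pid)
        stack.extend(children[pid])
    return found


def cgroup_members(procs_text):
    """Parse cgroup.procs contents (one PID per line) into a set of PIDs.

    Membership does not depend on the parent PID, so re-parented
    processes are still counted.
    """
    return {int(tok) for tok in procs_text.split()}


def total_rss_kb(pids, rss_kb):
    """Sum the resident set size (in KiB) of the given PIDs."""
    return sum(rss_kb.get(pid, 0) for pid in pids)


# Hypothetical node state: slurmstepd (50) started the batch script (100).
# The script's child double-forked and exited, leaving 102 re-parented to 1.
ppid_of = {1: 0, 50: 1, 100: 50, 102: 1, 200: 1}
rss_kb = {100: 2048, 102: 4_000_000, 200: 1000}
procs_text = "100\n102\n"  # what step_batch/cgroup.procs would list

by_ppid = total_rss_kb(walk_descendants(100, ppid_of), rss_kb)
by_cgroup = total_rss_kb(cgroup_members(procs_text), rss_kb)
print(by_ppid, by_cgroup)  # 2048 4002048
```

In a real plugin, the PID list would be read from the step's cgroup.procs file and the per-process memory from /proc/<pid>/status or a similar source. Both are left out here so the example stays self-contained.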
