You will see here
https://groups.google.com/forum/#!msg/slurm-devel/lKX8st9aztI/dF5Kvz4gDAAJ
that you need to set CgroupAutomount=no in cgroup.conf if you are running a
system using systemd.

Cheers
L.

------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic about
it. It is a shared consciousness that our institutions have failed and our
ecosystem is collapsing, yet we are still here — and we are creative agents
who can shape our destinies. Apocalyptic civics is the conviction that the
only way out is through, and the only way through is together."

*Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857


On 2 August 2017 at 14:27, 한수민 <hsm6...@gmail.com> wrote:

> My slurmd.log says:
>
> [2017-08-02T04:25:45.453] debug2: _file_read_content: unable to open
> '/sys/fs/cgroup/freezer//release_agent' for reading : No such file or
> directory
> [2017-08-02T04:25:45.453] debug2: xcgroup_get_param: unable to get
> parameter 'release_agent' for '/sys/fs/cgroup/freezer/'
> [2017-08-02T04:25:45.453] error: unable to mount freezer cgroup namespace:
> Device or resource busy
> [2017-08-02T04:25:45.453] error: unable to create freezer cgroup namespace
> [2017-08-02T04:25:45.453] error: Couldn't load specified plugin name for
> proctrack/cgroup: Plugin init() callback failed
> [2017-08-02T04:25:45.453] error: cannot create proctrack context for
> proctrack/cgroup
> [2017-08-02T04:25:45.453] error: slurmd initialization failed
>
> hmm...
>
> Sumin Han
> Undergraduate '13, School of Computing
> Korea Advanced Institute of Science and Technology
> Daehak-ro 291
> Yuseong-gu, Daejeon
> Republic of Korea 305-701
> Tel. +82-10-2075-6911
>
>
> 2017-08-02 13:05 GMT+09:00 Lachlan Musicman <data...@gmail.com>:
>
>> [root@n6 /]# si
>>>
>>> PARTITION  NODES  NODES(A/I/O/T)  S:C:T  MEMORY  TMP_DISK  TIMELIMIT  AVAIL_FEATURES  NODELIST
>>> debug*         6  0/6/0/6         1:4:2    7785    113264   infinite  (null)          c[1-6]
>>>
>>> (for a moment)
>>>
>>> [root@n6 /]# si
>>>
>>> PARTITION  NODES  NODES(A/I/O/T)  S:C:T  MEMORY  TMP_DISK  TIMELIMIT  AVAIL_FEATURES  NODELIST
>>> debug*         6  0/0/6/6         1:4:2    7785    113264   infinite  (null)          c[1-6]
>>>
>>
>> 0/0/6/6 means your nodes are dying.
>>
>> You need to look into /var/log/slurm/slurmd.log (or wherever you put the
>> slurmd logs on the machine, as dictated by SlurmdLogFile=) on each of
>> the nodes.
>>
>> I would predict that there is something wrong with your cgroup.conf.
>>
>> Try:
>>
>> - confirming that the /etc/slurm/cgroup directory exists on all nodes
>>   (as per your cgroup.conf)
>> - commenting out everything in cgroup.conf except CgroupAutomount=yes
>>   and ConstrainCores=yes
>>
>> Cheers
>> L.
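
Taken together, the advice in this thread amounts to a minimal cgroup.conf
for nodes where systemd manages the cgroup mounts. The following is a sketch
only (CgroupAutomount and ConstrainCores are standard Slurm cgroup.conf
parameters, but verify them against your Slurm version and local paths):

```
# /etc/slurm/cgroup.conf -- minimal sketch for a systemd host.
# CgroupAutomount=no: systemd already mounts the cgroup hierarchy under
# /sys/fs/cgroup, so letting slurmd try to mount it again fails with
# "unable to mount freezer cgroup namespace: Device or resource busy",
# which in turn makes proctrack/cgroup init fail and slurmd die
# (the 0/0/6/6 "other" node state seen in sinfo above).
CgroupAutomount=no
ConstrainCores=yes
```

After editing cgroup.conf on every node, restart slurmd on the nodes and
check slurmd.log again for the proctrack/cgroup errors.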