Oh, I've solved the problem.
While searching Google for "cgroup monitoring", I came across a
program called ctop.
I ran it inside my slurm-docker image and it said:

# ctop
[WARN] Failed to find any relevant cgroup/container.

Hint: It seems you are running inside a Docker container.
      Please make sure to expose host's cgroups with
      '--volume=/sys/fs/cgroup:/sys/fs/cgroup:ro'

So I just mounted that volume and the problem was solved.
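For anyone hitting the same thing, the fix is just adding that bind
mount to the run command. A rough sketch (the image name
"slurm-docker" and the other flags are placeholders for whatever your
actual setup uses):

    # Expose the host's cgroup hierarchy read-only inside the
    # container, as ctop's hint suggests:
    docker run -d \
      --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
      slurm-docker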

Thank you, anyway!

Sincerely,

Sumin.


Sumin Han
Undergraduate '13, School of Computing
Korea Advanced Institute of Science and Technology
Daehak-ro 291
Yuseong-gu, Daejeon
Republic of Korea 305-701
Tel. +82-10-2075-6911

2017-08-02 16:56 GMT+09:00 한수민 <hsm6...@gmail.com>:

> Well... now I've changed it to:
>
> cgroup.conf:
> ###
> # Slurm cgroup support configuration file
> ###
> CgroupAutomount=no
> CgroupMountpoint=/sys/fs/cgroup
> #CgroupReleaseAgentDir="/etc/slurm/cgroup"
> ConstrainCores=yes
>
> #TaskAffinity=no
> #
>
> but it still doesn't work.
>
> In fact, I'm running Slurm inside a Docker container. Could that cause
> problems when using Slurm with cgroups?
>
>
> 2017-08-02 13:34 GMT+09:00 Lachlan Musicman <data...@gmail.com>:
>
>> You will see here
>>
>> https://groups.google.com/forum/#!msg/slurm-devel/lKX8st9aztI/dF5Kvz4gDAAJ
>>
>> that you need to set
>>
>> CgroupAutomount=no
>>
>> in cgroup.conf
>>
>> if you are running a system using systemd
>>
>> cheers
>> L.
>>
>> ------
>> "The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic
>> civics is the insistence that we cannot ignore the truth, nor should we
>> panic about it. It is a shared consciousness that our institutions have
>> failed and our ecosystem is collapsing, yet we are still here — and we are
>> creative agents who can shape our destinies. Apocalyptic civics is the
>> conviction that the only way out is through, and the only way through is
>> together. "
>>
>> *Greg Bloom* @greggish https://twitter.com/greggish/status/873177525903609857
>>
>> On 2 August 2017 at 14:27, 한수민 <hsm6...@gmail.com> wrote:
>>
>>> My slurmd.log says:
>>>
>>> [2017-08-02T04:25:45.453] debug2: _file_read_content: unable to open '/sys/fs/cgroup/freezer//release_agent' for reading : No such file or directory
>>> [2017-08-02T04:25:45.453] debug2: xcgroup_get_param: unable to get parameter 'release_agent' for '/sys/fs/cgroup/freezer/'
>>> [2017-08-02T04:25:45.453] error: unable to mount freezer cgroup namespace: Device or resource busy
>>> [2017-08-02T04:25:45.453] error: unable to create freezer cgroup namespace
>>> [2017-08-02T04:25:45.453] error: Couldn't load specified plugin name for proctrack/cgroup: Plugin init() callback failed
>>> [2017-08-02T04:25:45.453] error: cannot create proctrack context for proctrack/cgroup
>>> [2017-08-02T04:25:45.453] error: slurmd initialization failed
>>>
>>> hmm...
>>>
>>>
>>> 2017-08-02 13:05 GMT+09:00 Lachlan Musicman <data...@gmail.com>:
>>>
>>>>> [root@n6 /]# si
>>>>>
>>>>> PARTITION  NODES NODES(A/I/O/T) S:C:T MEMORY TMP_DISK TIMELIMIT AVAIL_FEATURES NODELIST
>>>>> debug*     6     0/6/0/6        1:4:2 7785   113264   infinite  (null)         c[1-6]
>>>>>
>>>>> (for a moment)
>>>>>
>>>>> [root@n6 /]# si
>>>>>
>>>>> PARTITION  NODES NODES(A/I/O/T) S:C:T MEMORY TMP_DISK TIMELIMIT AVAIL_FEATURES NODELIST
>>>>> debug*     6     0/0/6/6        1:4:2 7785   113264   infinite  (null)         c[1-6]
>>>>>
>>>>
>>>>
>>>>
>>>> 0/0/6/6 (allocated/idle/other/total) means all six of your nodes are
>>>> down - they are dying.
>>>>
>>>> You need to look into /var/log/slurm/slurmd.log (or wherever you
>>>> put the slurmd logs, as dictated by SlurmdLogFile=) on each of the
>>>> nodes.
>>>>
>>>> I would predict that there is something wrong with your cgroup.conf.
>>>>
>>>> try:
>>>>
>>>>  - confirming that the /etc/slurm/cgroup directory exists on all
>>>>    nodes (as per your cgroup.conf)
>>>>  - commenting out everything in cgroup.conf except
>>>>    CgroupAutomount=yes and ConstrainCores=yes
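>>>> i.e. a stripped-down cgroup.conf along those lines would be just
>>>> (assuming the file lives in the default /etc/slurm location):
>>>>
>>>> ###
>>>> # Slurm cgroup support configuration file
>>>> ###
>>>> CgroupAutomount=yes
>>>> ConstrainCores=yes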
>>>>
>>>> Cheers
>>>> L.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
