Hi Jan,

Apologies for the delay. It looks like SLURM
versions before 15.08 only support 32-bit GRES count values, as you noticed (I
took a peek at the code). Perhaps a workaround would be to do away with the
suffixes and append "_gb" to the GRES name (e.g., disk_gb). Once you have
support for 64-bit counters you could always change this later and use a
submission filter to provide backwards compatibility.
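For instance, a sketch of what that renaming could look like (names are just
illustrative; the counts become plain integers meaning gigabytes, so they fit
comfortably in 32 bits):

       Name=disk_gb Type=fast Count=48
       Name=disk_gb Type=data Count=147

       NodeName=compute-0-0 Gres=disk_gb:fast:48,disk_gb:data:147 ...

Jobs would then request space with something like --gres=disk_gb:fast:10 for
10 GB, and the submission filter could later translate that back once real
byte counts work.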

Hope that helps!

-Aaron

Sent from my iPhone

> On Jun 23, 2015, at 5:08 AM, Jan Schulze <[email protected]> wrote:
> 
> 
> Dear all,
> I still have the problem mentioned below. Has anyone of you experienced 
> similar problems with disk-related GRES? Is there a trivial point that I 
> have missed so far?
> 
> Thanks in advance.
> 
> greetings
> 
> Jan
> 
> 
> 
> 
>> On Jun 15, 2015, at 10:06 AM,  wrote:
>> 
>> Hi Aaron,
>> thanks for the quick response. You are right, I'd like to provide some 
>> scratch space by means of a filesystem. So I guess your 'recipe' should 
>> work perfectly. I'm currently playing around with a test configuration and 
>> have adjusted gres.conf accordingly:
>> 
>> 
>>        cat gres.conf
>>        Name=disk Type=fast Count=48G
>>        Name=disk Type=data Count=147G
>> 
>>        cat nodenames.conf
>>        NodeName=compute-0-0 Gres=disk:fast:48G,disk:data:147G 
>> NodeAddr=192.168.255.253 CPUs=4 Weight=20484100 Feature=rack-0,4CPUs
>> 
>> 
>> Unfortunately I'm already stuck when trying to restart slurmd; it doesn't 
>> come up and complains in the log file:
>> 
>>        fatal: Gres disk has invalid count value 51539607552
>> 
>> (slurmctld comes up without any troubles)
>> 
>> As both slurmd and slurmctld come up properly when I change the Count 
>> field to Count=1G (up to 3G), I figured that it is a problem with the 
>> 32-bit nature of the count field. However, I thought that this issue would 
>> be circumvented by the suffixes K, M and G. 
>> 
>> 
>> 
>> What am I missing?
>> 
>> 
>> Thanks.
>> 
>> greetings
>> 
>> 
>> Jan
>> 
>> 
>> 
>> 
>>> On Jun 12, 2015, at 2:44 PM, Aaron Knister wrote:
>>> 
>>> 
>>> Hi Jan,
>>> 
>>> Are you looking to make raw block devices accessible to jobs, or a 
>>> filesystem?
>>> 
>>> The term "running on" can mean different things-- it could be where the 
>>> application binary lives, where input and/or output files live, or maybe 
>>> something else. I'll assume you're looking to provide scratch space on 
>>> the node by means of a filesystem. 
>>> 
>>> If you'd like to hand out filesystem access-- let's say each disk is 
>>> mounted at /local_disk/sata and /local_disk/sas, respectively-- you 
>>> could define the GRES as:
>>> 
>>> Name=local_disk Type=sata Count=3800G
>>> Name=local_disk Type=sas Count=580G
>>> 
>>> (You'll probably want to adjust the value of Count depending on what size 
>>> the drives format out to). 
>>> 
>>> You could then write some prolog magic to actually allocate that space on 
>>> the nodes (if you're sharing nodes between jobs) via quotas (or maybe 
>>> something fancier if you have, say, ZFS or btrfs) and to create a 
>>> job-specific directory under the mount point. In addition, you could set 
>>> an environment variable via the prolog that points to the path of the 
>>> storage so users can reference it in their jobs regardless of disk type. 
>>> A single SLURM_LOCAL_DISK variable might do the job. The last piece is an 
>>> epilog to delete the job-specific directory and unset any quotas, along 
>>> with a cron job that periodically checks that the directories and quotas 
>>> have been cleaned up on each node in case there's an issue with the SLURM 
>>> epilog (e.g., a node reboots during the job).
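[A minimal sketch of the prolog side of that idea; purely illustrative: the
mount point, the fallback job ID, and the quota step are assumptions, not a
tested production script:]

```shell
#!/bin/sh
# Hypothetical prolog sketch: create a per-job scratch directory and print
# the variable a TaskProlog would export into the job's environment.
# SLURM sets SLURM_JOB_ID for real prologs; the fallback is for dry runs.
SLURM_JOB_ID=${SLURM_JOB_ID:-12345}
BASE=${BASE:-/tmp/local_disk/sata}   # stand-in for e.g. /local_disk/sata
DIR="$BASE/job_$SLURM_JOB_ID"
mkdir -p "$DIR"
# TaskProlog convention: "export NAME=value" lines on stdout are added to
# the job environment.
echo "export SLURM_LOCAL_DISK=$DIR"
# (A quota call on $DIR would go here; the matching epilog removes $DIR.)
```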
>>> 
>>> I hope that helps and isn't overwhelming. If you have questions about any 
>>> of the parts I'm happy to explain more. 
>>> 
>>> Best,
>>> Aaron
>>> 
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jun 12, 2015, at 8:18 AM, Jan Schulze <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>> Dear all,
>>>> 
>>>> This is SLURM 14.11.6 on a ROCKS 6.2 cluster. 
>>>> 
>>>> We're currently planning to build a cluster out of compute nodes, each 
>>>> having one SAS (600 GB) and one SATA (4 TB) hard drive. Is there a way 
>>>> to configure the nodes such that the user can specify on which kind 
>>>> of disk the job is supposed to run? So, in the gres.conf file, something 
>>>> like 
>>>> 
>>>> Name=storage Type=SATA File=/dev/sda1 Count=4000G
>>>> Name=fast Type=SAS File=/dev/sdb1 Count=600G
>>>> 
>>>> ?
>>>> 
>>>> 
>>>> Thanks in advance.
>>>> 
>>>> 
>>>> greetings
>>>> 
>>>> Jan Schulze
>> 
