Dear all,

I still have the problem mentioned below. Has anyone of you experienced
similar problems with disk-related GRES? Is there a trivial point that I
have missed so far?
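For what it's worth, the failure quoted below looks consistent with the GRES count being parsed into a 32-bit field: 48G = 48 x 2^30 = 51539607552, which is exactly the value in the error message and exceeds the uint32 maximum of 4294967295, while 1G-3G still fit. A workaround (an assumption on my part, not something confirmed in this thread) would be to count the resource in coarser units, e.g. whole gigabytes, so the numbers stay small:

```
# gres.conf -- hypothetical workaround: Count in units of GB, not bytes
Name=disk Type=fast Count=48
Name=disk Type=data Count=147

# nodenames.conf
NodeName=compute-0-0 Gres=disk:fast:48,disk:data:147
```

Jobs would then request, for example, --gres=disk:fast:10 to mean 10 GB, and any prolog that sets up the space would have to interpret the count in those units.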
Thanks in advance.

greetings
Jan

On Jun 15, 2015, at 10:06 AM, wrote:

> Hi Aaron,
>
> thanks for the quick response. You are right, I'd like to provide some
> scratch space by means of a filesystem. So I guess your 'recipe' should
> work perfectly. I'm currently playing around with a test configuration
> and adjusted the gres.conf accordingly:
>
> > cat gres.conf
> Name=disk Type=fast Count=48G
> Name=disk Type=data Count=147G
>
> > cat nodenames.conf
> NodeName=compute-0-0 Gres=disk:fast:48G,disk:data:147G
> NodeAddr=192.168.255.253 CPUs=4 Weight=20484100 Feature=rack-0,4CPUs
>
> Unfortunately, I already get stuck when trying to restart slurmd; it
> doesn't come up and complains in the log file:
>
> fatal: Gres disk has invalid count value 51539607552
>
> (slurmctld comes up without any trouble.)
>
> As both slurmd and slurmctld come up properly when I change the Count
> field to Count=1G (up to 3G), I figured that it is a problem of the
> 32-bit nature of the count field. However, I thought that this issue
> would be circumvented by the suffixes K, M and G.
>
> What am I missing?
>
> Thanks.
>
> greetings
> Jan
>
> On Jun 12, 2015, at 2:44 PM, Aaron Knister wrote:
>
>> Hi Jan,
>>
>> Are you looking to make raw block devices accessible to jobs, or a
>> filesystem?
>>
>> The term "running on" can mean different things -- it could be where the
>> application binary lives, or where input and/or output files live, or
>> maybe some other things too. I'll assume you're looking to provide
>> scratch space on the node by means of a filesystem.
>>
>> If you'd like to hand out filesystem access, let's say each disk is
>> mounted at /local_disk/sata and /local_disk/sas, respectively. You could
>> define the GRES as:
>>
>> Name=local_disk Type=sata Count=3800G
>> Name=local_disk Type=sas Count=580G
>>
>> (You'll probably want to adjust the value of Count depending on what
>> size the drives format out to.)
>> You could then write some prolog magic to actually allocate that space
>> on the nodes (if you're sharing nodes between jobs) via quotas (or
>> maybe something fancier if you have, say, ZFS or btrfs) and create a
>> job-specific directory under the mount point. In addition, you could
>> set an environment variable via the prolog that points to the path of
>> the storage, so users can reference it in their jobs regardless of disk
>> type. A single SLURM_LOCAL_DISK variable might do the job. The last
>> piece is an epilog script to delete the job-specific directory and
>> unset any quotas, along with a cron job to periodically check that the
>> directories and quotas have been cleaned up on each node in case
>> there's an issue with the SLURM epilog (e.g. a node reboots during the
>> job).
>>
>> I hope that helps and isn't overwhelming. If you have questions about
>> any of the parts, I'm happy to explain more.
>>
>> Best,
>> Aaron
>>
>> Sent from my iPhone
>>
>>> On Jun 12, 2015, at 8:18 AM, Jan Schulze <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> this is Slurm 14.11.6 on a ROCKS 6.2 cluster.
>>>
>>> We're currently planning to build a cluster out of compute nodes, each
>>> having one SAS (600GB) and one SATA (4TB) hard drive. Is there a way
>>> to configure the nodes such that the user can specify on which kind of
>>> disk the job is supposed to run? So, in the gres.conf file, something
>>> like
>>>
>>> Name=storage Type=SATA File=/dev/sda1 Count=4000G
>>> Name=fast Type=SAS File=/dev/sdb1 Count=600G
>>>
>>> ?
>>>
>>> Thanks in advance.
>>>
>>> greetings
>>>
>>> Jan Schulze
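As a footnote to Aaron's prolog/epilog recipe quoted above, the prolog half could be sketched roughly as below. This is untested and makes assumptions beyond what the thread states: the fast disk is mounted at /local_disk/sas (following Aaron's example), quota setup is omitted, and for demonstration outside Slurm the script falls back to a directory under /tmp and a dummy job id. Note that on Slurm, handing a variable to the job environment is normally done from a TaskProlog by printing an "export NAME=value" line to standard output.

```shell
#!/bin/sh
# TaskProlog sketch: create a per-job scratch directory on the local
# disk and publish its path to the job as SLURM_LOCAL_DISK.
# Assumptions (not from the thread): fast disk mounted at
# /local_disk/sas, no quotas. Outside Slurm we fall back to /tmp and a
# dummy job id so the sketch can be run by hand.
BASE="${LOCAL_DISK_BASE:-/tmp/local_disk_demo}"  # /local_disk/sas in production
JOBID="${SLURM_JOB_ID:-demo}"
JOBDIR="$BASE/job_$JOBID"

mkdir -p "$JOBDIR"

# A TaskProlog passes variables into the job environment by printing
# export lines; slurmstepd applies them to the tasks it spawns.
echo "export SLURM_LOCAL_DISK=$JOBDIR"

# The epilog counterpart would simply remove the directory again:
#   rm -rf "$BASE/job_$JOBID"
```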
