Hi Will,
On 23/2/2019 1:50 AM, Will Dennis wrote:
For one of my groups, on the GPU servers in their cluster, I have provided a RAID-0 md array of multi-TB SSDs
(for I/O speed) mounted on a given path ("/mnt/local" for historical reasons) that they can use for
local scratch space. Their
On 2/22/19 3:54 PM, Aaron Jackson wrote:
Happy to answer any questions about our setup.
If folks are interested in a mailing list where this discussion would be
decidedly on-topic then I'm happy to add people to the Beowulf list
where there are a lot of other folks with expertise in this
Hi Will,
I look after our GPU cluster in our vision lab. We have a similar setup
- we are working from a single ZFS file server. We have two pools:
/db which is about 40TB spinning SAS built out of two raidz vdevs, with
16TB of L2ARC (across 4 SSDs). This reduces the size of ARC quite
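A pool along those lines could be built roughly like this (device names and counts below are placeholders for illustration, not the actual hardware):

```shell
# Sketch only: one pool from two raidz vdevs, with SSDs added
# as L2ARC cache devices (device names are hypothetical)
zpool create db \
  raidz sda sdb sdc sdd \
  raidz sde sdf sdg sdh \
  cache nvme0n1 nvme1n1 nvme2n1 nvme3n1
```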
We stuck Avere between Isilon and a cluster to get us over the hump until the next
budget cycle ... then we replaced it with Spectrum Scale for mid-level storage.
Still use Lustre of course as scratch.
On 2/22/19, 12:24 PM, "slurm-users on behalf of Will Dennis"
wrote:
(replies inline)
Just to close the loop on this.
This was not a Slurm issue; it was more of an AD configuration issue.
AD needs to be configured on all nodes of the cluster so that Slurm knows the user
ID. I had trouble with sssd DB folders missing and sssd.conf not having the
appropriate permissions. So look out for those. You
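For anyone hitting the same thing, the kind of checks that sorted it out look roughly like this (paths are the usual defaults; the test user at the end is made up -- substitute one of your own AD accounts):

```shell
# sssd refuses to start unless sssd.conf is root-owned and
# not group/world readable
chown root:root /etc/sssd/sssd.conf
chmod 600 /etc/sssd/sssd.conf

# recreate the cache/DB directories if they have gone missing
mkdir -p /var/lib/sss/db /var/lib/sss/pipes /var/lib/sss/mc

systemctl restart sssd

# verify a domain user resolves on the node -- if this works,
# slurmd can map the user ID too (username is hypothetical)
id someaduser@example.com
```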
Yes, we've thought about using FS-Cache, but it doesn't help on the first
read-in, and the cache eviction may affect subsequent read attempts...
(different people are using different data sets, and the cache will probably
not hold all of them at the same time...)
On Friday, February 22, 2019
applications) and are presently consuming the data via NFS mounts (both
groups have 10G ethernet interconnects between the Slurm nodes and the NFS
servers.) They are both now complaining of "too-long loading times" for the
how about just using cachefs (backed by a local filesystem on ssd)?
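If anyone wants to try that, the usual recipe on Linux is cachefilesd plus the `fsc` NFS mount option (package name is the Debian/Ubuntu one, and the cache path and export below are examples, not a specific site's):

```shell
# install the cache daemon and point its cache directory at an
# SSD-backed filesystem (edit the existing 'dir' line in the config)
apt install cachefilesd
#   /etc/cachefilesd.conf:  dir /ssd/fscache
systemctl enable --now cachefilesd

# mount the NFS export with FS-Cache enabled via the fsc option
mount -t nfs -o fsc,vers=4 nfsserver:/data /mnt/data
```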
(replies inline)
On Friday, February 22, 2019 1:03 PM, Alex Chekholko said:
>Hi Will,
>
>If your bottleneck is now your network, you may want to upgrade the network.
>Then the disks will become your bottleneck :)
>
Per our network bandwidth analysis, it's not really the network that's the problem...
Hi Will,
You have bumped into the old adage: "HPC is just about moving the
bottlenecks around".
If your bottleneck is now your network, you may want to upgrade the
network. Then the disks will become your bottleneck :)
For GPU training-type jobs that load the same set of data over and over
Thanks for the reply, Ray.
For one of my groups, on the GPU servers in their cluster, I have provided a
RAID-0 md array of multi-TB SSDs (for I/O speed) mounted on a given path
("/mnt/local" for historical reasons) that they can use for local scratch
space. Their other servers in the cluster
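For reference, a scratch array like that can be assembled along these lines (device names and the drive count are illustrative, not our exact layout):

```shell
# stripe four NVMe SSDs into a RAID-0 md array
# (no redundancy -- scratch data only)
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mkfs.xfs /dev/md0
mkdir -p /mnt/local
mount /dev/md0 /mnt/local
```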
Hi folks,
Not directly Slurm-related, but... We have a couple of research groups that
have large data sets they are processing via Slurm jobs (deep-learning
applications) and are presently consuming the data via NFS mounts (both groups
have 10G ethernet interconnects between the Slurm nodes
Yes! I always have E_WAYTOOMANY tabs open in my Chrome browser, and using the
"TooManyTabs" plugin and searching for "Slurm" I see a whole bunch of "Slurm
Workload Manager" entries, then have to guess which one is which page...
-Original Message-
From: slurm-users
On 2/22/19 9:53 AM, Patrice Peterson wrote:
Hello,
it's a little inconvenient that the title tag of all SLURM doc pages only says
"Slurm Workload Manager". I usually have tabs to many SLURM doc pages
open and it's difficult to differentiate between them all.
Would it be possible to change the
On 2/22/19 12:54 AM, Chris Samuel wrote:
On Thursday, 21 February 2019 8:20:36 AM PST נדב טולדו wrote:
Yeah I have; before I installed PBIS and introduced lsass.so, the Slurm module
worked well. Is there any way to debug?
I am seeing in syslog that the slurm module is adopting into the job
Hello,
it's a little inconvenient that the title tag of all SLURM doc pages only says
"Slurm Workload Manager". I usually have tabs to many SLURM doc pages
open and it's difficult to differentiate between them all.
Would it be possible to change the tab title to the page title for the doc
Hola Gestió,
You can have a look at: https://slurm.schedmd.com/core_spec.html and the
"CoreSpecCount" or "CpuSpecList" of the slurm.conf file.
Regards,
Carlos
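As a concrete sketch of those two options (node name and CPU counts are made up; see the core_spec page above for the exact semantics):

```shell
# slurm.conf: reserve 2 cores per node for the OS / system daemons
NodeName=gpu[01-04] CPUs=32 CoreSpecCount=2

# ...or pin specific logical CPU IDs instead
# (CpuSpecList and CoreSpecCount are mutually exclusive)
# NodeName=gpu[01-04] CPUs=32 CpuSpecList=0,16
```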
On Fri, Feb 22, 2019 at 6:50 AM Chris Samuel wrote:
> On Thursday, 21 February 2019 1:00:52 PM PST Sam Hawarden wrote:
>
> > Linux