We have /home exported to all systems via NFS and also a parallel filesystem (FhGFS/BeeGFS, but any will do) mounted for scratch space. I'm sure other sites do it differently, but this is one way of doing it.
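A minimal sketch of that NFS setup, assuming a head node hostname of `headnode` and a compute subnet of 10.1.1.0/24 (both illustrative, substitute your own):

```shell
# /etc/exports on the head node (subnet is illustrative -- adjust to yours):
#   /home  10.1.1.0/24(rw,sync,no_root_squash)
# Publish the export after editing /etc/exports:
exportfs -ra                          # re-export everything in /etc/exports
# On each compute node, mount it:
mount -t nfs headnode:/home /home
# Or make it persistent in the compute nodes' /etc/fstab:
#   headnode:/home  /home  nfs  defaults  0 0
```

The scratch filesystem is mounted separately on every node by whatever client the parallel filesystem provides; the key point is that both paths must resolve identically on the head node and the compute nodes.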
Exporting /home via NFS is likely the most common approach. Another is exporting compiled applications via NFS that are loaded via modules (Lmod, for example); at our site that is /apps, some use /opt, or some non-standard directory not typically used by the OS.

- Trey

=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Tue, Dec 9, 2014 at 11:27 AM, Adrian Reich <[email protected]> wrote:

> Thank you so much for that suggestion! That led me straight to the
> issue. The file system that I have mounted on the head node is not visible
> to the compute nodes. All the jobs were failing because I was getting
> streams of "No such file or directory" errors. If I launch a job from a
> folder that is part of the OS, the job runs, because that same folder also
> exists on the compute nodes.
>
> So, what is the best way for my compute nodes to write to the file system
> that I have set up on the headnode? Thank you again.
>
> Sincerely,
> Adrian Reich
>
> On Tue, Dec 9, 2014 at 11:57 AM, <[email protected]> wrote:
>
>> Look at your SlurmctldLogFile (on the head node) and SlurmdLogFile (on
>> the allocated node).
>>
>> Quoting Adrian Reich <[email protected]>:
>>
>>> Hello,
>>>
>>> I have set up a small SLURM cluster using the SLURM roll within Rocks.
>>> Every time I try to submit an sbatch job it fails immediately and the
>>> job quits. However, I can request resources using salloc and everything
>>> works. How can I go about diagnosing where the issue is and what
>>> information can I provide to help in the diagnosis? Thank you.
>>>
>>> Sincerely,
>>> Adrian Reich
>>
>> --
>> Morris "Moe" Jette
>> CTO, SchedMD LLC
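Adrian's failure mode in the quoted thread (a working directory that exists on the head node but not on the compute nodes) can be checked for before submitting. A sketch, where `JOBDIR` is just an illustrative placeholder, not a SLURM variable:

```shell
# Confirm the directory the job will run in actually exists. On a real
# cluster, wrap the test in srun so it runs on the compute nodes, e.g.:
#   srun --nodes=2 test -d "$JOBDIR" && echo visible
JOBDIR=/tmp        # substitute the directory your job writes to
if test -d "$JOBDIR"; then
    echo "visible: $JOBDIR"
else
    echo "missing: $JOBDIR -- jobs launched here fail with 'No such file or directory'"
fi
```

If the directory is missing on any allocated node, slurmd cannot chdir into it and the job dies immediately, which matches the sbatch-fails/salloc-works symptom above (salloc runs the shell on the node where you then cd manually).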
