Hi!
You need two packages installed on the system where you build Slurm (in my
case a RHEL-based distro): hwloc and hwloc-devel. You also don't need
the .rpmmacros for hwloc if you have installed these packages; by default this
option is enabled. ;) Btw, I have never used a custom hwloc
Hi!
This is certainly not connected to Slurm; it is a problem with your
InfiniBand+IMPI configuration. You should ask for help on other forums or
mailing lists ;)
First, I would suggest that you correctly configure the dat.conf file. In my case it is
"/etc/dat.conf". You have to comm
Thank you very much, Chrysovalantis. I just created a topic in the Intel forum,
though your suggestion did not fix our issue. I will also update this topic if
I find a solution, in case other Slurm users run into a similar issue.
Thanks,
Tingyang Xu
From: Chrysovalantis Paschoulas
Sent: Mon
Had 2 jobs die yesterday morning with a slurm_load_jobs error:
Protocol authentication error from inside DRMAA, and this interesting
message in the log:
If munged is up, restart with --num-threads=10
error: Munge encode failed: Unable to access
"/var/run/munge/munge.socket.2": No such file or dir
Slurm already has connect retry logic (10 times with 0.1 sec between
retries). DRMAA should need no changes unless it directly accesses
munge.
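
To illustrate the kind of retry behaviour described above (10 attempts, 0.1 sec apart), here is a minimal Python sketch. It is not Slurm's actual code: the function names `connect_unix` and `connect_with_retry` and their parameters are hypothetical, and it only assumes the target is a Unix domain socket such as the munge socket from the log.

```python
import socket
import time


def connect_unix(path):
    """Open a stream connection to the Unix domain socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_retry(path, attempts=10, delay=0.1,
                       connect=connect_unix, sleep=time.sleep):
    """Call connect(path) up to `attempts` times, pausing `delay` seconds
    between tries. Hypothetical helper, not part of Slurm or munge."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(path)
        except OSError as exc:  # includes FileNotFoundError for a missing socket
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)  # no pause after the final failed attempt
    raise last_error
```

With these defaults a missing socket is reported after roughly 0.9 seconds in total, so an outage longer than that would still surface as an error to the caller.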
Has anyone else seen this problem?
Quoting E V :
Had 2 jobs die yesterday morning with a slurm_load_jobs error:
Protocol authentication error fro
Looking at the DRMAA code it appears false was returned from calling
slurm_load_job( &job_info, fsd_atoi(self->job_id), SHOW_ALL), which
triggered the error output and stack dump. Haven't looked at the code
for slurm_load_job to see if it's doing anything different. I'm using
14.03.08, FYI.
On Mo
Hi all,
I intend to work on Slurm, modifying it to satisfy my needs
and (hopefully) to include some new functionality. However, I am something of a
newcomer to this kind of software development, so I am writing to ask for
advice. My question is, can you recommend me any tools for the develo
Hi Manuel,
The first rule is "Keep it simple!"
I suggest that you start by viewing this as 2 problems:
1. Learning how to work with Slurm
2. Learning how to work with clusters
For learning how to work with Slurm, cloning a copy of the repo is a
good start. In the "Developers" note
I wouldn't count what I've done as production-ready but I have a Puppet
module for BLCR [1] and one for SLURM [2]. Also there's one for managing
SLURM QOS and clusters using native Puppet types [3]. They likely won't
aid in development as the two SLURM related modules both assume you have
build R
> "Manuel" == Manuel Rodríguez Pascual
> writes:
Hi Manuel,
Manuel> Hi all, I have the intention of working on Slurm, modifying
Manuel> it to satisfy my needs and (hopefully) include some new
Manuel> functionalities. I am however kind of newbie with this kind
Manuel> of
Trey,
I'm not sure why your jobs aren't starting. Someone else will have to
answer that question.
You can model an organizational hierarchy a lot better in 14.11 due to
changes in Fairshare=parent for accounts. If you only want fairshare to
matter at the research group and user levels but