nfirm if this was happening before we upgraded from 15.08.4 to 16.05.10-2.
Thanks,
John DeSantis
ely don't want compute nodes to start feeling memory pressure, leading to swapping.
HTH,
John DeSantis
Sema Atasever wrote:
> Hi Slurm-Dev,
>
> I have a *large dataset* stored as a text file. Consists of two separate
> files (test and train)
>
> I am running int
the list so that others are aware - and that SLURM is still stable!
Thanks,
John DeSantis
Loris,
>> Does any one know whether one can run multinode MATLAB jobs with Slurm
I completely missed the _multinode_ part. Feel free to ignore, and sorry to all for the noise on the list!
John DeSantis
John DeSantis wrote:
>
that a pool is already open.
[0] Nodes in our cluster, depending on their age, have between 12-24 processors available. If a user wants a parpool of 24, they must request either a constraint or a combination of -N 1 and --ntasks-per-node=24, for example.
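For reference, a minimal submission script along those lines might look roughly like this (the script name, walltime, and pool size are placeholders, not taken from the thread):
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=24
#SBATCH --time=04:00:00
matlab -nodisplay -r "parpool(24); run('my_script.m'); exit"
The point is simply that the requested task count matches the parpool size on a single node.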
HTH,
John DeSantis
Loris Bennett wr
David,
Are you running any epilog functions that may be placing the nodes into a
drained/draining state?
John DeSantis
Baker D.J. wrote:
> Hello,
>
> I've recently started using slurm v17.02.2, however something seems very od
Kaizaad,
I hate to say it, but I cannot _believe_ I never saw this detail in
the man page.
This information is extremely useful!
John DeSantis
On 10/26/2016 09:49 AM, Kaizaad Bilimorya wrote:
>
> Hi Chris,
>
> One way is to u
Christopher,
Yes, it does restart - but that's how we've configured logrotate.
John DeSantis
On 09/28/2016 07:55 PM, Christopher Samuel wrote:
>
> On 29/09/16 01:16, John DeSantis wrote:
>
>> We get the same snippet
[2016-09-22T03:16:01.217] Terminate signal (SIGINT or SIGTERM) received
HTH,
John DeSantis
On 09/27/2016 07:38 PM, Christopher Samuel wrote:
>
> On 26/09/16 17:48, Philippe wrote:
>
>> [2016-09-26T08:02:16.582] Terminate signal (SIGINT or SIGTERM)
>> received
>
>
hesitate to
> share !
What I would do is perform a restart of slurm using the "postrotate"
command below, but remove the "--quiet" and ">/dev/null", and prefix
"time" to it, e.g.:
time /usr/sbin/invoke-rc.d slurm-llnl reconfig
This way you'll be
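For context, the stanza being discussed would sit in a logrotate config roughly like the following sketch (Debian-style slurm-llnl paths assumed; adjust for your installation):
/var/log/slurm-llnl/*.log {
    weekly
    rotate 4
    compress
    missingok
    postrotate
        time /usr/sbin/invoke-rc.d slurm-llnl reconfig
    endscript
}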
lost any jobs due to:
* ctld restarts
* typos in slurm.conf (!!)
* upgrades
I've been especially guilty of typos, and FWIW SLURM has been
extremely forgiving.
HTH,
John DeSantis
On 09/26/2016 03:46 AM, Philippe wrote:
> Hello everybody, I'm trying to understand an issue with 2 SL
Hello,
Have you looked at the "slurm/slurm.h" file?
Some of the information present in that DB table correlates to the
code that is present.
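For example, the job state codes stored in the database can be matched against the enum in that header; a quick way to view it (install path assumed):
grep -n -A 20 'enum job_states' /usr/include/slurm/slurm.h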
HTH,
John DeSantis
On 09/23/2016 03:15 AM, Lachlan Musicman wrote:
> Is there a descri
Emily,
What version of SLURM are you running?
We are running version 15.08.4 and have just run into the same issue.
There was a bug report filed [1], and it states that the issue was
corrected in version 14.08.11.
Thanks,
John DeSantis
[1
le parameters for
slurm.conf, this one escaped me!
Thanks,
John DeSantis
On 09/18/2016 07:37 PM, Christopher Samuel wrote:
>
> On 18/09/16 03:45, John DeSantis wrote:
>
>> Try adding a "DefMemPerCPU" statement in your partition
>> definitions, e.g
>
> You can
Balaji,
Try adding a "DefMemPerCPU" statement in your partition definitions, e.g.:
PartitionName=PY34 Nodes=okdev1368 DefMemPerCPU=512 MaxTime=INFINITE
State=UP shared=force:4
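After a reconfigure, the value can be checked with something like:
scontrol show partition PY34 | grep DefMemPerCPU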
HTH,
John DeSantis
On 09/16/2016 04:44 PM, Balaji De
RIPT) -ge 1 ]; then echo
> "insert into ss_usage (user_name,recorded_on,jobID) values
> ('$SLURM_JOB_USER',NOW(),'$SLURM_JOB_ID')"|mysql -u slurm
> --password='' -D vasp_track -h fi
>
> exit 0
HTH,
John DeSantis
On 09/14/2016 01:44 PM, R
Chris,
Thanks for the second set of eyes!
John DeSantis
On 04/19/2016 07:59 PM, Christopher Samuel wrote:
>
> On 16/04/16 21:51, John DeSantis wrote:
>
>> Anyways, we have experienced a random(?) slurmctld failure resulting in
>> a segfault twice this week.
>
>
e some references to "packmem (valp=0x0" on the bugs.schedmd.com
site, and bug 2453 seems oddly familiar, although the tres format
strings are properly populated in both instances.
Thanks in advance for any information!
John DeSantis
On 04/16/2016 07:50 AM, John DeSantis wrote:
Hello,
to bet that I've overlooked something.
Thanks!
John DeSantis
d be a mix of running
and suspended jobs based upon (a) job priorities, and (b) partition
priorities; maybe check the controller logs for preemption notices to
confirm or deny this thought? At any rate, I'd suggest only using
preemption based upon QOS.
HTH,
John DeSantis
On 04/14/2016 0
ting the parameter to be
"Shared=FORCE:1"? The documentation states:
"For example, a configuration of Shared=FORCE:1 will only permit one job
per resources normally,".
John DeSantis
On 04/08/2016 02:03 PM, Wiegand, Paul wrote:
>
> This is *almost* what I want, but n
Paul,
Try changing the Partition "Shared=FORCE" statement to "Shared=NO".
We do that on all of our partitions and get the desired behavior.
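As a sketch (node and partition names are placeholders), the change amounts to:
# before
PartitionName=general Nodes=node[001-032] Shared=FORCE:4 MaxTime=INFINITE State=UP
# after
PartitionName=general Nodes=node[001-032] Shared=NO MaxTime=INFINITE State=UP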
John DeSantis
On 04/08/2016 01:06 PM, Wiegand, Paul wrote:
>
> Greetings,
>
> I would like to have our cluster configu
uent, supply the "--cnf=" flag which points to the
hosts file.
John DeSantis
On 04/08/2016 09:07 AM, David Grasselt wrote:
> Dear respected SLURM User/Developper,
>
> I am contacting you because of having trouble getting Fluent/ANSYS 17 to
> work with SLURM 15.08.5.
> I t
controller was online, changes were made to the /etc/hosts
file and the controller was not able to resolve the proper address
afterwards. After we corrected the addressing issue, and restarted
slurmctld, the issue was corrected.
John DeSantis
On 04/07/2016 11:38 PM, Naajil Aamir wrote:
> Hi th
use "squeue -w " which gives
us all corresponding jobs running on the host(s) in question. It is
actually quite useful if you live on the command line.
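For example (node names are placeholders):
squeue -w node001,node002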
We only use console tools. There is another tool to look at: "sview".
I have never used "slurmtop", so "sview" may not be useful to you.
John DeSantis
example a node can be associated with two Slurm partitions (e.g.
"cpu" and "gpu") and the partition/queue "cpu" could be limited to
only a subset of the node's CPUs, ensuring that one or more CPUs would
be available to jobs in the "gpu" partition/qu
f an empty task_id_bitmap.
John DeSantis
2016-01-26 20:05 GMT-05:00 Andrus, Brian Contractor :
> John,
>
>
>
> Thanks. That seemed to help; a job started on a node that had a job on it
> once the job that had been on it (‘using’ all the memory) completed.
>
>
>
>
Brian,
Try setting a default memory per CPU in the partition definition. Later
versions of SLURM (>= 14.11.6?) require this value to be set, otherwise all
memory per node is scheduled.
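A partition definition with that default set might look roughly like this (names and the 2048 MB value are placeholders):
PartitionName=general Nodes=node[001-032] DefMemPerCPU=2048 MaxTime=INFINITE State=UP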
HTH,
John DeSantis
2016-01-26 15:20 GMT-05:00 Andrus, Brian Contractor :
> All,
>
>
>
> I
Chris,
Could you enable the Gres debugging via the DebugFlags and post the
relevant output?
It would be interesting to see what the logs state concerning what Gres
types have been found on the node in question.
John DeSantis
2016-01-22 12:31 GMT-05:00 Chris Paciorek :
>
> Hi John, w
ockets=2 RealMemory=32073
Feature="" Gres=gpu:2 Weight=1000
# gres.conf
NodeName=racka-[1-8] Name=gpu File=/dev/nvidia0
NodeName=rackb-[1-10,19-29] Name=gpu File=/dev/nvidia[0-1]
John DeSantis
2016-01-21 23:21 GMT-05:00 Chris Paciorek :
>
> Whoops, there was a bug in my po
Chris,
Try using "--pty /bin/bash" to get a shell, and see if that helps.
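Something along these lines, assuming one GPU is wanted for the interactive session:
srun --gres=gpu:1 --pty /bin/bash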
John DeSantis
On Jan 21, 2016 5:47 PM, "Chris Paciorek" wrote:
>
> We've been trying out the use of gres to control access to our GPU. It
> works fine for a batch submission but wh
ences some nodes), and unfortunately, the node in question resolved to
two different IP addresses; jobs could get dispatched, but would never
register a completion with the controller.
John DeSantis
On Jan 18, 2016 5:08 AM, "Danny Rotscher"
wrote:
> Hello,
>
> since we upgrad
Andrew,
Our database has roughly 3.4 million rows in the entire schema (including a
view, but we also purge job records after 6 months).
After the slurmdbd was upgraded, it took ~9 minutes (manifested by the
changes being performed in the log) before the daemon was active again.
HTH,
John
would run
> 1 element on special. Would it then use public for the other 3 elements
> (provided public has some idle nodes)?
As long as the special partition is idle, I'd assume that the
"special" partition would take as many jobs as possible and then
dispatch the remainin
Ryan,
I believe this is the default behavior of reservations unless the flag
"static_alloc" is specified.
John DeSantis
2015-11-21 22:13 GMT-05:00 Novosielski, Ryan :
> I could have sworn that I just heard it was possible to create a floating
> reservation for any number of no
he "public" partition and run
there:
"--partition=special,public"
I believe this method would allow the project the best use of the system
resources without needing to utilize a reservation or preemption
(currently).
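A submission script using the comma-separated partition list might look roughly like this (program name and task count are placeholders):
#!/bin/bash
#SBATCH --partition=special,public
#SBATCH --ntasks=16
srun ./my_program
Slurm should start the job in whichever of the listed partitions can run it first.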
HTH!
John DeSantis
2015-11-21 11:29 GMT-05:00 Daniel Letai :
>
hen not in use the hardware would be idle and unavailable to
other users.
John DeSantis
2015-11-19 13:31 GMT-05:00 Daniel Letai :
>
> The other issue is how to define the "public" partition. It would also have
> to float, with lower priority, or else how would you achieve exclu
Taras,
We see this message when a scheduled node has experienced an issue with
slurmd and/or munge, and can no longer accept jobs.
You can use 'scontrol release job_id' to reschedule the job. Please note
though, that 'job_id' is the actual job number reported in squeue.
Joh
Will,
It isn't mentioned, and this should probably be answered by the developers,
but do you know if this bug contributes to the MaxJobCount value being too
high?
Thanks!
John DeSantis
2015-08-10 11:31 GMT-04:00 John Desantis :
> Will et al,
>
> Thanks!
>
> I didn'
Will et al,
Thanks!
I didn't see any mention of this within (quick searches) via the mailing
lists, so apologies to all for unintended noise.
John DeSantis
2015-08-10 11:28 GMT-04:00 Will French :
> Yep, this was a bug that was fixed in 14.11.6. See:
>
> http://bugs.schedmd.co
scriptandoutput-1.txt
[2]
http://s3.enemy.org/~mrfusion/client_snippets/squeue_scriptandoutput-2.txt
Thank you,
John DeSantis
messages.
As far as the automated submissions go, we haven't yet run into a
similar situation. We did get a few users submitting jobs via
scripts, but we targeted them using a QOS (MaxCPUs & MaxSubmitJobs) to
control their behavior.
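As a sketch of that kind of QOS (name and limits are placeholders; option names may vary slightly between releases):
sacctmgr add qos throttle
sacctmgr modify qos throttle set MaxSubmitJobsPerUser=50 MaxCpusPerUser=64
The QOS then needs to be attached to the relevant users' associations.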
John DeSantis
2015-07-14 11:42 GMT-04:00 Char
like a solid target; sadly, I completely neglected to
verify via the source how the reservation states were handled via
slurmctld (apologies, Bruce!).
Our reservations have worked 99% of the time in 14.x (we started with
14.03.2-2, then upgraded to 14.03.6, and now 14.11.3).
Maybe one of the deve
've re-created all reservations and there is still unintentional
overlapping occurring, I'd recommend looking at the reservation
table(s) within the DB and possibly truncating it(them).
John DeSantis
2015-07-06 16:26 GMT-04:00 Bill Barth :
>
> On 7/6/15, 2:08 PM, "John Desantis"
o delete and re-create the affected reservations?
John DeSantis
2015-07-06 14:49 GMT-04:00 Bill Barth :
>
> John,
>
> Thanks for your suggestion, but I think I must have miscommunicated. I
> don't want the reservations to overlap, so I want to figure out how to
> preven
tions ensuring that the "OVERLAP"
flag is present.
The reason I have suggested #1 is because in our case I didn't want
any long running jobs to land on the nodes while re-creating the
reservation (we're using the same set of nodes), further causing grief
for the reservation use
s (specifically cores), and
with accounting you can apply limits per user or as a whole for a
group (account).
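For instance, association-level limits can be set with something like the following (user and account names are placeholders):
sacctmgr modify user name=someuser set MaxJobs=10
sacctmgr modify account name=somegroup set GrpJobs=40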
John DeSantis
2015-06-11 10:12 GMT-04:00 Martin, Eric :
> Is there a way for users to self limit the number of jobs that they
> concurrently run?
>
> Eric Martin
> Center
array tasks) or they can use a matlab pool of up to either 12 or 16 cores on 1 node for SMP jobs.
Integration scripts aren't needed in this set-up. All that is
required is a normal submission script.
John DeSantis
2015-06-09 17:24 GMT-04:00 Hadrian Djohari :
> Hi,
>
> We are in the pro
"?
Those would be the options that I'd immediately try to begin
trouble-shooting the issue.
John DeSantis
2015-06-02 14:19 GMT-04:00 Paul van der Mark :
>
> All,
>
> We are preparing for a switch from our current job scheduler to slurm
> and I am running into a strange issue.
on from 8 to 2 and I'd also remove the "Count=" because
specifying "File=" is enough for Slurm (with what I've seen).
I should also add that I'm running Slurm 14.11.3, so without researching
the changelogs, I cannot comment if there were changes made to Gres cod
Daniel,
Ok, at this point I'd suggest enabling the DebugFlags=Gres in your
slurm.conf and turning up the SlurmctldDebug level to debug. You could
also change SlurmdDebug to a higher debug level as well. There may be some
clues in the extra output.
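In slurm.conf that would be roughly (string debug levels shown; older releases may expect numeric levels):
DebugFlags=Gres
SlurmctldDebug=debug
SlurmdDebug=debug
followed by an 'scontrol reconfigure' to pick up the change.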
John DeSantis
2015-05-06 16:57 GMT-
Daniel,
Use the same gres.conf on all nodes in the cluster (including the
controller), and then restart slurm and try again.
John DeSantis
On May 6, 2015 4:22 PM, "Daniel Weber"
wrote:
>
> Hi John,
>
> I added the types into slurm.conf and the gres.conf files on the node
Daniel,
I hit send without completing my message:
# gres.conf
NodeName=blah Name=gpu Type=Tesla-T10 File=/dev/nvidia[0-1]
HTH.
John DeSantis
2015-05-06 15:30 GMT-04:00 John Desantis :
> Daniel,
>
> You sparked an interest.
>
> I was able to get Gres Types working by:
>
>
salloc: job 532507 queued and waiting for resources
# slurm.conf
Nodename=blah CPUs=16 CoresPerSocket=4 Sockets=4 RealMemory=129055
Feature=ib_ddr,ib_ofa,sse,sse2,sse3,tpa,cpu_xeon,xeon_E7330,gpu_T10,titan,mem_128G
Gres=gpu:Tesla-T10:2 Weight=1000
# gres.conf
2015-05-06 15:25 GMT-04:00 John Desantis
Daniel,
"I can handle that temporarily with node features instead but I'd
prefer utilizing the gpu types."
Guilty of reading your response too quickly...
John DeSantis
2015-05-06 15:22 GMT-04:00 John Desantis :
> Daniel,
>
> Instead of defining the GPU type in our Gr
are being seen correctly on a node by the controller. I
also wonder if using a cluster wide Gres definition (vs. only on nodes
in question) would make a difference or not.
John DeSantis
2015-05-06 15:12 GMT-04:00 Daniel Weber :
>
> Hi John,
>
> I already tried using "Count=1
Daniel,
What about a count? Try adding a count=1 after each of your GPU lines.
John DeSantis
2015-05-06 11:54 GMT-04:00 Daniel Weber :
>
> The same "problem" occurs when using the gres type in the srun syntax (using
> i.e. --gres=gpu:tesla:1).
>
> Regards,
>
Daniel,
We don't specify types in our Gres configuration, simply the resource.
What happens if you update your srun syntax to:
srun -n1 --gres=gpu:tesla:1
Does that dispatch the job?
John DeSantis
2015-05-06 9:40 GMT-04:00 Daniel Weber :
> Hello,
>
> currently I'm trying
t I'd definitely try it out
and see what kind of results you get.
John DeSantis
2015-05-01 12:41 GMT-04:00 Will French :
>
>
>
>>
>> If you use modules, perhaps you could detect when the module is loaded from
>> a gateway and not set I_MPI_PMI_LIBRARY there. If yo
To all involved in this thread,
Thank you very much for your pointers and suggestions!
John DeSantis
2015-04-16 1:07 GMT-04:00 Christopher Samuel :
>
> On 16/04/15 14:43, Bill Barth wrote:
>
>> That's what I sent John off-list. Wasn't sure self-promotion was OK here.
There
will be no job modifications at all.
John DeSantis
2015-04-14 19:47 GMT-04:00 Christopher Samuel :
>
> On 15/04/15 08:16, David Bigagli wrote:
>
>> Using scontrol to get the command parameter is probably not recommended
>> as that is the path inside the user directory
pool/slurmd/job$SLURM_JOB_ID.
Do the developers and/or community as a whole see anything wrong with
this method?
John DeSantis
2015-04-14 12:46 GMT-04:00 John Desantis :
>
> Chris,
>
> Thanks for your reply. I've definitely vetted the URL several times
> over the last two day
d accomplish what I was looking to do.
John DeSantis
2015-04-14 11:04 GMT-04:00 Christopher B Coffey :
> Hi John,
>
> Have you looked into creating a LUA job submission script? You can
> manipulate the job script before it begins execution. There are also this
> if you
velopers then would be - is there a way with Slurm's current "out
of the box" capabilities to parse a job submission script once it
lands on a batch host?
Thanks!
John DeSantis
2015-04-13 10:24 GMT-04:00 John Desantis :
> Hello all!
>
> I've been doing some test
espite the file actually being there. I've
even set extended ACL's on the directory so that the SlurmUser can see
all of the files (sudo -u slurm ls -lR failed with permission denied).
Could anyone tell me why the slurm_script file cannot be read via prolog?
Thank you!
John DeSantis
Carl,
I'd suggest explicitly setting a PATH in the script and also using "&"
to put the job in the background (via cron):
* * * * * /path/to/some/binary /path/to/some/script &
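The script itself could then start with an explicit PATH, e.g. a sketch along the lines of:
#!/bin/bash
export PATH=/usr/local/bin:/usr/bin:/bin
exec /path/to/some/binary /path/to/some/script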
John DeSantis
2015-04-02 4:22 GMT-04:00 Inigo Aldazabal Mensa :
>
> On Wed, 01 Apr
Felix,
How does the routing table look on the controller?
Is the IB network listed on the controller using the correct interface?
John DeSantis
2015-03-19 10:48 GMT-04:00 Felix Willenborg :
>
> So i tried out installing the latest package (14.11.4-1) of slurm with no
> success - unfo
Felix,
Do the IP addresses associated with the NodeName's return proper matches
when you run lookups?
What happens if you don't use IP addresses and only host names within your
Slurm configuration?
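A quick way to check both directions (hostname and address are placeholders):
getent hosts node001
getent hosts 10.0.0.11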
John DeSantis
2015-03-17 11:30 GMT-04:00 John Desantis :
> Felix,
>
> My
Felix,
My fault, I suggested something that you already checked!
John DeSantis
2015-03-17 11:28 GMT-04:00 John Desantis :
> Felix,
>
> Can you ping the nodes from the controller and vice versa?
>
> The snippet below looks like a potential firewall issue:
>
> [2015-03-16
de on port 6818 and then
telnet'ing from each node to the controller on port 6817.
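Concretely, something like (hostnames are placeholders):
telnet node001 6818      # from the controller to a compute node (slurmd)
telnet controller 6817   # from a compute node to the controller (slurmctld)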
John DeSantis
2015-03-17 11:23 GMT-04:00 Yann Sagon :
>
>
> 2015-03-17 13:31 GMT+01:00 Felix Willenborg <
> felix.willenb...@uni-oldenburg.de>:
>
>>
>> Hi there,
>>
>>
many - the job would
have been rejected with a message indicating that the partition limits
have been breached. But hey, that's what happens when you answer
emails ~20 minutes after waking up!
John DeSantis
2015-03-13 7:55 GMT-04:00 Uwe Sauter :
>
> Hi,
>
> thanks for looking into t
r espresso
soon enough and will reply if anything else comes to mind. I hope
this helps!
John DeSantis
2015-03-12 4:59 GMT-04:00 Uwe Sauter :
>
> No one able to give a hint?
>
> On 10.03.2015 at 17:05, Uwe Sauter wrote:
>>
>> Hi,
>>
>> I have an account "prod
Tejas,
Can you ping the nodes via their hostnames listed in the slurm.conf file?
Can you ping the controller(s) from the nodes?
Is there a firewall running on any of the nodes in question?
John DeSantis
2015-02-12 17:38 GMT-05:00 Novosielski, Ryan :
> Check the logs. You'll likel
shoes, I'd disable the "NO_CONF_HASH" DebugFlags value,
turn up debugging verbosity, and double-check that all nodes in your
cluster have the same slurm.conf. I would then restart all of the slurm
daemons and then restart the slurmctld daemon on your controller(s).
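One quick way to compare the config across nodes, assuming passwordless ssh and a standard config path:
for h in node001 node002 node003; do ssh $h md5sum /etc/slurm/slurm.conf; done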
John DeSantis
Moe,
Thank you!
John DeSantis
2015-02-03 15:28 GMT-05:00 :
>
> I believe that is OK, but don't know off the top of my head. Either test for
> youself or
> 1. add the new partition
> 2 move pending jobs to the new partition
> 3. Delete the other partitions once their
Moe,
One last question when you have a chance.
If there are running jobs with active partitions now and we switch to
one partition with a topology, would those running jobs be lost?
Thank you!
John DeSantis
2015-02-03 12:42 GMT-05:00 :
>
> You can configure a single queue and u
Uwe,
I do have features assigned at the moment, but from what I've read
that would require prolog scripting or user re-education.
It looks like the lowest hanging fruit is the topology.conf option.
Thanks,
John DeSantis
2015-02-03 12:46 GMT-05:00 Uwe Sauter :
>
> Might be worth t
Moe,
Thanks!
John DeSantis
2015-02-03 12:42 GMT-05:00 :
>
> You can configure a single queue and use the topology/tree plugin to
> identify the nodes on separate fabrics.
>
>
> Quoting John Desantis :
>>
>> Hello all,
>>
>> Unfortunately, I have
multiple partition definitions with a DEFAULT clause
or not. I've looked at the topology/tree plugin as well and seeing
that you can specify either switches or nodes, if this would be the
preferred method to achieve 1 "global" partition which utilizes all of
the separate hardware p
html
If I restart the slurmctld, the default QOS is respected and can be
verified by squeue. I'd recommend trying that instead of the
modifications to see if you get the expected results or not.
I saw this on 14.03.6 and with 14.11.3
John DeSantis
2015-02-03 4:12 GMT-05:00 "Dr. Markus S
nfigured the startup script in init.d so that there are ulimit values set
when the daemon starts too.
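As a sketch, the relevant lines added to the init script would be something like (limit values are placeholders; adjust for your distribution):
ulimit -l unlimited
ulimit -n 65536
# ... existing code that launches slurmd/slurmctld follows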
John DeSantis
2015-01-28 17:25 GMT-05:00 Trey Dockendorf :
> John,
>
> Thanks for the response. We use PropagateResourceLimits=NONE and also set
> both hard and soft for memlock t
ropriate ulimit
value; that you're not enforcing memory improperly.
Also, this is more related to VASP than Slurm, but I've seen VASP
segfault from the start if the input isn't correct in terms of CPUs
requested versus what's in INCAR (NPAR/KPAR).
Thanks,
John DeSantis
2015-01
ulted in no finds.
As it turns out, all that needed to be done was to upgrade the slurm-sql plugin.
I then carried out the rest of the upgrade and no jobs were lost.
I hope others find this information useful.
John DeSantis
have the "DefaultStorage*" variables set in addition to
"AccountingStorageEnforce".
Thank you,
John DeSantis
Moe,
Thanks for the clarification!
John DeSantis
2014-11-10 13:45 GMT-05:00 :
>
> If a system administrator manually starts a daemon, there is no telling what
> it's PATH might be. For security reasons, you'll need to either explicitly
> set a PATH environment variabl
is there a SLURMD option which inherits the environment of the user
who is running the script(s)?
Thank you,
John DeSantis
Chris,
> Hmm, could you try and mark a partition as UP with scontrol and see if
> that helps? It's something we do here on Slurm 2.6 and (I believe)
> resolves this for us.
Thanks for the suggestion!
I tried this and unfortunately, there was no change.
John DeSantis
2014-10-1
Hello all,
Just an update to this issue.
If I restart the primary slurmctld, I can avoid a service restart
across the cluster.
John DeSantis
2014-10-15 15:04 GMT-04:00 John Desantis :
>
> Hello all,
>
> I am not sure if I've stumbled upon a bug (14.03.6) or if this is the
>
44 0 rack-5-[16-19]
The default qos is now respected.
Is a restart of the slurmd/slurmctld daemons necessary and just
undocumented or is this a potential bug?
Thank you,
John DeSantis
Danny,
We can wait for the production version. Thanks!
John DeSantis
2014-09-26 14:27 GMT-04:00 Danny Auble :
>
> Depending to what commit you upgrade to yes, anything in 14.03 is in 14.11.
> Right now I wouldn't suggest on running 14.11 in production since it is
> still
Danny,
Thank you for your response. We'll schedule an upgrade to address the issue.
Could you tell me if commit 6aadcf15355dfe (introduced in 14.03.4)
will still be present?
John DeSantis
2014-09-26 13:45 GMT-04:00 Danny Auble :
>
> John, this was fixed in 14
e real JobId "23383"
returned a result within sacct and the DB. I was able to glean node
information from the scheduler and control daemon logs by looking for
the JobId's listed above.
I did find a previous post
https://www.mail-archive.com/slurm-dev@schedmd.com/msg03344.html w
nd
due to user error (mine!), I didn't configure them properly.
John DeSantis
2014-07-02 14:17 GMT-04:00 Michael Robbert :
> John,
> Did you find and read this thread from 2011 that appears to discuss this
> issue?
>
> http://comments.gmane.org/gmane.comp.distributed.slurm.dev
s' state to "IDLE". I'll make sure to review the
slurmd.log first before posting any more questions, should they arise!
John DeSantis
2014-07-02 14:09 GMT-04:00 E V :
>
> Did you check the slurmd.log on the node's and make sure the
> RealMemory for them on start up is l
CPUs=8 CoresPerSocket=4 Sockets=2
RealMemory=12929 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
NodeName=sanitized_hostname[2-3] CPUs=8 CoresPerSocket=4 Sockets=2
RealMemory=10909 Feature=ib_ddr,ib_ofa,sse4,sse41,tpa,cpu_xeon
Thanks for any help and/or insight!
John DeSantis