These may be going into the void, but it happened again today. Here you can see a freshly created Claremont reservation picking up a similar node list to the PECAN reservation that is marked as DAILY.
[root@master.stampede]# scontrol show ReservationName=Claremont-Training-2015-06-26
ReservationName=Claremont-Training-2015-06-26 StartTime=2015-06-26T12:00:00 EndTime=2015-06-26T20:00:00 Duration=08:00:00
   Nodes=c401-[001-004,101-104,201-204,301-304,401-402] NodeCnt=18 CoreCnt=288 Features=(null) PartitionName=normal-mic Flags=
   Users=bbarth,qux Accounts=(null) Licenses=(null) State=INACTIVE

[root@master.stampede]# scontrol show ReservationName=PECAN-1km
ReservationName=PECAN-1km StartTime=2015-06-25T10:00:00 EndTime=2015-06-25T14:00:00 Duration=04:00:00
   Nodes=c401-[001-004,101-102,104,201-204,301-304,401-404,501-504,601,603-604,701-703,801-804,901-904],c402-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-702,704,801-804,901-904],c403-[001-004,101-104,201-204,301-304,401,403-404,501-504,602-604,701-704,801-804,901-904],c404-[001-004,101-104,201-204,301-304,401-404,501-504,601-603,701,703],c422-304 NodeCnt=144 CoreCnt=2304 Features=(null) PartitionName=normal-mic Flags=DAILY
   Users=bbarth,foo,bar,baz Accounts=(null) Licenses=(null) State=ACTIVE

I'm 99% certain that PECAN will try to use the same nodes it holds right now, and will therefore overlap with Claremont when that reservation starts on the 26th. It certainly did this morning with today's Claremont reservation:

ReservationName=Claremont-Training-2015-06-25 StartTime=2015-06-25T12:00:00 EndTime=2015-06-25T19:00:00 Duration=07:00:00
   Nodes=c401-[001-004,101-104,201-204,301-304,401-404,501-504,601-604,701-704,801-804] NodeCnt=36 CoreCnt=576 Features=(null) PartitionName=normal-mic Flags=
   Users=bbarth,qux Accounts=(null) Licenses=(null) State=INACTIVE

I deleted and recreated that reservation just now, and it seems to have picked up different nodes from the active PECAN reservation.

Does SLURM check against recurring reservations when picking nodes for a new reservation that would overlap in the future because of the recurrence? As far as I can tell, it doesn't.

Thanks,
Bill.
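For anyone who wants to check these lists mechanically rather than by eye, here is a minimal Python sketch. It is not part of Slurm, and the function names are made up for illustration. It expands the bracketed node lists, intersects them, and steps the DAILY reservation forward one day at a time, on the assumption that DAILY repeats the same time window and node list every 24 hours. The PECAN list is trimmed to its c401 entries to keep it short.

```python
import re
from datetime import datetime, timedelta


def expand_hostlist(expr):
    """Expand a node list like 'c401-[001-004,101],c422-304' into a set of names."""
    hosts = set()
    # Each match is one comma-separated entry, including its [...] group if present.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", expr):
        m = re.fullmatch(r"(.*?)\[([^\]]*)\]", part)
        if m is None:
            hosts.add(part)  # plain name such as 'c422-304'
            continue
        prefix, ranges = m.groups()
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single index such as '104'
            width = len(lo)  # keep the zero padding ('001', not '1')
            for n in range(int(lo), int(hi) + 1):
                hosts.add(f"{prefix}{n:0{width}d}")
    return hosts


def daily_overlaps(rec_start, rec_end, start, end, max_days=366):
    """True if a window repeated every 24 h (assumed DAILY behaviour) hits [start, end)."""
    day = timedelta(days=1)
    s, e = rec_start, rec_end
    for _ in range(max_days):
        if s >= end:
            return False  # every later repeat starts after the one-off window ends
        if start < e:
            return True   # here s < end and start < e, so the windows intersect
        s, e = s + day, e + day
    return False


claremont = expand_hostlist("c401-[001-004,101-104,201-204,301-304,401-402]")
# c401 portion of the PECAN-1km node list only (trimmed for brevity).
pecan_c401 = expand_hostlist(
    "c401-[001-004,101-102,104,201-204,301-304,401-404,"
    "501-504,601,603-604,701-703,801-804,901-904]"
)
shared = claremont & pecan_c401
print(len(shared), sorted(claremont - pecan_c401))

print(daily_overlaps(
    datetime(2015, 6, 25, 10), datetime(2015, 6, 25, 14),  # PECAN-1km, DAILY
    datetime(2015, 6, 26, 12), datetime(2015, 6, 26, 20),  # Claremont on the 26th
))
```

Against the two reservations above, this reports 17 of Claremont's 18 nodes as shared with PECAN (all but c401-103), and the 12:00-20:00 window on the 26th does intersect the 10:00-14:00 repeat.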
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

On 6/24/15, 1:35 PM, "Bill Barth" <bba...@tacc.utexas.edu> wrote:

>Thanks, Jackie. I sent the wrong Claremont reservation with the overlap
>(see my update). I'm aware of the OVERLAP flag but we never set that. Thus
>my confusion. See my update message when it hits the list.
>
>We did recently upgrade slurm from 2.6.3 to 14.11.3, and these
>reservations existed before the upgrade. I wonder if something went weird
>during the upgrade process.
>
>Bill.
>--
>Bill Barth, Ph.D., Director, HPC
>bba...@tacc.utexas.edu | Phone: (512) 232-7069
>Office: ROC 1.435 | Fax: (512) 475-9445
>
>On 6/24/15, 12:00 PM, "Jacqueline Scoggins" <jscogg...@lbl.gov> wrote:
>
>>From looking at your list of nodes I don't see any node overlapping. The
>>names are really long and it is somewhat confusing, but as far as I can
>>see these nodes are not overlapping between the reservations.
>>
>>i.e. c401 has listed in the first reservation - node 001-004 and 101-102,
>>201-204, 301-304, 401-404...
>>and in the second reservation c401 only shows 103, which is not listed in
>>the first reservation.
>>
>>I looked at all of these nodes and the names and I did not see any
>>overlapping unless I am reading your node names wrong.
>>
>>Overlapping will only occur if you set the FLAG=OVERLAP
>>    OVERLAP    This reservation can be allocated resources
>>               that are already in another reservation.
>>
>>Thanks
>>
>>Jackie
>>
>>On Wed, Jun 24, 2015 at 9:47 AM, Bill Barth <bba...@tacc.utexas.edu> wrote:
>>
>>Can someone explain why these node lists might overlap? This is causing a
>>lot of pain for the user trying to use the PECAN-1km reservation.
>>I've tried recreating the other reservation, and it is grabbing nodes that
>>should not be available to it:
>>
>>[root@master.stampede]# scontrol show res=PECAN-1km
>>ReservationName=PECAN-1km StartTime=2015-06-24T10:00:00 EndTime=2015-06-24T14:00:00 Duration=04:00:00
>>   Nodes=c401-[001-004,101-102,104,201-204,301-304,401-404,501-504,601,603-604,701-703,801-804,901-904],c402-[001-002,004,101-104,201-204,301-304,401-404,501-504,601-604,701-702,704,801-804,901-904],c403-[001-004,101-104,201-204,301-304,401,403-404,501-504,602-604,701-704,801-804,901-904],c404-[001-004,101-104,201-204,301-304,401-404,501-504,601-603,701-703],c422-304 NodeCnt=144 CoreCnt=2304 Features=(null) PartitionName=normal-mic Flags=DAILY
>>   Users=bbarth,qux Accounts=(null) Licenses=(null) State=ACTIVE
>>
>>[root@master.stampede]# scontrol show res Claremont-Training-2015-06-24
>>ReservationName=Claremont-Training-2015-06-24 StartTime=2015-06-24T13:00:00 EndTime=2015-06-24T20:00:00 Duration=07:00:00
>>   Nodes=c401-103,c403-402,c410-304,c411-[203,504,603],c412-303,c414-301,c415-601,c416-003,c417-[603,704],c418-[402,404,903],c419-803,c420-[303,403] NodeCnt=18 CoreCnt=288 Features=(null) PartitionName=normal-mic Flags=
>>   Users=bbarth,foo,bar,baz Accounts=(null) Licenses=(null) State=INACTIVE
>>
>>Thanks,
>>Bill.
>>--
>>Bill Barth, Ph.D., Director, HPC
>>bba...@tacc.utexas.edu | Phone: (512) 232-7069
>>Office: ROC 1.435 | Fax: (512) 475-9445