Dear Slurm developers,

please find below a patch for a slurmctld bug that results in incorrect 
reservation ids after the restart or reconfiguration of slurmctld if a 
reservation with a specific name pattern has been created previously. This may 
(I have not verified that but consider it to be very likely) result in sacct 
reporting incorrect reservation names for jobs when the reservation id wraps 
around.
The patch applies to the current HEAD. We originally found this problem with 
version 14.03.4 to which the identical patch applies. All other versions in 
between are most likely affected as well.

Thank you for your consideration.

Best regards,
Dorian



From 321b6998aa7ecd7bae10e5efbaf7a2bcf309dca4 Mon Sep 17 00:00:00 2001
From: Dorian Krause <[email protected]>
Date: Thu, 7 Jan 2016 13:37:15 +0100
Subject: [PATCH] slurmctld/reservation.c: Fix bug in computation of top_suffix

The top_suffix variable is used to store and update the reservation id when 
creating
new reservations. As the name suggests the top_suffix value is also used as a 
suffix
for reservations without user-provided name. In _validate_all_reservations() the
top_suffix is updated after a restart or reconfiguration based on the suffices 
of
existing reservations. This logic does not take into account the possibility of 
users
specifying reservation names that end with "_[0-9]+". If users create such a
reservation this will result in bogus top_suffix values and thus bogus (e.g.,
non-unique) reservation ids.
Instead of retrieving the reservation id from the name it is faster and easier 
to
use the resv_id member of slurmctld_resv_t.

Prior to the commit:

> cat p_top_suffix
p top_suffix
quit
> scontrol create reservation=test1 nodes=node1 start=now duration=1 users=root
Reservation created: test1
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 1
A debugging session is active.

        Inferior 1 [process 1166] will be detached.

> scontrol create reservation=test2 nodes=node1 start=now+60 duration=1 
> users=root
Reservation created: test2
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 2
A debugging session is active.

        Inferior 1 [process 1166] will be detached.

> scontrol create reservation=test3_100 nodes=node1 start=now+120 duration=1 
> users=root
Reservation created: test3_100
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

        Inferior 1 [process 1166] will be detached.

> scontrol reconfigure
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 100
A debugging session is active.

        Inferior 1 [process 1166] will be detached.

With the fix:

> scontrol create reservation=test1 nodes=node1 start=now duration=1 users=root
Reservation created: test1
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 1
A debugging session is active.

        Inferior 1 [process 5854] will be detached.

> scontrol create reservation=test2 nodes=node1 start=now+60 duration=1 
> users=root
Reservation created: test2
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 2
A debugging session is active.

        Inferior 1 [process 5854] will be detached.

> scontrol create reservation=test3_100 nodes=node1 start=now+120 duration=1 
> users=root
Reservation created: test3_100
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

        Inferior 1 [process 5854] will be detached.

> scontrol reconfigure
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x00000039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

        Inferior 1 [process 5854] will be detached.
---
 src/slurmctld/reservation.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/slurmctld/reservation.c b/src/slurmctld/reservation.c
index 9d9107a..41258d2 100644
--- a/src/slurmctld/reservation.c
+++ b/src/slurmctld/reservation.c
@@ -3301,8 +3301,6 @@ static void _validate_all_reservations(void)
        ListIterator iter;
        slurmctld_resv_t *resv_ptr;
        struct job_record *job_ptr;
-       char *tmp;
-       uint32_t res_num;

        iter = list_iterator_create(resv_list);
        while ((resv_ptr = (slurmctld_resv_t *) list_next(iter))) {
@@ -3314,11 +3312,7 @@ static void _validate_all_reservations(void)
                        list_delete_item(iter);
                } else {
                        _set_assoc_list(resv_ptr);
-                       tmp = strrchr(resv_ptr->name, '_');
-                       if (tmp) {
-                               res_num = atoi(tmp + 1);
-                               top_suffix = MAX(top_suffix, res_num);
-                       }
+                       top_suffix = MAX(top_suffix, resv_ptr->resv_id);
                }
        }
        list_iterator_destroy(iter);
--
2.4.3



------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------

Reply via email to