After discovering issues with sreport output and tracing them back to orphaned jobs in the Slurm DB (both running AND pending), I worked to resolve that problem and cleared the DB of the running and pending orphans. However, I find that rollups are still not running automatically.
For example, I triggered a re-rollup this morning by modifying the entries in the last_ran_table and restarting slurmdbd; rollups were redone from April 1, 2016 to today at 06:00 AM.

[tcooper@cluster ~]# show_last_ran_table.sh
+---------------------+---------------------+---------------------+
| hourly_rollup       | daily_rollup        | monthly_rollup      |
+---------------------+---------------------+---------------------+
| 2017-06-09 06:00:00 | 2017-06-09 00:00:00 | 2017-06-01 00:00:00 |
+---------------------+---------------------+---------------------+

Three hours later, the hourly rollups for [07-09]:00 AM had still not run. A restart of slurmdbd triggers the rollups, which then run successfully...

[tcooper@cluster ~]# service slurmdbd restart
stopping slurmdbd:                                         [  OK  ]
slurmdbd (pid 30526) is running...
slurmdbd (pid 30526) is running...
slurmdbd (pid 30526) is running...
slurmdbd (pid 30526) is running...
starting slurmdbd:                                         [  OK  ]

[tcooper@cluster ~]# tailf /var/log/slurm/slurmdbd.log | egrep -v "post user"
...
[2017-06-09T09:22:16.553] debug: DBD_INIT: CLUSTER:cluster VERSION:7168 UID:513563 IP:10.21.2.3 CONN:9
[2017-06-09T09:22:16.644] debug: DBD_INIT: CLUSTER:cluster VERSION:7168 UID:513563 IP:10.21.2.3 CONN:8
[2017-06-09T09:24:07.700] Terminate signal (SIGINT or SIGTERM) received
[2017-06-09T09:24:18.077] debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2017-06-09T09:24:19.190] Accounting storage MYSQL plugin loaded
[2017-06-09T09:24:28.114] slurmdbd version 14.11.11 started
[2017-06-09T09:24:28.115] 0(as_mysql_rollup.c:622) cluster curr hour is now 1497013200-1497016800
...
[2017-06-09T09:24:41.831] Warning: Note very large processing time from hourly_rollup for cluster: usec=13715990 began=09:24:28.115
[2017-06-09T09:24:41.831] 0(as_mysql_usage.c:376) query
update "cluster_last_ran_table" set hourly_rollup=1497024000
[2017-06-09T09:25:16.746] debug: DBD_INIT: CLUSTER:cluster VERSION:7168 UID:513563 IP:10.21.2.3 CONN:9
[2017-06-09T09:25:16.839] debug: DBD_INIT: CLUSTER:cluster VERSION:7168 UID:513563 IP:10.21.2.3 CONN:8
...

[tcooper@cluster ~]# show_last_ran_table.sh
+---------------------+---------------------+---------------------+
| hourly_rollup       | daily_rollup        | monthly_rollup      |
+---------------------+---------------------+---------------------+
| 2017-06-09 09:00:00 | 2017-06-09 00:00:00 | 2017-06-01 00:00:00 |
+---------------------+---------------------+---------------------+

Before our DB orphan issue, generation of rollups was NOT a problem.

Can anyone provide insight into where the 'trigger' for hourly rollups is supposed to come from, and/or know if this is a bug in Slurm 14.11.11 that is fixed in a later version?

Thanks,

Trevor Cooper
HPC Systems Programmer
San Diego Supercomputer Center, UCSD
9500 Gilman Drive, 0505
La Jolla, CA 92093-0505
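P.S. As a sanity check on the log output above: the epoch-second ranges slurmdbd prints do line up with the last_ran_table timestamps once converted to local time. A small sketch, assuming the cluster clock is US/Pacific (UTC-7 during June/PDT):

```python
from datetime import datetime, timezone, timedelta

# Assumption: the cluster runs in US/Pacific, which is UTC-7 in June (PDT).
PDT = timezone(timedelta(hours=-7))

def local(epoch):
    """Render an epoch second as a local wall-clock string."""
    return datetime.fromtimestamp(epoch, tz=PDT).strftime("%Y-%m-%d %H:%M:%S")

# "cluster curr hour is now 1497013200-1497016800" from the restart log:
print(local(1497013200))  # 2017-06-09 06:00:00 (first un-rolled hour)
print(local(1497016800))  # 2017-06-09 07:00:00

# hourly_rollup=1497024000 written to cluster_last_ran_table:
print(local(1497024000))  # 2017-06-09 09:00:00 (matches the table above)
```

So the rollup code correctly picks up where last_ran_table left off when slurmdbd restarts; it just never advances on its own between restarts.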