Hi all,

We've been facing problems with the accounting module of SLURM: there are big differences between the used CPU time reported by 'sreport' and the sum of the job times reported by 'sacct'.

Searching for an explanation, we checked the database and compared all three sources: sreport, sacct, and the raw data from the db. The database and sacct differ only slightly, but sreport always differs a lot; sometimes it is much higher, other times much lower. For example:

    sreport cluster UserUtilizationByAccount cluster=leftraru user=username start=09/30/16T00:00:00 end=03/30/17T00:00:00 format=Login,Used -t Hours -n

    username: 47436 Hours

    sacct -T -P -n -u username -S 09/30/16T00:00:00 -E 03/30/17T00:00:00 -o CPUTIMERAW | awk '{ SUM += $1} END { print SUM/3600}'

    username: 5670 Hours

In the database, we found that some tasks do not have an EndTime. The documentation describes the 'sacctmgr show RunAwayJobs' command as a way to solve this problem, but in slurm version 17.02.1-2 it does not work; many tasks still do not have an end time. We suspect this is the main cause of the large difference between sreport and sacct.

But the strange things do not end here: some tasks do not have a StartTime either. The database is a mess.

Does anyone have an idea about this?

Thanks in advance,

---------------------------------------------------------------------
Equipo de Soporte del NLHPC | sopo...@nlhpc.cl
National Laboratory for High Performance Computing (NLHPC) | http://www.nlhpc.cl
Center for Mathematical Modeling (CMM)
School of Engineering and Sciences, University of Chile
Beauchef 851, 7th Floor
Office: +56-2-29784603