We track our SGE cluster statistics through the accounting file, which we
import into a separate MySQL database.

We've had some really strange results show up for job CPU time usage recently:

mysql> SELECT owner, job_number AS job_num, CPU/3600 AS "CPU",
    ->        ru_wallclock/3600 AS "RT",
    ->        FROM_UNIXTIME(start_time) AS strt_time,
    ->        FROM_UNIXTIME(end_time),
    ->        FROM_UNIXTIME(submission_time) AS sbmt_time,
    ->        (CPU/3600)
    -> FROM csclprd3
    -> WHERE (start_time >= UNIX_TIMESTAMP('2015-07-01'))
    ->   AND (start_time < UNIX_TIMESTAMP('2015-08-01'))
    ->   AND (start_time <> end_time)
    ->   AND (start_time <> 0)
    ->   AND (end_time <> 0)
    ->   AND ((CPU/3600) > (6*(ru_wallclock/3600)))
    ->   AND owner='pangjx';
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| owner  | job_num | CPU          | RT       | start_time          | FROM_UNIXTIME(end_time) | sbmt_time           | (CPU/3600)   |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| pangjx |  143320 | 2777777.7775 | 145.3114 | 2015-07-23 21:56:51 | 2015-07-29 23:15:32     | 2015-07-23 21:56:40 | 2777777.7775 |
| pangjx |  154178 |       7.8869 |   0.7439 | 2015-07-29 15:02:28 | 2015-07-29 15:47:06     | 2015-07-29 15:02:18 |       7.8869 |
| pangjx |  154265 |       7.4861 |   0.7106 | 2015-07-29 15:02:29 | 2015-07-29 15:45:07     | 2015-07-29 15:02:23 |       7.4861 |
| pangjx |  154244 |       5.0086 |   0.6397 | 2015-07-29 15:02:28 | 2015-07-29 15:40:51     | 2015-07-29 15:02:23 |       5.0086 |
| pangjx |  154196 |      10.0081 |   0.6386 | 2015-07-29 15:02:28 | 2015-07-29 15:40:47     | 2015-07-29 15:02:19 |      10.0081 |
| pangjx |  154136 |       5.1428 |   0.6375 | 2015-07-29 15:02:28 | 2015-07-29 15:40:43     | 2015-07-29 15:02:17 |       5.1428 |
| pangjx |  154217 |       5.2989 |   0.5658 | 2015-07-29 15:02:28 | 2015-07-29 15:36:25     | 2015-07-29 15:02:19 |       5.2989 |
| pangjx |  154233 |       4.3808 |   0.5581 | 2015-07-29 15:02:28 | 2015-07-29 15:35:57     | 2015-07-29 15:02:22 |       4.3808 |
| pangjx |  154157 |       5.4767 |   0.5517 | 2015-07-29 15:02:28 | 2015-07-29 15:35:34     | 2015-07-29 15:02:18 |       5.4767 |
| pangjx |  154152 |       3.1375 |   0.4356 | 2015-07-29 15:02:28 | 2015-07-29 15:28:36     | 2015-07-29 15:02:18 |       3.1375 |
| pangjx |  143359 | 2777777.7775 | 127.8125 | 2015-07-23 21:56:52 | 2015-07-29 05:45:37     | 2015-07-23 21:56:42 | 2777777.7775 |
| pangjx |  143334 | 2777777.7775 | 123.9389 | 2015-07-23 21:56:51 | 2015-07-29 01:53:11     | 2015-07-23 21:56:41 | 2777777.7775 |
| pangjx |  143329 |     945.1042 | 115.6944 | 2015-07-23 21:56:51 | 2015-07-28 17:38:31     | 2015-07-23 21:56:41 |     945.1042 |
| pangjx |  143355 |     766.4269 | 100.3900 | 2015-07-23 21:56:52 | 2015-07-28 02:20:16     | 2015-07-23 21:56:42 |     766.4269 |
| pangjx |  143377 | 2777777.7775 |  99.1744 | 2015-07-23 21:56:52 | 2015-07-28 01:07:20     | 2015-07-23 21:56:43 | 2777777.7775 |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+

For a job to report CPU time that is roughly 19,000 to 28,000 times greater
than its wallclock runtime is impossible, isn't it?

Our cluster has nowhere near 27000 cores (even with hyperthreading).
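
As a quick sanity check on those numbers, here is a minimal Python sketch (not
part of our actual tooling; the function names are purely illustrative) that
undoes the CPU/3600 conversion from the query and works out how many cores
each suspect job would have needed to keep busy for its whole runtime. The
values are copied by hand from the query output above.

```python
# CPU value (in hours) reported for the suspect jobs in the query output.
SUSPECT_CPU_HOURS = 2777777.7775

# (job_number, wallclock hours) for the rows that report the suspect value.
SUSPECT_JOBS = [
    (143320, 145.3114),
    (143359, 127.8125),
    (143334, 123.9389),
    (143377, 99.1744),
]


def cpu_seconds(cpu_hours):
    """Undo the CPU/3600 conversion, rounded to whole seconds."""
    return round(cpu_hours * 3600)


def implied_cores(cpu_hours, wallclock_hours):
    """Cores that would have to be fully busy to accumulate this CPU time."""
    return cpu_hours / wallclock_hours


if __name__ == "__main__":
    print("raw CPU seconds:", cpu_seconds(SUSPECT_CPU_HOURS))
    for job_number, wallclock_hours in SUSPECT_JOBS:
        cores = implied_cores(SUSPECT_CPU_HOURS, wallclock_hours)
        print(job_number, "implied cores:", round(cores))
```

The suspect value converts back to 9,999,999,999 seconds for every one of
these jobs, which looks more like a cap or placeholder than a real
measurement, although that is only a guess on my part.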

-Bill L.
