We track our SGE cluster statistics through the accounting file, which we
import into a separate MySQL database.
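The import step depends on splitting each accounting record into fields. The sketch below pulls out the columns used in the query that follows. It assumes the colon-separated field order described in the SGE accounting(5) man page. Counting from 1, that order puts owner at field 4, job_number at 6, submission_time, start_time and end_time at 9 to 11, ru_wallclock at 14, slots at 35 and cpu at 37. Field positions can differ between Grid Engine versions, so check them against your own man page. The names `parse_accounting_line` and `JobRecord` are hypothetical; they are not part of any SGE tool.

```python
from dataclasses import dataclass
from typing import Optional

# 0-based field positions, ASSUMED from the SGE accounting(5) field order
# (owner = field 4, job_number = 6, ..., cpu = 37 when counting from 1).
# Verify them against the man page for your Grid Engine version.
OWNER, JOB_NUMBER = 3, 5
SUBMISSION_TIME, START_TIME, END_TIME = 8, 9, 10
RU_WALLCLOCK, SLOTS, CPU = 13, 34, 36


@dataclass
class JobRecord:
    """Hypothetical container for the columns the database table needs."""
    owner: str
    job_number: int
    submission_time: int  # epoch seconds
    start_time: int       # epoch seconds
    end_time: int         # epoch seconds
    ru_wallclock: float   # seconds
    slots: int
    cpu: float            # CPU time, seconds


def parse_accounting_line(line: str) -> Optional[JobRecord]:
    """Split one colon-separated accounting record; skip blank and comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    f = line.split(":")
    return JobRecord(
        owner=f[OWNER],
        job_number=int(f[JOB_NUMBER]),
        submission_time=int(f[SUBMISSION_TIME]),
        start_time=int(f[START_TIME]),
        end_time=int(f[END_TIME]),
        ru_wallclock=float(f[RU_WALLCLOCK]),
        slots=int(f[SLOTS]),
        cpu=float(f[CPU]),
    )
```

Because the stored cpu and ru_wallclock values are in seconds, the query divides them by 3600 to report hours.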
Recently we've been seeing some really strange job CPU-time values:
mysql> SELECT owner, job_number AS job_num, CPU/3600 AS "CPU",
         ru_wallclock/3600 AS "RT", FROM_UNIXTIME(start_time) AS strt_time,
         FROM_UNIXTIME(end_time), FROM_UNIXTIME(submission_time) AS sbmt_time,
         (CPU/3600)
       FROM csclprd3
       WHERE (start_time >= UNIX_TIMESTAMP('2015-07-01'))
         AND (start_time < UNIX_TIMESTAMP('2015-08-01'))
         AND (start_time <> end_time)
         AND (start_time <> 0) AND (end_time <> 0)
         AND ((CPU/3600) > (6*(ru_wallclock/3600)))
         AND owner='pangjx';
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| owner  | job_num | CPU          | RT       | start_time          | FROM_UNIXTIME(end_time) | sbmt_time           | (CPU/3600)   |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
| pangjx |  143320 | 2777777.7775 | 145.3114 | 2015-07-23 21:56:51 | 2015-07-29 23:15:32     | 2015-07-23 21:56:40 | 2777777.7775 |
| pangjx |  154178 |       7.8869 |   0.7439 | 2015-07-29 15:02:28 | 2015-07-29 15:47:06     | 2015-07-29 15:02:18 |       7.8869 |
| pangjx |  154265 |       7.4861 |   0.7106 | 2015-07-29 15:02:29 | 2015-07-29 15:45:07     | 2015-07-29 15:02:23 |       7.4861 |
| pangjx |  154244 |       5.0086 |   0.6397 | 2015-07-29 15:02:28 | 2015-07-29 15:40:51     | 2015-07-29 15:02:23 |       5.0086 |
| pangjx |  154196 |      10.0081 |   0.6386 | 2015-07-29 15:02:28 | 2015-07-29 15:40:47     | 2015-07-29 15:02:19 |      10.0081 |
| pangjx |  154136 |       5.1428 |   0.6375 | 2015-07-29 15:02:28 | 2015-07-29 15:40:43     | 2015-07-29 15:02:17 |       5.1428 |
| pangjx |  154217 |       5.2989 |   0.5658 | 2015-07-29 15:02:28 | 2015-07-29 15:36:25     | 2015-07-29 15:02:19 |       5.2989 |
| pangjx |  154233 |       4.3808 |   0.5581 | 2015-07-29 15:02:28 | 2015-07-29 15:35:57     | 2015-07-29 15:02:22 |       4.3808 |
| pangjx |  154157 |       5.4767 |   0.5517 | 2015-07-29 15:02:28 | 2015-07-29 15:35:34     | 2015-07-29 15:02:18 |       5.4767 |
| pangjx |  154152 |       3.1375 |   0.4356 | 2015-07-29 15:02:28 | 2015-07-29 15:28:36     | 2015-07-29 15:02:18 |       3.1375 |
| pangjx |  143359 | 2777777.7775 | 127.8125 | 2015-07-23 21:56:52 | 2015-07-29 05:45:37     | 2015-07-23 21:56:42 | 2777777.7775 |
| pangjx |  143334 | 2777777.7775 | 123.9389 | 2015-07-23 21:56:51 | 2015-07-29 01:53:11     | 2015-07-23 21:56:41 | 2777777.7775 |
| pangjx |  143329 |     945.1042 | 115.6944 | 2015-07-23 21:56:51 | 2015-07-28 17:38:31     | 2015-07-23 21:56:41 |     945.1042 |
| pangjx |  143355 |     766.4269 | 100.3900 | 2015-07-23 21:56:52 | 2015-07-28 02:20:16     | 2015-07-23 21:56:42 |     766.4269 |
| pangjx |  143377 | 2777777.7775 |  99.1744 | 2015-07-23 21:56:52 | 2015-07-28 01:07:20     | 2015-07-23 21:56:43 | 2777777.7775 |
+--------+---------+--------------+----------+---------------------+-------------------------+---------------------+--------------+
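The four largest CPU values in the output are not just large; they are identical. The sketch below makes that visible by loading the job_num, CPU and RT columns exactly as printed above (all in hours) into a hypothetical in-memory table `jobs`. Python's built-in sqlite3 module stands in for the MySQL server here. The script then groups on CPU to list any value shared by more than one job and converts that value back to seconds. Only the columns needed for this check are copied; the date conditions of the original query are not reproduced.

```python
import sqlite3

# (job_num, CPU hours, RT hours) copied from the query output above.
ROWS = [
    (143320, 2777777.7775, 145.3114),
    (154178, 7.8869, 0.7439),
    (154265, 7.4861, 0.7106),
    (154244, 5.0086, 0.6397),
    (154196, 10.0081, 0.6386),
    (154136, 5.1428, 0.6375),
    (154217, 5.2989, 0.5658),
    (154233, 4.3808, 0.5581),
    (154157, 5.4767, 0.5517),
    (154152, 3.1375, 0.4356),
    (143359, 2777777.7775, 127.8125),
    (143334, 2777777.7775, 123.9389),
    (143329, 945.1042, 115.6944),
    (143355, 766.4269, 100.3900),
    (143377, 2777777.7775, 99.1744),
]

# Hypothetical stand-in table; column names loosely mirror the query's aliases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_num INTEGER, cpu_h REAL, rt_h REAL)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", ROWS)

# CPU values reported by more than one job, with the job count and the value in seconds.
repeated = conn.execute(
    """
    SELECT cpu_h, COUNT(*) AS n_jobs, cpu_h * 3600 AS cpu_s
    FROM jobs
    GROUP BY cpu_h
    HAVING COUNT(*) > 1
    """
).fetchall()

for cpu_h, n_jobs, cpu_s in repeated:
    # prints: 4 jobs report CPU = 2777777.7775 h = 9,999,999,999 s
    print(f"{n_jobs} jobs report CPU = {cpu_h} h = {cpu_s:,.0f} s")
```

Within the rounding of the printed hours, the shared value is 9,999,999,999 seconds, a run of ten nines. This sketch does not show where the value comes from; it only shows that the outliers share it exactly.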
A job can't have used CPU time roughly 19,000 to 28,000 times greater than
its wall-clock runtime, can it?
Our cluster has nowhere near that many cores (even with hyperthreading).
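The reasoning above amounts to a simple bound. A job cannot accumulate more CPU time than its wall-clock time multiplied by the number of cores it could have run on. The sketch below computes the implied number of busy cores (CPU / RT) for a few rows copied from the output above and flags any row that exceeds a core count. `CLUSTER_CORES` is a hypothetical placeholder, not our real figure. Substitute the actual cluster total, or, for a per-job check, the job's granted slots.

```python
# (job_num, CPU hours, RT hours) copied from the query output above.
SAMPLE = [
    (143320, 2777777.7775, 145.3114),
    (143377, 2777777.7775, 99.1744),
    (143329, 945.1042, 115.6944),
    (154196, 10.0081, 0.6386),
]

# HYPOTHETICAL upper bound on usable cores; replace with the real cluster total.
CLUSTER_CORES = 1000


def implied_cores(cpu_h: float, rt_h: float) -> float:
    """Average number of cores a job must have kept busy to log cpu_h in rt_h."""
    return cpu_h / rt_h


for job_num, cpu_h, rt_h in SAMPLE:
    cores = implied_cores(cpu_h, rt_h)
    verdict = "IMPOSSIBLE" if cores > CLUSTER_CORES else "plausible"
    print(f"job {job_num}: CPU/RT = {cores:,.1f} cores -> {verdict}")
```

For the 2777777.7775-hour rows the ratio comes out at tens of thousands of cores. The ordinary multithreaded jobs land in single or low double digits.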
-Bill L.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users