Arnau Bria writes:

> Hi all,
>
> In my todo list I had the accounting issue as one of my top tasks. I've
> read that the SGE tool for that purpose is ARCo and it's the one I'd
> install/configure. From the ARCo doc
> (http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/esqgq/index.html)
> I see that it does job logging. Does that mean ARCo will trace every
> job state, from submitted to done/exited/aborted or whatever?

It records a subset of the info from the reporting file
<https://arc.liv.ac.uk/SGE/htmlman/htmlman5/reporting.html>.  Currently
it specifically doesn't record the start of parallel job slave tasks,
information that would be useful when diagnosing crashes.
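To get a feel for what the reporting file holds before deciding what
ARCo should keep, a quick tally of record types may help.  This is a
minimal Python sketch, assuming (per reporting(5)) that each line is
colon-separated and starts with a timestamp followed by the record type.
The file normally lives at $SGE_ROOT/$SGE_CELL/common/reporting when
reporting is enabled; the script name in the usage message is made up.

```python
"""Tally record types in a Grid Engine reporting file (sketch)."""
import sys
from collections import Counter


def count_record_types(lines):
    """Return a Counter mapping record type -> number of lines.

    Assumes each non-comment line is colon-separated and begins with a
    timestamp followed by the record type, as described in reporting(5).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 2:
            continue  # skip malformed lines
        counts[fields[1]] += 1
    return counts


def main(argv):
    if len(argv) < 2:
        # Hypothetical script name; point it at your reporting file.
        print("usage: count_records.py REPORTING_FILE", file=sys.stderr)
        return
    with open(argv[1]) as fh:
        for rtype, n in count_record_types(fh).most_common():
            print(f"{rtype:12} {n}")


if __name__ == "__main__":
    main(sys.argv)
```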

You'd probably only use the dbwriter component (binaries available from
the SGE downloads area) and query the database with another tool via
SQL.  See the top of
<https://arc.liv.ac.uk/repos/darcs/arco/www/index.html> concerning the
"reporting" component.  The database can get huge.
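For the "query it with another tool via SQL" route, something along
these lines is all it takes from Python's DB-API.  It is only a sketch:
the job_events table and its columns are invented for illustration and
are NOT the real ARCo schema, and sqlite3 stands in for whichever
database dbwriter actually writes to (you'd swap in the matching driver
and look up the real table names).

```python
"""Query an accounting database via SQL (sketch with a made-up schema)."""
import sqlite3


def deletions_per_user(conn):
    """Return (owner, count) pairs for 'deleted' events, most first.

    The table job_events and its columns are hypothetical, not ARCo's.
    """
    sql = """
        SELECT owner, COUNT(*) AS n
        FROM job_events
        WHERE event = 'deleted'
        GROUP BY owner
        ORDER BY n DESC, owner
    """
    return conn.execute(sql).fetchall()


def make_demo_db():
    """Build an in-memory database holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_events (job_number INTEGER, owner TEXT, event TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_events VALUES (?, ?, ?)",
        [(1, "alice", "deleted"), (2, "alice", "deleted"),
         (3, "bob", "finished"), (4, "bob", "deleted")],
    )
    return conn


if __name__ == "__main__":
    for owner, n in deletions_per_user(make_demo_db()):
        print(owner, n)
```

Run as a script, the demo data prints "alice 2" and then "bob 1".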

At one time there was a recipe for displaying the data in
Open^WStaroffice, but I couldn't find it again when I looked a while
ago.

> But before installing, I'm wondering whether there are any other tools,
> and which one is the most popular. I'm sure that most experienced SGE
> admins could give some feedback about accounting/logging applications
> (if they exist).

The other tools I know of are listed at
<https://arc.liv.ac.uk/SGE/tools.html>; Gold could be added to that
list, but I don't know of SGE support for it.  People typically process
the accounting file with a script if qacct output doesn't satisfy their
requirements.
(Possibly qacct should be extended.)
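As an example of the script route, here is a minimal Python sketch that
totals jobs and wallclock per user from the accounting file (normally
$SGE_ROOT/$SGE_CELL/common/accounting).  It assumes the colon-separated
field order of accounting(5) in SGE 6.x, with owner as the 4th field and
ru_wallclock as the 14th; check the man page for your version before
trusting the numbers.  The script name is hypothetical.

```python
"""Summarise Grid Engine accounting records per user (sketch)."""
import sys
from collections import defaultdict

# Zero-based field positions, assuming the colon-separated layout of
# accounting(5) in SGE 6.x; verify against your version's man page.
OWNER = 3
RU_WALLCLOCK = 13


def wallclock_by_owner(lines):
    """Return {owner: (job_count, total_wallclock_seconds)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        fields = line.rstrip("\n").split(":")
        if len(fields) <= RU_WALLCLOCK:
            continue  # too short to be a job record
        entry = totals[fields[OWNER]]
        entry[0] += 1
        entry[1] += float(fields[RU_WALLCLOCK])
    return {owner: (n, secs) for owner, (n, secs) in totals.items()}


def main(argv):
    if len(argv) < 2:
        print("usage: acct_summary.py ACCOUNTING_FILE", file=sys.stderr)
        return
    with open(argv[1]) as fh:
        summary = wallclock_by_owner(fh)
    for owner, (n, secs) in sorted(summary.items()):
        print(f"{owner:12} jobs={n:6d} wallclock_h={secs / 3600:.1f}")


if __name__ == "__main__":
    main(sys.argv)
```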

> ** I'm asking because I've seen that a user killed many jobs. I didn't
> know that, and I was wondering why the cluster emptied in 10 minutes,
> so I started looking for those jobs in the logs and found nothing.
> Since I'd like to know what happens to every submitted job, I'd like to
> know how to make SGE write all job status information.

qacct shows killed jobs as

  failed       100 : assumedly after job
  exit_status  137

and if you set the log level to "info" (loglevel log_info in the
cluster configuration, via qconf -mconf), the qmaster messages file
records qdels.
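The same failed/exit_status pattern can be pulled straight from the
accounting file to see whose jobs died and when.  A minimal Python
sketch, assuming the accounting(5) field layout of SGE 6.x (failed and
exit_status as the 12th and 13th fields) and end_time in epoch seconds;
137 is 128 + 9, i.e. death by SIGKILL.  The record names the job's
owner, not whoever ran qdel, so cross-check against the qmaster
messages file.  The script name is hypothetical.

```python
"""List jobs that qacct would show as killed (sketch)."""
import sys
import time

# Zero-based positions, assuming the accounting(5) layout of SGE 6.x.
OWNER, JOB_NUMBER, END_TIME, FAILED, EXIT_STATUS = 3, 5, 10, 11, 12


def killed_jobs(lines):
    """Yield (job_number, owner, end_time) for records with failed=100
    and exit_status=137, the pattern qacct reports for killed jobs."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header comments and blank lines
        f = line.rstrip("\n").split(":")
        if len(f) <= EXIT_STATUS:
            continue  # too short to be a job record
        if f[FAILED] == "100" and f[EXIT_STATUS] == "137":
            yield int(f[JOB_NUMBER]), f[OWNER], int(f[END_TIME])


def main(argv):
    if len(argv) < 2:
        print("usage: killed_jobs.py ACCOUNTING_FILE", file=sys.stderr)
        return
    with open(argv[1]) as fh:
        for job, owner, end in killed_jobs(fh):
            # Assumes end_time is in epoch seconds.
            print(job, owner, time.ctime(end))


if __name__ == "__main__":
    main(sys.argv)
```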

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
