On Fri, 4 Apr 2014 at 8:45am, Mark Dixon wrote:

> I think we've been bitten by something that others have seen and brought up
> on this list over the years, where the amount of usage reported in the share
> tree can become unexpectedly large when using task array jobs.
>
> I am trying to reproduce this on a test install, to take a closer look at
> what is going on.
>
> Can anyone offer me any advice on how to trigger the problem, please? I'm
> tinkering with a relatively recent soge, but I think the issue dates from the
> good old days.

Yeah, that's an oldie but a goodie. I haven't tested it in a while, since
we moved from share-tree to functional shares for exactly this reason.
The original bug is here <https://arc.liv.ac.uk/trac/SGE/ticket/435>.
IIRC, I could reproduce this on a fresh test cluster with several slots by
doing something like the following:
o Create 2 users with equal shares.
o Have the 2 users submit equal numbers of tasks -- one as an array job,
  one as individual jobs. I don't think I ever tested whether it matters
  if the jobs are actually CPU-intensive or just 'sleep' jobs.
o Theoretically, the users should always have approximately the same
  number of tasks running. In practice, after a while the array job user
  would have fewer tasks running than the individual job user.
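
As a rough sketch of the submission step above, the following Python builds the qsub command lines for the two cases without running them. The job names, the task count, and the sleep duration are made up for illustration, and it assumes plain 'sleep' jobs submitted with "-b y"; each set of commands would have to be run as the corresponding test user.

```python
import shlex


def build_submit_commands(n_tasks=100, sleep_seconds=600):
    """Return qsub command lines for the two (hypothetical) test users."""
    payload = ["sleep", str(sleep_seconds)]
    # Array-job user: a single job with n_tasks tasks (-t sets the task range).
    array_cmds = [
        ["qsub", "-b", "y", "-N", "arraytest", "-t", f"1-{n_tasks}"] + payload
    ]
    # Individual-job user: the same number of tasks, one job each.
    single_cmds = [
        ["qsub", "-b", "y", "-N", f"single{i}"] + payload
        for i in range(1, n_tasks + 1)
    ]
    return {"array_user": array_cmds, "single_user": single_cmds}


def running_tasks_command(user):
    """Return a qstat command listing only the running jobs of one user."""
    return ["qstat", "-s", "r", "-u", user]


if __name__ == "__main__":
    # Small numbers here just to keep the printed output short.
    for who, commands in build_submit_commands(n_tasks=4, sleep_seconds=60).items():
        for command in commands:
            print(who, shlex.join(command))
```

Comparing the output of the qstat command for the two users over time should show whether the array job user falls behind.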
If you have any issues recreating this, let me know and I'll see if I can
still do so.
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF