[In case this is still relevant.]

Reuti <[email protected]> writes:

>> I've tried screen a bit before, thanks. Someone else had an idea that
>> might work even if the admin doesn't increase the wallclock time:
>> qlogin, *then* start screen and start the debugging process, then
>> detach and log out. Then qlogin into the *same node* and
>> reattach. I'm going to experiment with that to see if it works.
>
> Well, this would violate the granted scheduling, and AFAICS the screen
> session will be terminated in a proper way due to the attached
> additional group ID.
>
> NB: the ownership of the generated /dev/pts/x is wrong and needs to be
> fixed to have access to it as a user (in case you want to test it on
> your own).

That's fixed in the SGE development version.

Isn't there a general solution to debugging something that crashes after
a long time?  Why not checkpoint at an appropriate interval and then
restart under the debugger?  A single-node job is likely to work OK
under DMTCP, which is easy to use.
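
A rough sketch of what I mean, not a tested recipe. The one-hour
interval and the checkpoint directory are placeholders, and I'm assuming
the stock dmtcp_launch/dmtcp_restart commands with their --interval and
--ckptdir options. The helpers only echo the commands unless RUN is set
to the empty string.

```shell
#!/bin/sh
# Sketch only: interval and checkpoint directory are illustrative
# placeholders, not values from any real site.
CKPT_DIR=${CKPT_DIR:-$HOME/ckpt}   # where checkpoint images are written
INTERVAL=${INTERVAL:-3600}         # seconds between automatic checkpoints
RUN=${RUN-echo}                    # dry run by default; RUN= to execute

# Start a program under DMTCP with periodic checkpoints.
launch() {
    $RUN dmtcp_launch --interval "$INTERVAL" --ckptdir "$CKPT_DIR" "$@"
}

# Restart from the checkpoint image(s) found in CKPT_DIR.
restart() {
    $RUN dmtcp_restart "$CKPT_DIR"/ckpt_*.dmtcp
}
```

So "launch ./my_app" (my_app being a stand-in for the real program) goes
in the batch job, and later "restart" in a qlogin session brings the
process back from the last image shortly before the expected crash. It
should then be possible to attach to it with "gdb -p <pid>", though
check the DMTCP documentation for your version.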

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
