For anybody who is interested, I found the root cause of this qsub crash.

We had an environment variable whose key (name) was blank, which was an 
artifact of another bug, and that environment variable causes qsub to crash 
every single time.
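
A minimal sketch of how such an entry could be spotted, in Python. It assumes 
the environment is available as a raw NUL-separated block of NAME=value 
entries, which is what Linux exposes in /proc/<pid>/environ. The function name 
find_blank_keys is made up for illustration and is not part of Grid Engine.

```python
def find_blank_keys(raw):
    """Return the entries of a NUL-separated environment block whose
    name (the part before the first '=') is empty.

    Assumption: `raw` is bytes in the "NAME=value\\0NAME=value\\0" layout.
    """
    bad = []
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after a trailing NUL
        name, _sep, _value = entry.partition(b"=")
        if not name:
            bad.append(entry)  # entry looks like "=value": blank key
    return bad


if __name__ == "__main__":
    # Linux-specific assumption: this file holds the environment the
    # current process was started with.
    with open("/proc/self/environ", "rb") as fh:
        for entry in find_blank_keys(fh.read()):
            print("blank-key entry:", entry)
```

Running this from the same Jenkins step that calls qsub should list any 
blank-key entries that the job inherits.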

Hopefully somebody is familiar enough with the qsub code to look at why that 
might cause a crash.  If not, I can cook up a simple script to show the problem.

Justin

From: [email protected] [mailto:[email protected]] On 
Behalf Of Wagner, Justin
Sent: Tuesday, September 22, 2015 10:02 AM
To: [email protected]
Subject: [gridengine users] Possible Causes of: critical error: unrecoverable 
error - contact systems manager

I am running SoGE 8.1.0, and recently qsub returned the following error when I 
submitted a job to the grid: "critical error: unrecoverable error - contact 
systems manager".

I am trying to narrow down the root cause of this issue.  When I run the exact 
same command by hand, as the same user on the same submit host, it works.  
However, the job is launched by a script that Jenkins executes, and I can 
reliably reproduce the error by using the "rebuild" plugin to rebuild the same 
build.  I suspect that some environment variable differs between these two 
cases and is causing this critical error, but I haven't been able to identify 
any differences yet.
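
One way to compare the two cases is to capture the output of `env` in the 
working shell and in the Jenkins job, then diff the two dumps. The sketch 
below is only an illustration: parse_env_dump and diff_env are hypothetical 
helper names, and it assumes one NAME=value entry per line, so values that 
contain newlines would be parsed incorrectly.

```python
def parse_env_dump(text):
    """Parse `env` output (assumed one NAME=value per line) into a dict."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            result[name] = value
    return result


def diff_env(working, failing):
    """Return (names only in working, names only in failing,
    names present in both but with different values), each sorted."""
    only_working = sorted(set(working) - set(failing))
    only_failing = sorted(set(failing) - set(working))
    changed = sorted(
        name for name in set(working) & set(failing)
        if working[name] != failing[name]
    )
    return only_working, only_failing, changed
```

A line of the form "=value" is parsed as an entry with an empty name, so a 
malformed variable like that would appear in the diff as the name ''.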

Can somebody point me to the source that is throwing this error, or possibly 
give me a list of what the possible causes are for this error?

Thanks,

Justin
