The name of the ENV variable is blank, i.e. the key is an empty string. I
know that seems a bit strange, so I'll give more detail.
The culprit generating this blank ENV variable key name is the
Shell::EnvImporter Perl module, which can easily be worked around.
Basically, when the value of a key in the environment is multiline, the module
generates an additional environment key for each of the additional lines, each
with a key of whatever is on that line, even if the line is blank (this is the
root problem). The resulting empty key in the environment was then causing
qsub to crash.
For example:
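use strict;
use warnings;
use Shell::EnvImporter;

# Start from an empty environment so only TEST_VAR is present
delete @ENV{keys %ENV};
# A value with embedded newlines, including a trailing blank line
$ENV{TEST_VAR} = "Line1\nLine2\n\n";

# Source the file in a sub-shell and import the resulting environment
my $runner = Shell::EnvImporter->new(
    shell       => "some_shell",
    file        => "/some/nonrelevant/.sourcedfile",
    auto_run    => 1,
    auto_import => 1,
) or die $@;

# Print every key/value pair now in %ENV, sorted for a stable order
foreach my $key (sort keys %ENV) {
    print "[$key]:[$ENV{$key}]\n";
}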
This prints the following:
[]:[]
[Line2]:[]
[TEST_VAR]:[Line1]
The first line, "[]:[]", is the evidence of the blank environment variable
key name that I am talking about.
The expected result is:
[TEST_VAR]:[Line1\nLine2\n\n]
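To illustrate how output like this can come about, here is a minimal sketch of a parser that reads "env"-style output one line at a time. This is only an assumption about the failure mode, not the actual Shell::EnvImporter code, and parse_env_by_line is a hypothetical name I made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical line-oriented parser for `env`-style output.
# NOT the real Shell::EnvImporter implementation; illustration only.
sub parse_env_by_line {
    my ($text) = @_;
    my %parsed;

    # A limit of -1 keeps trailing empty fields, so blank lines survive.
    my @lines = split /\n/, $text, -1;

    # Drop the empty field that follows the final newline terminator.
    pop @lines if @lines && $lines[-1] eq '';

    for my $line (@lines) {
        # Everything before the first "=" is treated as the key. A line
        # without "=" becomes a key with an empty value, and a blank
        # line becomes an empty key with an empty value.
        my ($key, $value) = split /=/, $line, 2;
        $key   = '' unless defined $key;
        $value = '' unless defined $value;
        $parsed{$key} = $value;
    }
    return %parsed;
}

# What `env` would print for TEST_VAR="Line1\nLine2\n\n"
my %env = parse_env_by_line("TEST_VAR=Line1\nLine2\n\n\n");
printf "[%s]:[%s]\n", $_, $env{$_} for sort keys %env;
```

A parser of this shape cannot tell a continuation line of a multiline value from a new variable, which would explain both the "Line2" key and the blank key.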
When we submit jobs to the grid, we use the "-V" option, which exports the
current environment variables to the context of the job. I believe qsub is
core dumping when it tries to export this blank environment variable key.
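One possible workaround is to scrub unusable keys from the environment before calling qsub. The sketch below is only an illustration under my own assumptions: bad_env_keys and scrub_env are hypothetical helper names, and treating a key as unusable when it is empty or contains "=" or a newline is my choice, not something taken from qsub or Shell::EnvImporter.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the keys of an environment hash that are
# not usable variable names (empty, or containing "=" or a newline).
sub bad_env_keys {
    my ($env) = @_;
    return sort grep { $_ eq '' || /[=\n]/ } keys %$env;
}

# Hypothetical helper: deletes those keys in place and returns them.
sub scrub_env {
    my ($env) = @_;
    my @bad = bad_env_keys($env);
    delete @{$env}{@bad};
    return @bad;
}

# Demonstrated on a plain hash that mimics the broken environment.
# Passing \%ENV instead would scrub the real environment.
my %env = ('' => '', 'Line2' => '', 'TEST_VAR' => 'Line1');
my @removed = scrub_env(\%env);
printf "removed [%s]\n", $_ for @removed;
```

Note that this only removes the blank key. A stray key such as "Line2" is a valid name, so it stays, and the truncated TEST_VAR value is not restored.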
Justin
-----Original Message-----
From: Jesse Becker [mailto:[email protected]]
Sent: Thursday, October 29, 2015 12:33 PM
To: Wagner, Justin
Cc: [email protected]
Subject: Re: [gridengine users] Possible Causes of: critical error:
unrecoverable error - contact systems manager
What was the name of the ENV variable? How was it being used by qsub and/or
the job script?
On Thu, Oct 29, 2015 at 03:02:22PM -0400, Wagner, Justin wrote:
>For anybody who is interested I found the root cause of this crash of qsub.
>
>The root cause is that we had an environment variable whose key was blank that
>was an artifact of another bug, and this environment variable key causes qsub
>to crash every single time.
>
>Hopefully somebody is familiar enough with the qsub code to look at why that
>might cause a crash. If not, I can cook up a simple script to show the
>problem.
>
>Justin
>
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Wagner, Justin
>Sent: Tuesday, September 22, 2015 10:02 AM
>To: [email protected]
>Subject: [gridengine users] Possible Causes of: critical error:
>unrecoverable error - contact systems manager
>
>I am running SoGE 8.1.0 and recently I had a problem when submitting a job to
>the grid via qsub, and qsub returned the error "critical error: unrecoverable
>error - contact systems manager"
>
>I am trying to narrow down the root cause of this issue. I am able to send
>the same exact command, from the same exact user, on the same exact submit
>host, and get the command to work. However, I am using a script that is
>getting executed by Jenkins to launch the job, and I am also able to reliably
>reproduce the error when I use the "rebuild" plugin to rebuild the same build.
>I suspect that some environment variable differs between these two cases and
>is causing this critical error, but I haven't been able to identify any
>differences yet.
>
>Can somebody point me to the source that is throwing this error, or possibly
>give me a list of what the possible causes are for this error?
>
>Thanks,
>
>Justin
>
>_______________________________________________
>users mailing list
>[email protected]
>https://gridengine.org/mailman/listinfo/users
--
Jesse Becker (Contractor)