> On 02.11.2015 at 23:21, Wagner, Justin <jwag...@ciena.com> wrote:
> 
> The name of the ENV variable is blank, i.e. empty. I know that seems a bit 
> strange, so I'll give more detail.
> 
> The culprit that generates this blank ENV variable key is the 
> Shell::EnvImporter Perl module, which is easy to work around.
> 
> When the value of a key in the environment spans multiple lines, the module 
> generates an additional environment key for each additional line, named 
> after whatever is on that line, even if the line is blank (this is the root 
> problem). The resulting empty key in the environment then caused qsub to 
> crash.
> 
> For example:
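> use strict;
> use warnings;
> use Shell::EnvImporter;
> 
> # Delete all environment variables
> delete @ENV{keys %ENV};
> # Multi-line value that ends with a blank line
> $ENV{TEST_VAR} = "Line1\nLine2\n\n";
> # "some_shell" and the sourced file are placeholders
> my $runner = Shell::EnvImporter->new(
>     shell       => "some_shell",
>     file        => "/some/nonrelevant/.sourcedfile",
>     auto_run    => 1,
>     auto_import => 1,
> ) or die $@;
> # Print every key/value pair now present in the environment
> foreach my $key (keys %ENV) {
>     print "[$key]:[$ENV{$key}]\n";
> }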

So the problem originates not in SGE but in an external script. I wonder 
whether an environment variable with no name is valid at all (although 
`env =42` also shows such a line). Could such a variable even be addressed? 
`echo ${}` would be the syntax, but it fails (of course).
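
As far as I know, POSIX `setenv()` rejects an empty name with EINVAL, and a 
shell can only address names of the form [A-Za-z_][A-Za-z0-9_]*. A minimal 
sketch of such a check follows; the function name `is_valid_env_name` is made 
up for this example and is not part of SGE or any library:

```shell
#!/bin/sh
# Sketch: accept only names that a POSIX shell can address as variables,
# i.e. [A-Za-z_][A-Za-z0-9_]*. The function name is hypothetical.
is_valid_env_name() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try a few names, including the empty one from this thread.
for name in TEST_VAR Line2 '' '2fast' 'A B'; do
    if is_valid_env_name "$name"; then
        echo "[$name]: ok"
    else
        echo "[$name]: rejected"
    fi
done
```

A wrapper around qsub could use a check like this to warn about odd names 
before submitting, but that is only a suggestion.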


> This would print the following:
> 
> []:[]
> [Line2]:[]
> [TEST_VAR]:[Line1]
> 
> The first line, "[]:[]", is the evidence of the blank environment variable 
> key name that I mentioned.
> 
> The expected result, however, is:
> [TEST_VAR]:[Line1\nLine2\n\n]
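
One plausible mechanism (an assumption on my side, I have not checked the 
module's source) is that the importer reads `env`-style output line by line. 
The sketch below shows how that invents keys for a multi-line value, and how 
NUL-separated output avoids it. It assumes an `env` that supports `-0` (e.g. 
GNU coreutils); `naive_names` and `safe_names` are names made up for this 
example:

```shell
#!/bin/sh
# Sketch only: shows how line-based parsing of `env` output invents keys.
# Assumes an `env` that supports -0 (e.g. GNU coreutils).

# Multi-line value as in the TEST_VAR example. The trailing "x" keeps the
# final newlines from being stripped by command substitution.
TEST_VAR=$(printf 'Line1\nLine2\n\nx')
TEST_VAR=${TEST_VAR%x}
export TEST_VAR

# Naive: every output line is taken as one NAME=VALUE record, so the
# continuation lines "Line2" and "" show up as names of their own.
naive_names() {
    env | sed 's/=.*//'
}

# NUL-aware: records end with NUL, so newlines inside values are harmless.
# Embedded newlines are turned into spaces before splitting on NUL.
safe_names() {
    env -0 | tr '\n' ' ' | tr '\0' '\n' | sed 's/=.*//'
}

naive_names | grep -c -x -e 'Line2' -e ''   # counts the bogus names
safe_names  | grep -x 'TEST_VAR'            # the real name survives intact
```
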
> 
> When we submit jobs to the grid, we use the "-V" option, which exports the 
> current environment variables to the context of the job. I believe qsub 
> core dumps when it tries to export this blank environment variable key.

I always tell my users to avoid using -V. Sometimes they change the 
environment in their current shell and submit a job; when the job starts three 
days later and crashes, it's hard to investigate because the settings can't be 
reproduced. It's better to define all variables either on the command line or 
(better) inside the job script, so that another run is sure to have the same 
conditions again.
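
A minimal sketch of what I mean by defining the variables inside the job 
script. All names and values (env_demo, TOOL_HOME, the path, the thread count) 
are hypothetical placeholders, not taken from your setup:

```shell
#!/bin/sh
# Sketch of a self-contained job script; all names and values below are
# hypothetical placeholders.
#$ -N env_demo
#$ -cwd
#$ -S /bin/sh

# Define everything the job needs here instead of inheriting it via -V.
TOOL_HOME=/opt/example-tool
OMP_NUM_THREADS=4
export TOOL_HOME OMP_NUM_THREADS

# Record the settings in the job's output for later debugging.
echo "TOOL_HOME=$TOOL_HOME"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# The real work would start here, e.g. "$TOOL_HOME/bin/run" (placeholder).
```

Such a script is submitted with a plain `qsub` and no -V. If a single value 
has to come from the submit side, `qsub -v NAME=value` passes just that one 
variable.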

-- Reuti


> Justin
> 
> -----Original Message-----
> From: Jesse Becker [mailto:becke...@mail.nih.gov] 
> Sent: Thursday, October 29, 2015 12:33 PM
> To: Wagner, Justin
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Possible Causes of: critical error: 
> unrecoverable error - contact systems manager
> 
> What was the name of the ENV variable?  How was it being used by qsub and/or 
> the job script?
> 
> On Thu, Oct 29, 2015 at 03:02:22PM -0400, Wagner, Justin wrote:
>> For anybody who is interested I found the root cause of this crash of qsub.
>> 
>> The root cause is that we had an environment variable whose key was blank 
>> that was an artifact of another bug, and this environment variable key 
>> causes qsub to crash every single time.
>> 
>> Hopefully somebody is familiar enough with the qsub code to look at why that 
>> might cause a crash.  If not, I can cook up a simple script to show the 
>> problem.
>> 
>> Justin
>> 
>> From: users-boun...@gridengine.org 
>> [mailto:users-boun...@gridengine.org] On Behalf Of Wagner, Justin
>> Sent: Tuesday, September 22, 2015 10:02 AM
>> To: users@gridengine.org
>> Subject: [gridengine users] Possible Causes of: critical error: 
>> unrecoverable error - contact systems manager
>> 
>> I am running SoGE 8.1.0 and recently I had a problem when submitting a job 
>> to the grid via qsub, and qsub returned the error "critical error: 
>> unrecoverable error - contact systems manager"
>> 
>> I am trying to narrow down the root cause of this issue.  When I run the 
>> exact same command by hand, as the same user on the same submit host, it 
>> works.  However, the job is normally launched by a script that Jenkins 
>> executes, and I can reliably reproduce the error by using the "rebuild" 
>> plugin to rebuild the same build.  I suspect that some environment variable 
>> differs between these two cases and causes this critical error, but I have 
>> not been able to identify any differences so far.
>> 
>> Can somebody point me to the source that is throwing this error, or possibly 
>> give me a list of what the possible causes are for this error?
>> 
>> Thanks,
>> 
>> Justin
>> 
>> _______________________________________________
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users
> 
> --
> Jesse Becker (Contractor)
> 