That's been our experience too, with the second most common cause being a
segfault in the user's code.
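
To rule the full-disk case in or out, it can help to check free space on the
filesystems a job writes to on the exec host. A minimal Python sketch follows;
the helper name nearly_full and the 5% threshold are placeholders chosen for
illustration, not anything Grid Engine defines:

```python
import shutil


def nearly_full(paths, min_free_fraction=0.05):
    """Return {path: free_fraction} for paths whose filesystem is below the threshold."""
    low = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low[path] = free_fraction
    return low
```

Running something like nearly_full(["/tmp", "/var"]) on each affected host
would list any filesystem with less than 5% free; which paths matter depends
on where your spool and scratch directories live.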

You can confirm which it is by looking at the exec daemon's messages file on
the affected host.
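
As a rough illustration, here is a Python sketch that pulls the lines for one
job out of a messages file. It assumes the default spool layout
($SGE_ROOT/$SGE_CELL/spool/<host>/messages; check execd_spool_dir in
"qconf -sconf" if yours differs) and a pipe-separated line layout of
timestamp|thread|host|level|text. The helper names and the /opt/sge fallback
are made up for the example:

```python
import os
import re
from pathlib import Path


def messages_path(host, sge_root=None, sge_cell=None):
    """Guess the execd messages file for a host (assumes the default spool layout)."""
    root = sge_root or os.environ.get("SGE_ROOT", "/opt/sge")  # fallback is hypothetical
    cell = sge_cell or os.environ.get("SGE_CELL", "default")
    return Path(root) / cell / "spool" / host / "messages"


def find_job_lines(lines, job_id):
    """Return (level, text) pairs for the lines that mention job_id as a whole number."""
    pattern = re.compile(r"\b%s\b" % re.escape(str(job_id)))
    hits = []
    for line in lines:
        if not pattern.search(line):
            continue
        # Assumed layout: timestamp|thread|host|level|text
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) == 5:
            hits.append((parts[3], parts[4]))
        else:
            hits.append(("?", line.strip()))
    return hits
```

For the job in this thread, you could open messages_path("host1") and pass
the file object to find_job_lines(fh, 8169748). The level and text of the
matching lines usually say why the job failed to start.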

On Mon, May 18, 2015 at 02:52:15PM +0200, Nicolás Serrano Martínez-Santos wrote:
> It can be caused by multiple issues. The most common cause in my department is
> that the HDD of the execution host is full, so Grid Engine puts the host into
> an error state to prevent more errors.
> 
> NiCo
> 
> Excerpts from sudha.penmetsa's message of 2015-05-18 14:45:48 +0200:
> > Hi Gavin,
> > 
> > I clear the error state using qmod -c "*".
> > 
> > I would like to know the root cause and how to fix the issue permanently.
> > 
> > Regards,
> > Sudha
> > 
> > -----Original Message-----
> > From: Gavin W. Burris [mailto:[email protected]]
> > Sent: Monday, May 18, 2015 6:08 PM
> > To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> > Cc: [email protected]
> > Subject: Re: [gridengine users] Grid queue goes into an error state due to 
> > one job
> > 
> > Hello, Sudha.
> > 
> > Give this a try:  qmod -c "*"
> > 
> > Cheers.
> > 
> > 
> > On 10:51AM Mon 05/18/15 +0000, [email protected] wrote:
> > > Hi,
> > >
> > > We have a few hosts added to a queue. A single job submitted to
> > > the queue puts the whole queue into an error state.
> > >
> > > As a result, no new jobs can be submitted to the queue unless we clear
> > > the error state.
> > >
> > > Can anyone please let me know what could be the reason for this and how
> > > to fix it permanently?
> > >
> > > Example:
> > >
> > > test.q@host1              BIP   7/40      10.86    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host1
> > > ----------------------------------------------------------------------------
> > > test.q@host2              BIP   7/40      10.74    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host2
> > > ----------------------------------------------------------------------------
> > > test.q@host3              BIP   10/40     10.73    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host3
> > > ----------------------------------------------------------------------------
> > > test.q@host4              BIP   8/40      11.28    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host4
> > > ----------------------------------------------------------------------------
> > > test.q@host5              BIP   7/40      11.52    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host5
> > > ----------------------------------------------------------------------------
> > > test.q@host6              BIP   8/40      10.41    lx24-amd64    E
> > >         queue test.q marked QERROR as result of job 8169748's failure at host host6
> > >
> > > Regards,
> > > Sudha
> > > The information contained in this electronic message and any
> > > attachments to this message are intended for the exclusive use of the
> > > addressee(s) and may contain proprietary, confidential or privileged
> > > information. If you are not the intended recipient, you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately and destroy all copies of this message and any
> > > attachments. WARNING: Computer viruses can be transmitted via email.
> > > The recipient should check this email and any attachments for the
> > > presence of viruses. The company accepts no liability for any damage
> > > caused by any virus transmitted by this email. www.wipro.com
> > 
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
> > 
> > 
> > --
> > Gavin W. Burris
> > Senior Project Leader for Research Computing
> > The Wharton School
> > University of Pennsylvania

-- 
-- Skylar Thompson ([email protected])
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine