Re: [gridengine users] Node refuse to run job

2012-02-16 Thread Prentice Bisbal
I recently created a website for myself, and threw this documentation up there, where it might be a little easier to read: http://prentice.bisbal.co/hpc/sge/cannot_run_on_host -- Prentice On 02/10/2012 09:52 AM, Prentice Bisbal wrote: Jerome, I had a similar problem a couple of years ago.

Re: [gridengine users] Node refuse to run job

2012-02-16 Thread Chris Sandridge
We recently experienced this error on GE 6.2u5. Chris On 02/16/2012 09:08 AM, Prentice Bisbal wrote: On 02/15/2012 07:03 PM, Dave Love wrote: Prentice Bisbal prent...@ias.edu writes: Jerome, I had a similar problem a couple of years ago. I would get this error: cannot run on host

Re: [gridengine users] Node refuse to run job

2012-02-15 Thread Dave Love
Prentice Bisbal prent...@ias.edu writes: Jerome, I had a similar problem a couple of years ago. I would get this error: cannot run on host node64.aurora until clean up of an previous run has finished Is this known to happen with a recent version, or just old ones?

Re: [gridengine users] Node refuse to run job

2012-02-10 Thread Prentice Bisbal
Jerome, I had a similar problem a couple of years ago. I would get this error: cannot run on host node64.aurora until clean up of an previous run has finished (Aurora's my cluster's name, so I use that as my top-level domain on my cluster nodes) Fixing this problem is a bit tedious.

Re: [gridengine users] Node refuse to run job

2012-02-10 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
Hi, in the old thread DT suggested taking down the entire cluster (SGE) to clean up the EH_reschedule_unknown_list. Regards. On 2/10/2012 9:52 AM, Prentice Bisbal wrote: Jerome, I had a similar problem a couple of years ago. I would get this error: cannot run on host node64.aurora until

[gridengine users] Node refuse to run job

2012-02-09 Thread Jerome
Dear all, I have SGE version GE 6.2u2_1 on a Rocks cluster. For the past few days, a node has refused to run jobs. Using qstat -j jid, I noticed this line at the end of the output: cannot run on host compute-2-15.local until clean up of an previous run has finished. I checked node 2-15, but the

Re: [gridengine users] Node refuse to run job

2012-02-09 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.
Check the CELL/spool/ directory on the qmaster and on the nodes. On 2/9/2012 12:51 PM, Jerome wrote: Dear all, I have SGE version GE 6.2u2_1 on a Rocks cluster. For the past few days, a node has refused to run jobs. Using qstat -j jid, I noticed this line at the end of the output: cannot run on host

Re: [gridengine users] Node refuse to run job

2012-02-09 Thread Jerome
Dear Hung-Sheng, thanks for your quick reply. I've checked CELL/spool/ on the node, and the jobs directory is empty. On the master node, the jobs directory contains only as many files as there are jobs running or waiting to run. Should I check a specific directory?
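The spool check discussed in this exchange amounts to looking for leftover entries under the spool tree rather than in one particular directory. A minimal Python sketch, assuming you point it at a spool directory such as CELL/spool/<hostname> on a node or the qmaster's spool directory (the exact layout depends on the SGE version and spooling method, so treat any path as an assumption). The helper name nonempty_dirs is hypothetical and not part of SGE; it only reports which directories still hold entries.

```python
import os
from pathlib import Path


def nonempty_dirs(spool_root):
    """Walk spool_root and return a sorted list of (relative_path, entry_count)
    for every directory (including the root, reported as ".") that still
    contains at least one file or subdirectory."""
    root = Path(spool_root)
    report = []
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count:
            # Paths are reported relative to the spool root for readability.
            rel = Path(dirpath).relative_to(root)
            report.append((str(rel), count))
    return sorted(report)


# Hypothetical usage on a node (path is an assumption, adjust to your cell):
# for rel, count in nonempty_dirs("/opt/gridengine/default/spool/compute-2-15"):
#     print(f"{rel}: {count} entries")
```

Empty directories are skipped, so a clean spool tree produces only the top-level entries, and anything left over from a previous run stands out in the report.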

Re: [gridengine users] Node refuse to run job

2012-02-09 Thread Hung-Sheng Tsao (laoTsao)
I am not sure, but if you look at the source code you can see that this error message comes from the scheduler. Maybe try restarting the qmaster when the system has no jobs running or queued. Sorry. Sent from my iPad On Feb 9, 2012, at 19:07, Jerome jer...@ibt.unam.mx wrote: Dear Hung-Sheng Thanks

Re: [gridengine users] Node refuse to run job

2012-02-09 Thread Hung-Sheng Tsao (laoTsao)
One more thing: maybe increase the debug level to get more info :-) Sent from my iPad On Feb 9, 2012, at 21:08, Hung-Sheng Tsao (laoTsao) laot...@gmail.com wrote: I am not sure, but if you look at the source code you can see that this error message comes from the scheduler. Maybe try restarting the