Reuti wrote:
Ursula Winkler wrote:
Well, obviously the SGE doesn't find ANY path - the epilog routine is also not found. Has anybody seen such a behaviour before?
Can you submit a simple job with:
df -h
The job script is executed with the same environment as the prolog,...
Reuti wrote:
On 03.04.2012 at 16:33, Ursula Winkler wrote:
Well, obviously the SGE doesn't find ANY path - the epilog routine is also not
found. Has anybody seen such a behaviour before?
Can you submit a simple job with:
df -h
The job script is executed with the same environment
On 04.04.2012 at 09:12, Ursula Winkler wrote:
Reuti wrote:
Ursula Winkler wrote:
Well, obviously the SGE doesn't find ANY path - the epilog routine is
also not found. Has anybody seen such a behaviour before?
Can you submit a simple job with:
df -h
The job script is
Reuti wrote:
This is also a matter of personal taste. Some prefer classic spooling, since
you can check all the information about a job: it is simply stored as text files.
And it even handles a large number of nodes before it runs into performance
problems. Maybe Chris can make a statement about it
On Mar 28, 2012, at 17:31 , Reuti wrote:
Hi,
On 27.03.2012 at 15:42, Esztermann, Ansgar wrote:
Hi everyone,
while in general, all users are equal in our installation, I would like some
nodes to have a longer maximum runtime for some users. In order to avoid
oversubscription, we
On 04.04.2012 at 14:28, Esztermann, Ansgar wrote:
On Mar 28, 2012, at 17:31 , Reuti wrote:
Hi,
On 27.03.2012 at 15:42, Esztermann, Ansgar wrote:
Hi everyone,
while in general, all users are equal in our installation, I would like
some nodes to have a longer maximum runtime for
Reuti wrote:
Yes, it expects exactly one argument: $pe_hostfile (besides any number of
options prefixed by a dash).
So the complete string specified for start_proc_args is limited to this number
of characters.
To be honest: I have no clue about the cause of this issue, it
Reuti wrote:
Yes, it expects exactly one argument: $pe_hostfile (besides any number of
options prefixed by a dash).
So the complete string specified for start_proc_args is limited to this number
of characters.
To be honest: I have no clue about the cause of this issue, it never happened
to
Well, in both cases it is killed of course. You could set loglevel to log_info
and search the messages file of the qmaster for entries like:
04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370
rescheduling because: manual/auto rescheduling
04/04/2012
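The search through the qmaster messages file suggested above could be scripted roughly as follows. This is only a minimal sketch: it assumes each entry sits on a single line with five pipe-separated fields (timestamp, thread, host, level, text), as in the quoted example, and that the line break in the quote comes from mail wrapping. The names MessageEntry, parse_line and rescheduled are made up for illustration and are not part of SGE.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class MessageEntry(NamedTuple):
    """One entry of the qmaster messages file (field names are assumptions)."""
    timestamp: str
    thread: str
    host: str
    level: str
    text: str


def parse_line(line: str) -> Optional[MessageEntry]:
    """Split one line into its five pipe-separated fields, or return None."""
    # Split at most four times so that any '|' inside the text is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return MessageEntry(*parts)


def rescheduled(lines: Iterable[str]) -> Iterator[MessageEntry]:
    """Yield only the entries whose text mentions rescheduling."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and "rescheduling" in entry.text:
            yield entry


if __name__ == "__main__":
    # Sample taken from the message above, joined onto one line.
    sample = ("04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host "
              "pc15370 rescheduling because: manual/auto rescheduling")
    for entry in rescheduled([sample]):
        print(entry.timestamp, entry.host, entry.text)
```

To run it against a real installation, open the messages file of the qmaster and pass the file object to rescheduled(); the location of that file depends on the local spool configuration.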
Hey Reuti
On 4 April 2012 17:14, Reuti re...@staff.uni-marburg.de wrote:
Well, in both cases it is killed of course. You could set loglevel to
log_info and search the messages file of the qmaster for entries like:
04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370
On 04.04.2012 at 17:42, Lars van der bijl wrote:
Hey Reuti
On 4 April 2012 17:14, Reuti re...@staff.uni-marburg.de wrote:
Well, in both cases it is killed of course. You could set loglevel to
log_info and search the messages file of the qmaster for entries like:
04/04/2012
in our case the application has no checkpointing capabilities. for us
a reschedule is just run from start on a new host.
so a checkpoint with a signal 9 should be enough?
On 4 April 2012 17:50, Reuti re...@staff.uni-marburg.de wrote:
On 04.04.2012 at 17:42, Lars van der bijl wrote:
Hey
On 04.04.2012 at 18:09, Lars van der bijl wrote:
in our case the application has no checkpointing capabilities. for us
a reschedule is just run from start on a new host.
so a checkpoint with a signal 9 should be enough?
No, the signal will be sent to create a checkpoint in min_cpu_interval
On Tue, Apr 03, 2012 at 03:19:51PM -0700, Joshua Baker-LePain wrote:
..
Yes. We have the SGE commlib errors, and the Open MPI
routed:binomial errors. I'm mainly focusing on the SGE problem
right now, as I think (hope) that fixing that will also fix the MPI
issue.
could it be related to
On Wed, 4 Apr 2012 at 6:33pm, Tru Huynh wrote
On Tue, Apr 03, 2012 at 03:19:51PM -0700, Joshua Baker-LePain wrote:
Yes. We have the SGE commlib errors, and the Open MPI
routed:binomial errors. I'm mainly focusing on the SGE problem
right now, as I think (hope) that fixing that will also fix
I did not know that you can have a shadow master without using classic spooling?
regards
On 4/4/2012 4:37 PM, Joshua Baker-LePain wrote:
That being said, our SGE directory isn't NFS shared. We use local
spool directories and local SGE installations on all the nodes. The
only thing that's NFS
On 04.04.2012 at 23:15, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:
I did not know that you can have a shadow master without using classic spooling?
regards
http://gridengine.org/pipermail/users/2011-March/000508.html
-- Reuti
On 4/4/2012 4:37 PM, Joshua Baker-LePain wrote:
That being said,
thx
On 4/4/2012 5:23 PM, Reuti wrote:
On 04.04.2012 at 23:15, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:
I did not know that you can have a shadow master without using classic spooling?
regards
http://gridengine.org/pipermail/users/2011-March/000508.html
-- Reuti
On 4/4/2012 4:37 PM, Joshua
Note that dependency on NFSv4 was removed in Grid Engine 2011.11:
http://gridscheduler.sourceforge.net/Releases/ReleaseNotesGE2011.11.pdf
You can use any version of NFS to back the spool directory.
Rayson
On Wed, Apr 4, 2012 at 5:23 PM, Reuti re...@staff.uni-marburg.de wrote:
On 04.04.2012