Re: [gridengine users] "cannot run until clean up of an previous run has finished"

2016-01-25 Thread Marlies Hankel
Hi all, Thank you. So I really have to remove the node from SGE and then put it back in? Or is there an easier way? I checked the spool directory and there is nothing in there. Also, funnily enough, the node does accept jobs if they are single-node jobs. For example, there are two nodes free, n
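A minimal sketch of the spool check discussed in this thread, assuming a classic spool layout under $SGE_ROOT/$SGE_CELL/spool/<hostname>; the node name and paths are illustrative and vary per installation:

$ ls $SGE_ROOT/$SGE_CELL/spool/cpu-1-3/active_jobs   # leftover per-job directories would appear here
$ ls $SGE_ROOT/$SGE_CELL/spool/cpu-1-3/jobs          # spooled job scripts
$ qconf -ke cpu-1-3                                  # ask the qmaster to shut down the execd on that node
$ # then restart the execd on the node itself via the installation's sgeexecd startup script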

Re: [gridengine users] RoundRobin scheduling among users

2016-01-25 Thread Skylar Thompson
On Mon, Jan 25, 2016 at 10:17:16PM +0100, Reuti wrote: > > On 25.01.2016 at 20:34, Skylar Thompson wrote: > > > Yep, we use functional tickets to accomplish this exact goal. Every user > gets 1000 functional tickets via auto_user_fshare in sge_conf(5), though > your exact number will depend

Re: [gridengine users] RoundRobin scheduling among users

2016-01-25 Thread Reuti
On 25.01.2016 at 20:34, Skylar Thompson wrote: > Yep, we use functional tickets to accomplish this exact goal. Every user > gets 1000 functional tickets via auto_user_fshare in sge_conf(5), though > your exact number will depend on the number of tickets and weights you have > elsewhere in your poli
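On the scheduler side, a hedged sketch of where the functional-ticket weighting lives (parameter name from sched_conf(5); the value shown is only illustrative, and functional tickets have no effect while this weight is 0):

$ qconf -msconf
  ...
  weight_tickets_functional   10000
  ...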

Re: [gridengine users] nodes group

2016-01-25 Thread Reuti
On the one hand you could specify a list of nodes and use a wildcard for the queue: $ qsub -q "*@node01,*@node02" ... A first simplification would be to define a host group for these particular nodes (it's *not* necessary to use this hostgroup in the queue definition too). $ qsub -q "*@@i5" ..
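A short sketch of defining such a host group, assuming the group is called @i5 as in the example above; the node names are illustrative:

$ qconf -ahgrp @i5
  group_name @i5
  hostlist   node01 node02
$ qsub -q "*@@i5" ...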

Re: [gridengine users] RoundRobin scheduling among users

2016-01-25 Thread Skylar Thompson
Yep, we use functional tickets to accomplish this exact goal. Every user gets 1000 functional tickets via auto_user_fshare in sge_conf(5), though your exact number will depend on the number of tickets and weights you have elsewhere in your policy configuration. On Mon, Jan 25, 2016 at 11:25:53AM -080
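For reference, a hedged sketch of the global configuration side of this setup (parameter names from sge_conf(5); 1000 is the value mentioned above, and enforce_user auto is what makes user objects get created automatically with that share):

$ qconf -mconf
  ...
  enforce_user        auto
  auto_user_fshare    1000
  ...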

[gridengine users] RoundRobin scheduling among users

2016-01-25 Thread Christopher Heiny
Hi all, We've been using GridEngine for several years now, currently OGS 2011.11p1 on Fedora 20 installed from Fedora RPMs. Our job mix is mostly embarrassingly parallel - we use array jobs to dispatch up to 100 tasks, each of which might require 1, 16, 32, or 64 cores. Each job takes up a signi
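A minimal example of the kind of submission described here; the parallel environment name "smp", the task range and the script name are assumptions, not taken from the original post:

$ qsub -t 1-100 -pe smp 16 run_task.sh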

Re: [gridengine users] nodes group

2016-01-25 Thread Reuti
Hi, On 25.01.2016 at 19:49, Dimar Jaime González Soto wrote: > Hi everyone, I need to execute a program on a few nodes (in my case slave > nodes), how can I specify that through the command line? By "a few nodes" do you mean a parallel job, or that a serial job may only run on some nodes due

[gridengine users] nodes group

2016-01-25 Thread Dimar Jaime González Soto
Hi everyone, I need to execute a program on a few nodes (in my case slave nodes), how can I specify that through the command line? -- Regards, Dimar González Soto, Ingeniero Civil en Informática, Universidad Austral de Chile

Re: [gridengine users] "cannot run until clean up of an previous run has finished"

2016-01-25 Thread Reuti
Hi, > On 25.01.2016 at 00:46, Marlies Hankel wrote: > > Hi all, > > Over the weekend something seems to have gone wrong with one of the nodes in > our cluster. We get the error: > > cannot run on host "cpu-1-3.local" until clean up of an previous run has > finished > > > I have restarted
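A hedged sketch of how the situation can be inspected from the submit host; the job id is a placeholder and the host name is the one from the thread (qstat -j only shows a "scheduling info" section if schedd_job_info is enabled in sched_conf(5)):

$ qstat -j <jobid>                  # scheduling info: why a pending job is being held back
$ qstat -f -q "*@cpu-1-3.local"     # state of the queue instances on the affected host
$ qhost -h cpu-1-3.local -q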

Re: [gridengine users] Weird prolog behavior

2016-01-25 Thread Taras Shapovalov
Thank you guys for the advice! Best regards, Taras On Mon, Jan 25, 2016 at 5:54 PM, William Hay wrote: > On Mon, Jan 25, 2016 at 01:05:50PM +0100, Fritz Ferstl wrote: > > One could also wrap the shepherd if what you wanted to do is check for > the > > working directory and potentially create

Re: [gridengine users] Weird prolog behavior

2016-01-25 Thread William Hay
On Mon, Jan 25, 2016 at 01:05:50PM +0100, Fritz Ferstl wrote: > One could also wrap the shepherd if what you wanted to do is check for the > working directory and potentially create or mount it if it isn't there yet > before exec'ing the real shepherd. All so-called methods (prolog, starter, > pe-*

[gridengine users] Same job consumes memory in a different way

2016-01-25 Thread sudha.penmetsa
Hi, We have launched the same job at different times; h_vmem is defined as 12GB. One job consumed only 10.5G and was successful, while the other consumed 18.3G and was therefore killed. 01/13/2016 00:13:18|execd|test1|W|job 33452 exceeds job hard limit "h_vmem" of queue "test.q@test1
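For comparison, a hedged sketch of how such a limit is usually requested at submission time and how the actual peak usage can be checked afterwards; the script name is illustrative and the job id is the one from the log line above:

$ qsub -l h_vmem=12G myjob.sh
$ qacct -j 33452 | egrep 'maxvmem|exit_status|failed'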

Re: [gridengine users] Weird prolog behavior

2016-01-25 Thread Fritz Ferstl
One could also wrap the shepherd if what you wanted to do is check for the working directory and potentially create or mount it if it isn't there yet before exec'ing the real shepherd. All so-called methods (prolog, starter, pe-*, etc.) are run by the shepherd. So by wrapping it you can precede
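A minimal sketch of such a wrapper, assuming it is configured via shepherd_cmd in sge_conf(5) and that, like the real shepherd, it is started inside the job's spool directory where the "config" file holds a cwd= line; both assumptions should be verified against the actual installation:

#!/bin/sh
# hypothetical shepherd wrapper: create the job's working directory if it is
# missing, then hand control to the real sge_shepherd
WORKDIR=$(sed -n 's/^cwd=//p' config 2>/dev/null)
[ -n "$WORKDIR" ] && mkdir -p "$WORKDIR"
exec "$SGE_ROOT/bin/$($SGE_ROOT/util/arch)/sge_shepherd" "$@"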

Re: [gridengine users] Weird prolog behavior

2016-01-25 Thread William Hay
On Mon, Jan 25, 2016 at 01:33:09PM +0300, Taras Shapovalov wrote: >Hi guys, >We have faced uncharacteristic (for other workload managers) behavior >of OGS 2011.11p1 (probably UGE has the same behavior, not sure yet). >Prolog is always called after stderr/out files are created. T

[gridengine users] Weird prolog behavior

2016-01-25 Thread Taras Shapovalov
Hi guys, We have faced uncharacteristic (for other workload managers) behavior of OGS 2011.11p1 (probably UGE has the same behavior, not sure yet). Prolog is always called after stderr/out files are created. This means that if prolog creates some directories that did not exist before and std
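A small sketch of how this ordering can be observed, assuming SGE_STDOUT_PATH and JOB_ID are present in the prolog environment as they are for the job itself; the log path and script name are illustrative:

#!/bin/sh
# prolog.sh: record whether the job's stdout file already exists when prolog starts
{
  echo "$(date) job $JOB_ID prolog, stdout path: $SGE_STDOUT_PATH"
  [ -e "$SGE_STDOUT_PATH" ] && echo "  -> stdout file existed before prolog ran"
} >> /tmp/prolog_trace.log
exit 0   # a non-zero prolog exit status would put the queue/job into an error state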