> On 05.05.2015 at 11:52, Stefano Bridi <[email protected]> wrote:
> 
> Ok, sorry, yesterday I forgot to reply to the list.
> Today is not a busy day for that queue, so I had to recreate the
> problem. In doing so I saw that while the queue is empty everything works
> as expected: for the seconds between the submit and the start of the
> job, 'qstat -q E5m' shows the job as "qw".
> The E5m queue is built from 5 nodes: n010[4-8]. At the moment only one
> is in real use, so I need to submit 5 jobs to get one in "qw".
> 
> $ qsub sleeper.sh
> Your job 876766 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876767 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876768 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876769 ("sleeper.sh") has been submitted
> $ qsub sleeper.sh
> Your job 876770 ("sleeper.sh") has been submitted
> $ qalter -w v 876770
> Job 876770 cannot run in queue "opteron" because it is not contained in its hard queue list (-q)
> Job 876770 cannot run in queue "x5355" because it is not contained in its hard queue list (-q)
> Job 876770 cannot run in queue "e5645" because it is not contained in its hard queue list (-q)
> Job 876770 cannot run in queue "x5560" because it is not contained in its hard queue list (-q)
> Job 876770 cannot run in queue "x5670" because it is not contained in its hard queue list (-q)
> Job 876770 cannot run in queue "E5" because it is not contained in its hard queue list (-q)
> Job 876770 (-l exclusive=true) cannot run at host "n0104" because exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0105" because exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0106" because exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0107" because exclusive resource (exclusive) is already in use
> Job 876770 (-l exclusive=true) cannot run at host "n0108" because exclusive resource (exclusive) is already in use
> verification: no suitable queues
> $
> 
> Does this mean that the "exclusive" complex requested via 'qsub
> -l excl=true' is evaluated on the node before the check on the hard
> queue list?

It's not only related to the "exclusive" use. There seem to be some side
effects depending on whether a complex is attached to an exechost and/or a queue.
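
To separate the two kinds of rejection in the 'qalter -w v' output quoted
above (queue-level vs. host-level), something like the following could be
used. It is only a sketch: the function name summarize_verify is made up
for illustration, and it assumes each message arrives on a single line, as
qalter prints it before any mail wrapping.

```shell
#!/bin/bash
# Hypothetical helper: read `qalter -w v` output on stdin and count
# how many rejections are queue-level and how many are host-level.
summarize_verify() {
    awk '
        /cannot run in queue/ { q++ }   # e.g. not in the hard queue list (-q)
        /cannot run at host/  { h++ }   # e.g. exclusive resource already in use
        END { printf "queue-level: %d\nhost-level: %d\n", q, h }
    '
}

# Usage (job ID taken from this thread):
#   qalter -w v 876770 | summarize_verify
```

If the host-level count matches the number of hosts in the requested queue,
the job is only waiting for the exclusive resource to become free.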

I see jobs disappearing and reappearing depending on other running jobs when I 
use "-q".

Nevertheless, maybe you can attach the complex on a queue level too.
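
A minimal sketch of what that could look like, assuming the queue E5m and
the hosts n0104 to n0108 from this thread. The function name
print_excl_cmds is hypothetical, and it only prints the qconf calls so they
can be reviewed before an SGE admin runs them. Whether attaching the complex
at queue level actually changes what 'qstat -q' shows is untested here.

```shell
#!/bin/bash
# Hypothetical helper: print (do not execute) the qconf calls that would add
# exclusive=true to complex_values of one queue and of a list of exec hosts.
print_excl_cmds() {
    local queue="$1"; shift
    # qconf -aattr adds an entry to a list attribute of an existing object
    echo "qconf -aattr queue complex_values exclusive=true ${queue}"
    local host
    for host in "$@"; do
        echo "qconf -aattr exechost complex_values exclusive=true ${host}"
    done
}

# Assumed names from the thread: queue E5m, nodes n0104..n0108
print_excl_cmds E5m n0104 n0105 n0106 n0107 n0108
```

Looping over hosts this way is also one way to avoid editing each exechost
by hand; the host list could come from 'qconf -sel' instead of being typed
out.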

-- Reuti


> If I am correct, is there another way to have both 'qstat
> -q' and exclusive use of nodes working?
> thanks
> stefano
> 
> On 04 May 2015 at 13:46, "Reuti" <[email protected]> wrote:
> Hi,
> 
> > On 04.05.2015 at 13:25, Stefano Bridi <[email protected]> wrote:
> >
> > Hi all,
> > I need to let users reserve one or more nodes
> > for exclusive use for their runs.
> > It is a mixed environment, and if they don't reserve the nodes for
> > exclusive use, the serial and low-core-count jobs will fragment
> > the availability of cores across many nodes.
> > The problem is that the "exclusive" jobs are no longer listed in
> > the "per queue" qstat.
> >
> > We solved the exclusive request by setting up a new complex:
> >
> > # qconf -sc excl
> > #name       shortcut   type   relop   requestable   consumable   default   urgency
> > #---------------------------------------------------------------------------------
> > exclusive   excl       BOOL   EXCL    YES           YES          0         1000
> >
> > and setting the corresponding complex value on every node that can be
> > used this way (is there a way to set this system-wide?):
> >
> > # qconf -se n0108
> > hostname              n0108
> > load_scaling          NONE
> > complex_values        exclusive=true
> > load_values           arch=linux-x64,num_proc=20,....[snip]
> > processors            20
> > user_lists            NONE
> > xuser_lists           NONE
> > projects              NONE
> > xprojects             NONE
> > usage_scaling         NONE
> > report_variables      NONE
> >
> > Now if I submit a job like:
> > $ cat sleeper.sh
> > #!/bin/bash
> >
> > #
> > #$ -cwd
> > #$ -j y
> > #$ -q E5m
> > #$ -S /bin/bash
> > #$ -l excl=true
> > #
> > date
> > sleep 20
> > date
> >
> > $
> > Everything works as expected except qstat.
> > A generic 'qstat' reports:
> > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> > 876735 0.50601 sleeper.sh s.bridi      qw    05/04/2015 12:20:45                                 1
> >
> > and the 'qstat -j 876735' report:
> > ==============================================================
> > job_number:                 876735
> > exec_file:                  job_scripts/876735
> > submission_time:            Mon May  4 12:20:45 2015
> > owner:                      s.bridi
> > uid:                        65535
> > group:                      domusers
> > gid:                        15000
> > sge_o_home:                 /home/s.bridi
> > sge_o_log_name:             s.bridi
> > sge_o_path:                 /sw/openmpi/142/bin:.:/ge/bin/linux-x64:/usr/lib64/qt-3.3/bin:/ge/bin/linux-x64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/s.bridi/bin
> > sge_o_shell:                /bin/bash
> > sge_o_workdir:              /home/s.bridi/testexcl
> > sge_o_host:                 login0
> > account:                    sge
> > cwd:                        /home/s.bridi/testexcl
> > merge:                      y
> > hard resource_list:         exclusive=true
> > mail_list:                  s.bridi@login0
> > notify:                     FALSE
> > job_name:                   sleeper.sh
> > jobshare:                   0
> > hard_queue_list:            E5m
> > shell_list:                 NONE:/bin/bash
> > env_list:
> > script_file:                sleeper.sh
> > scheduling info:            [snip]
> >
> > while
> > 'qstat -q E5m' doesn't list the job!
> 
> Usually this means that the job is not allowed to run in this queue.
> 
> What does:
> 
> $ qalter -w v 876735
> 
> output?
> 
> -- Reuti
> 
> 
> > Thanks
> > Stefano
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
> 

