Re: [gridengine users] resource reservation problem

Reuti Mon, 13 May 2013 23:53:02 -0700

On 14.05.2013 at 02:33, Chris Paciorek wrote:

> I tried submitting a job with h_rt requested for 30 minutes. 
> qsub -pe smp 16 -R y -l h_rt=30 -b y "R CMD BATCH --no-save tmp.R tmp.out"


This would be 30 seconds. 30 minutes can be specified as ":30:".
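A small sketch of the unit rule just described. It converts the two TIME forms used in this thread (a bare number of seconds, or "h:m:s" where an empty field counts as zero) to seconds, and formats seconds back into an unambiguous h:mm:ss value for -l h_rt. It assumes decimal fields only; any other forms Grid Engine may accept are not handled, and the function names are illustrative, not part of any Grid Engine tool.

```python
def sge_time_to_seconds(spec: str) -> int:
    """Convert a Grid Engine TIME value to seconds (illustrative helper).

    Handles a bare decimal number (seconds) and "h:m:s", where an empty
    field counts as zero, so ":30:" is 0 h, 30 min, 0 s.
    """
    fields = spec.strip().split(":")
    if len(fields) == 1:
        return int(fields[0])  # bare number: seconds, so h_rt=30 is 30 s
    if len(fields) != 3:
        raise ValueError(f"expected 'seconds' or 'h:m:s', got {spec!r}")
    hours, minutes, seconds = (int(f) if f else 0 for f in fields)
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_sge_time(total: int) -> str:
    """Format seconds as h:mm:ss, which cannot be mistaken for seconds."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


for spec in ["30", ":30:", "0:30:0", "671:00:00", "7200:00:00"]:
    secs = sge_time_to_seconds(spec)
    print(f"{spec!r:>14} = {secs:>9} s = {secs / 86400:g} days")

print("30 minutes as h_rt:", seconds_to_sge_time(30 * 60))
```

Writing the value with explicit hours and minutes is a cheap way to avoid the seconds/minutes mix-up in the submission above.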


> Our default_duration is still set at 7200 hours.

So all the other jobs still have no h_rt/s_rt set? What I meant was to request a
proper runtime for all jobs, especially for those that run for less than 300
days (the 7200 hours of default_duration). Even if a queue has a time limit, it
won't be taken into account when the reservation is made. So the reserved node
may be the one whose running job is expected to end first:

>  33004 0.06039 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]       32

It will end before the jobs on scf-sm01 and scf-sm03 if all jobs are assumed
to run for 300 days.
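Reuti's argument can be checked with a toy model. The sketch below assumes (a simplification, not Grid Engine's actual scheduler code) that the reservation goes to the node where the requested 16 slots are expected to free up first, and that a job without h_rt is treated as running for default_duration (7200 hours). Start times come from the qstat listing in this thread, but the assignment of the 8-slot jobs to scf-sm01 and scf-sm03 is hypothetical, because the archive hides the queue instance names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

DEFAULT_DURATION = timedelta(hours=7200)  # default_duration from qconf -ssconf
NODE_SLOTS = 32                           # slots per node in low.q

# A running job: (start time, slots used, requested h_rt or None).
Job = Tuple[datetime, int, Optional[timedelta]]


def expected_end(job: Job) -> datetime:
    """Assumed end time: start + h_rt if requested, else default_duration."""
    start, _slots, h_rt = job
    return start + (h_rt if h_rt is not None else DEFAULT_DURATION)


def slots_free_at(jobs: List[Job], needed: int, now: datetime) -> datetime:
    """Earliest time at which `needed` slots are expected to be free on one node."""
    free = NODE_SLOTS - sum(slots for _start, slots, _h_rt in jobs)
    if free >= needed:
        return now
    for job in sorted(jobs, key=expected_end):
        free += job[1]  # this job's slots are released when it ends
        if free >= needed:
            return expected_end(job)
    raise ValueError("request is larger than the node")


def pick_reservation_host(nodes: Dict[str, List[Job]], needed: int, now: datetime) -> str:
    """Toy rule: reserve on the node that is expected to have room first."""
    return min(nodes, key=lambda host: slots_free_at(nodes[host], needed, now))


def example_nodes(h_rt: Optional[timedelta]) -> Dict[str, List[Job]]:
    # Placement of the 8-slot jobs is HYPOTHETICAL; only 33004 on scf-sm02 is known.
    return {
        "scf-sm02": [(datetime(2013, 4, 24, 7, 14, 20), 32, None)],  # job 33004
        "scf-sm01": [(datetime(2013, 5, 13, h, m, s), 8, h_rt)
                     for h, m, s in [(8, 52, 27), (9, 5, 42), (9, 28, 42), (9, 41, 42)]],
        "scf-sm03": [(datetime(2013, 5, 13, h, m, s), 8, h_rt)
                     for h, m, s in [(9, 57, 12), (10, 15, 12), (10, 56, 27), (11, 0, 12)]],
    }


now = datetime(2013, 5, 13, 16, 0, 0)
print("no h_rt requested:  ", pick_reservation_host(example_nodes(None), 16, now))
print("8-slot jobs, h_rt=24h:", pick_reservation_host(example_nodes(timedelta(hours=24)), 16, now))
```

With no runtimes requested, the oldest job (33004) looks as if it ends first, so the model picks scf-sm02; if the 8-slot jobs requested h_rt=24:00:00, it would pick scf-sm01 instead.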

-- Reuti



> The submitted job is at the top of the queue (see below) but jobs requesting 
> fewer cores are slipping ahead of the job with the reservation. I believe 
> this is happening because the reservation was placed on node scf-sm02. Here 
> are the relevant lines from the schedule file:
> 34640:1:RESERVING:1369228520:90:P:smp:slots:16.000000
> 34640:1:RESERVING:1369228520:90:Q:[email protected]:slots:16.000000
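The RESERVING lines come from the scheduler's monitoring output (enabled in this setup by params MONITOR=1). Below is a minimal sketch for decoding them. It assumes the colon-separated field order job id, task id, state, start time (Unix epoch), duration in seconds, level, object name, resource, amount, which is how these lines read. Decoded this way, both reservations in this thread (34640 here and 34378 later) start at 1369228520, i.e. 2013-05-22 13:15:20 UTC, and both durations are 60 s longer than the runtime the scheduler was given (90 = 30 + 60; 25920060 = 7200 h + 60 s). sched_conf(5) describes a DURATION_OFFSET parameter with a 60-second default that would explain this, but that is an inference.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ScheduleRecord(NamedTuple):
    job_id: int
    task_id: int
    state: str         # e.g. RESERVING
    start: datetime    # planned start, converted from epoch seconds (UTC)
    duration_s: int    # duration the scheduler planned with, in seconds
    level: str         # "P" and "Q" are the levels seen in this thread
    obj: str           # PE name or queue instance
    resource: str
    amount: float


def parse_schedule_line(line: str) -> ScheduleRecord:
    parts = line.strip().split(":")
    if len(parts) < 9:
        raise ValueError(f"not a schedule record: {line!r}")
    job_id, task_id, state, start, duration, level = parts[:6]
    resource, amount = parts[-2:]
    obj = ":".join(parts[6:-2])  # tolerate ':' inside the object name
    return ScheduleRecord(
        int(job_id), int(task_id), state,
        datetime.fromtimestamp(int(start), tz=timezone.utc),
        int(duration), level, obj, resource, float(amount),
    )


for line in [
    "34640:1:RESERVING:1369228520:90:P:smp:slots:16.000000",
    "34378:1:RESERVING:1369228520:25920060:P:smp:slots:16.000000",
]:
    rec = parse_schedule_line(line)
    print(rec.job_id, rec.start.isoformat(), rec.duration_s, rec.obj, rec.amount)
```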
> 
> So it seems that what is happening is that SGE has decided to put the
> reservation on node scf-sm02, which has the longest-running current job
> (#33004), perhaps because, based on the default_duration of 7200 hours, it
> expects that job to finish first among all running jobs. Then, when jobs on
> other nodes finish, the reservation is not applied to those other nodes, and
> so jobs slip ahead of the job that has requested the reservation. Here's a
> snapshot of the queue after job #34640 was submitted with a reservation
> attached to it. Shortly after this snapshot, job #34333 started on node
> scf-sm03, despite the reservation for job #34640.
> 
> Any thoughts on whether this understanding is correct?
> 
> 
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>   33004 0.06039 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]       32
>   34321 0.00211 SubSampleF isoform      r     05/13/2013 08:52:27 [email protected]        8
>   34322 0.00211 SubSampleF isoform      r     05/13/2013 09:05:42 [email protected]        8
>   34323 0.00211 SubSampleF isoform      r     05/13/2013 09:28:42 [email protected]        8
>   34324 0.00211 SubSampleF isoform      r     05/13/2013 09:41:42 [email protected]        8
>   34325 0.00211 SubSampleF isoform      r     05/13/2013 09:57:12 [email protected]        8
>   34326 0.00211 SubSampleF isoform      r     05/13/2013 10:15:12 [email protected]        8
>   34327 0.00211 SubSampleF isoform      r     05/13/2013 10:56:27 [email protected]        8
>   34328 0.00211 SubSampleF isoform      r     05/13/2013 11:00:12 [email protected]        8
>   34329 0.00211 SubSampleF isoform      r     05/13/2013 11:01:57 [email protected]        8
>   34330 0.00211 SubSampleF isoform      r     05/13/2013 12:09:27 [email protected]        8
>   34331 0.00211 SubSampleF isoform      r     05/13/2013 12:35:57 [email protected]        8
>   34332 0.00211 SubSampleF isoform      r     05/13/2013 13:18:27 [email protected]        8
>   34397 0.68717 tnBoot.sh  haiyanh      r     05/09/2013 17:45:02 [email protected]       8
>   34613 0.03245 run_japan. lwtai        r     05/11/2013 23:52:39 [email protected]       1
>   34614 0.03245 run_japan. lwtai        r     05/11/2013 23:52:39 [email protected]       1
>   34615 0.03245 run_japan. lwtai        r     05/11/2013 23:52:39 [email protected]       1
>   34616 0.03245 run_japan. lwtai        r     05/11/2013 23:52:39 [email protected]       1
>   34633 0.03245 run2       lwtai        r     05/13/2013 09:36:27 [email protected]       1
>   34648 0.03245 run_data   lwtai        r     05/13/2013 16:24:25 [email protected]       5
>   34649 0.03245 run3       lwtai        r     05/13/2013 16:48:10 [email protected]       2
>   34640 1.00000 R CMD BATC paciorek     qw    05/13/2013 13:45:13                         16
>   34333 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34334 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34335 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34336 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34337 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34338 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34339 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34340 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34341 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34342 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:56                          8
>   34343 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34344 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34345 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34346 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34347 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34348 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
>   34349 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:57                          8
> 
> 
> 
> On Fri, May 10, 2013 at 6:10 AM, Reuti <[email protected]> wrote:
> Hi,
> 
> On 10.05.2013 at 00:35, Chris Paciorek wrote:
> 
> > For the (default) queue [called low.q] that these jobs are going to, we 
> > have the time limit set to 28 days (see below). Users are not explicitly 
> > requesting h_rt/s_rt. The jobs that are slipping ahead of the reserved job 
> > are not actually jobs that are short in time, and SGE shouldn't have any 
> > way of thinking that they are.
> 
> https://arc.liv.ac.uk/trac/SGE/ticket/388
> 
> http://gridengine.org/pipermail/users/2012-July/004104.html
> 
> Without an explicit request the default runtime will be assumed for all jobs.
> 
> Jobs 34195-34198 weren't started at once, but one after the other. I would
> say the jobs running before them on nodes scf-sm01 and scf-sm03 were shorter
> than the estimated 7200 hours. Could you please try submitting a shorter job
> with an explicitly requested h_rt and check whether that changes anything?
> 
> -- Reuti
> 
> 
> > I'm starting to suspect that the issue may be that the reservation seems to 
> > be hard-wired to individual nodes, and in our case it is being hard-wired 
> > to a node with the longest-running job, while other jobs on other nodes are 
> > finishing more quickly. I suppose this makes sense - in order to collect 
> > sufficient cores for a reservation, it needs to do so on a single node, so 
> > at some point, it needs to decide which node that will be. Unfortunately 
> > in this case, it's immediately choosing the node with the long-running job 
> > as soon as the reservation is requested, but that long-running job is 
> > likely to continue to run for a while. Can anyone weigh in on whether this 
> > sounds right and if so, any ideas to deal with this?
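The behaviour described here (a reservation held on one host while lower-priority jobs start elsewhere) can be expressed as a simplified admission rule. This is an illustrative Python sketch, not Grid Engine's scheduler logic: a waiting job may start now if it fits in the free slots and either runs on a different host from the reservation or is expected to finish before the reservation begins. It ignores the case where the reserved host has room for both jobs; Reservation and may_start_now are hypothetical names, and the slot counts in the example are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    host: str        # node the scheduler picked for the reserved job
    start: datetime  # when the reserved slots are expected to be free
    slots: int       # slots held for the reserved job


def may_start_now(res: Reservation, host: str, free_slots: int,
                  job_slots: int, job_end: datetime) -> bool:
    """Hypothetical, simplified rule for starting a lower-priority job now.

    Ignores the case where the reserved host could fit both jobs.
    """
    if job_slots > free_slots:
        return False              # does not fit on this host right now
    if host != res.host:
        return True               # the reservation only holds slots on its own host
    return job_end <= res.start   # backfill: must finish before the reservation starts


DEFAULT_DURATION = timedelta(hours=7200)  # default_duration from qconf -ssconf

# Naive datetimes on one clock; the reservation start is the schedule-file
# timestamp 1369228520 expressed in UTC. Slot counts below are hypothetical.
res = Reservation("scf-sm02", datetime(2013, 5, 22, 13, 15, 20), 16)
now = datetime(2013, 5, 13, 23, 0, 0)     # hypothetical "current" time

no_h_rt_end = now + DEFAULT_DURATION      # job submitted without h_rt
short_end = now + timedelta(hours=2)      # job submitted with h_rt=2:00:00

print(may_start_now(res, "scf-sm03", 8, 8, no_h_rt_end))  # True: other host
print(may_start_now(res, "scf-sm02", 8, 8, no_h_rt_end))  # False: would overrun
print(may_start_now(res, "scf-sm02", 8, 8, short_end))    # True: backfill
```

Under this model a job without h_rt can never backfill on the reserved host, but nothing holds it back on the other hosts, which matches the jobs slipping ahead in the qstat snapshots.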
> >
> > beren:~$ qconf -sq low.q
> > qname                 low.q
> > hostlist              @sm0
> > seq_no                0
> > load_thresholds       np_load_avg=1.75
> > suspend_thresholds    NONE
> > nsuspend              1
> > suspend_interval      00:05:00
> > priority              19
> > min_cpu_interval      00:05:00
> > processors            UNDEFINED
> > qtype                 BATCH
> > ckpt_list             NONE
> > pe_list               smp smpcontrol
> > rerun                 FALSE
> > slots                 32
> > tmpdir                /tmp
> > shell                 /bin/bash
> > prolog                NONE
> > epilog                NONE
> > shell_start_mode      posix_compliant
> > starter_method        NONE
> > suspend_method        NONE
> > resume_method         NONE
> > terminate_method      NONE
> > notify                00:00:60
> > owner_list            NONE
> > user_lists            sm0users
> > xuser_lists           NONE
> > subordinate_list      NONE
> > complex_values        NONE
> > projects              NONE
> > xprojects             NONE
> > calendar              NONE
> > initial_state         default
> > s_rt                  671:00:00
> > h_rt                  672:00:00
> > s_cpu                 INFINITY
> > h_cpu                 INFINITY
> > s_fsize               INFINITY
> > h_fsize               INFINITY
> > s_data                INFINITY
> > h_data                INFINITY
> > s_stack               INFINITY
> > h_stack               INFINITY
> > s_core                INFINITY
> > h_core                INFINITY
> > s_rss                 INFINITY
> > h_rss                 INFINITY
> > s_vmem                INFINITY
> > h_vmem                INFINITY
> >
> >
> >
> > beren:~$ qconf -sc
> > #name               shortcut   type        relop requestable consumable default  urgency
> > #----------------------------------------------------------------------------------------
> > arch                a          RESTRING    ==    YES         NO         NONE     0
> > calendar            c          RESTRING    ==    YES         NO         NONE     0
> > cpu                 cpu        DOUBLE      >=    YES         NO         0        0
> > display_win_gui     dwg        BOOL        ==    YES         NO         0        0
> > h_core              h_core     MEMORY      <=    YES         NO         0        0
> > h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
> > h_data              h_data     MEMORY      <=    YES         NO         0        0
> > h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
> > h_rss               h_rss      MEMORY      <=    YES         NO         0        0
> > h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
> > h_stack             h_stack    MEMORY      <=    YES         NO         0        0
> > h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
> > hostname            h          HOST        ==    YES         NO         NONE     0
> > load_avg            la         DOUBLE      >=    NO          NO         0        0
> > load_long           ll         DOUBLE      >=    NO          NO         0        0
> > load_medium         lm         DOUBLE      >=    NO          NO         0        0
> > load_short          ls         DOUBLE      >=    NO          NO         0        0
> > m_core              core       INT         <=    YES         NO         0        0
> > m_socket            socket     INT         <=    YES         NO         0        0
> > m_topology          topo       RESTRING    ==    YES         NO         NONE     0
> > m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
> > mem_free            mf         MEMORY      <=    YES         NO         0        0
> > mem_total           mt         MEMORY      <=    YES         NO         0        0
> > mem_used            mu         MEMORY      >=    YES         NO         0        0
> > min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
> > np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
> > np_load_long        nll        DOUBLE      >=    NO          NO         0        0
> > np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
> > np_load_short       nls        DOUBLE      >=    NO          NO         0        0
> > num_proc            p          INT         ==    YES         NO         0        0
> > qname               q          RESTRING    ==    YES         NO         NONE     0
> > rerun               re         BOOL        ==    NO          NO         0        0
> > s_core              s_core     MEMORY      <=    YES         NO         0        0
> > s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
> > s_data              s_data     MEMORY      <=    YES         NO         0        0
> > s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
> > s_rss               s_rss      MEMORY      <=    YES         NO         0        0
> > s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
> > s_stack             s_stack    MEMORY      <=    YES         NO         0        0
> > s_vmem              s_vmem     MEMORY      <=    YES         NO         0        0
> > seq_no              seq        INT         ==    NO          NO         0        0
> > slots               s          INT         <=    YES         YES        1        1000
> > swap_free           sf         MEMORY      <=    YES         NO         0        0
> > swap_rate           sr         MEMORY      >=    YES         NO         0        0
> > swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
> > swap_total          st         MEMORY      <=    YES         NO         0        0
> > swap_used           su         MEMORY      >=    YES         NO         0        0
> > tmpdir              tmp        RESTRING    ==    NO          NO         NONE     0
> > virtual_free        vf         MEMORY      <=    YES         YES        0        0
> > virtual_total       vt         MEMORY      <=    YES         NO         0        0
> > virtual_used        vu         MEMORY      >=    YES         NO         0        0
> >
> >
> >
> > On Thu, May 9, 2013 at 10:43 AM, Reuti <[email protected]> wrote:
> > On 09.05.2013 at 18:51, Chris Paciorek wrote:
> >
> > > We're having a problem similar to that described in this thread:
> > > http://www.mentby.com/Group/grid-engine/62u4-resource-reservation-not-working-for-some-jobs.html
> > >
> > > We're running Grid Engine 6.2u5 for a cluster of 4 Linux nodes (32 cores 
> > > each) running Ubuntu 12.04 (Precise).
> > >
> > > We're seeing that jobs that request a reservation and are at the top of 
> > > the queue are not starting, with lower-priority jobs that are requesting 
> > > fewer cores slipping ahead of the higher priority job. An example of this 
> > > is at the bottom of this posting.
> >
> > Besides the defined "default_duration 7200:00:00": what h_rt/s_rt request 
> > was supplied to the short jobs?
> >
> > -- Reuti
> >
> >
> > > Here are the results of "qconf -ssconf":
> > > algorithm                         default
> > > schedule_interval                 0:0:15
> > > maxujobs                          0
> > > queue_sort_method                 load
> > > job_load_adjustments              np_load_avg=0.50
> > > load_adjustment_decay_time        0:7:30
> > > load_formula                      np_load_avg
> > > schedd_job_info                   true
> > > flush_submit_sec                  0
> > > flush_finish_sec                  0
> > > params                            MONITOR=1
> > > reprioritize_interval             0:0:0
> > > halftime                          720
> > > usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> > > compensation_factor               5.000000
> > > weight_user                       0.250000
> > > weight_project                    0.250000
> > > weight_department                 0.250000
> > > weight_job                        0.250000
> > > weight_tickets_functional         0
> > > weight_tickets_share              100000
> > > share_override_tickets            TRUE
> > > share_functional_shares           TRUE
> > > max_functional_jobs_to_schedule   200
> > > report_pjob_tickets               TRUE
> > > max_pending_tasks_per_job         50
> > > halflife_decay_list               none
> > > policy_hierarchy                  SOF
> > > weight_ticket                     1.000000
> > > weight_waiting_time               0.278000
> > > weight_deadline                   3600000.000000
> > > weight_urgency                    0.000000
> > > weight_priority                   0.000000
> > > max_reservation                   10
> > > default_duration                  7200:00:00
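The scheduler settings that matter for this thread can be sanity-checked with a short script. The sketch below parses an excerpt of the `qconf -ssconf` output quoted above (embedded as a string, so nothing is queried) and reports whether reservations are enabled (max_reservation greater than 0), whether MONITOR=1 is set in params, and default_duration in days. It assumes the simple "name value" layout shown above and comma-separated params; the helper names are hypothetical.

```python
from typing import Dict

# Excerpt of the qconf -ssconf output quoted in this thread.
SSCONF_EXCERPT = """\
schedule_interval                 0:0:15
params                            MONITOR=1
max_reservation                   10
default_duration                  7200:00:00
"""


def parse_ssconf(text: str) -> Dict[str, str]:
    """Turn 'name   value' lines (as printed by qconf -ssconf) into a dict."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.strip().partition(" ")
        conf[name] = value.strip()
    return conf


def hms_to_hours(value: str) -> float:
    """Convert an 'h:m:s' value (empty fields count as 0) to hours."""
    h, m, s = (int(f) if f else 0 for f in value.split(":"))
    return h + m / 60 + s / 3600


conf = parse_ssconf(SSCONF_EXCERPT)
reservations_enabled = int(conf["max_reservation"]) > 0
monitor_on = "MONITOR=1" in conf["params"].split(",")
default_days = hms_to_hours(conf["default_duration"]) / 24

print("reservation enabled:", reservations_enabled)
print("MONITOR=1 set:", monitor_on)
print(f"default_duration = {default_days:g} days")
```

For this configuration it reports reservations enabled, monitoring on, and a default_duration of 300 days, the figure Reuti refers to.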
> > >
> > > Here's the example:
> > >
> > > Job #34378 was submitted as:
> > > qsub -pe smp 16 -R y -b y "R CMD BATCH --no-save tmp.R tmp.out"
> > >
> > >
> > > Soon after submitting #34378, we see that job #34378 is next in line:
> > > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > > -----------------------------------------------------------------------------------------------------------------
> > >   33004 0.11762 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]       32
> > >   33718 0.12405 fooSU_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33719 0.12405 fooSV_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33720 0.12405 fooWV_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33721 0.12405 fooWU_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33745 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:29:28 [email protected]        1
> > >   33758 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:30:28 [email protected]        1
> > >   33763 0.06583 toy.sh     yjhuoh       r     05/07/2013 22:33:58 [email protected]        1
> > >   33787 0.06583 toy.sh     yjhuoh       r     05/08/2013 00:15:58 [email protected]        1
> > >   33794 0.06583 toy.sh     yjhuoh       r     05/08/2013 01:45:58 [email protected]        1
> > >   34183 0.00570 SubSampleF isoform      r     05/09/2013 03:29:32 [email protected]        8
> > >   34185 0.00570 SubSampleF isoform      r     05/09/2013 04:27:47 [email protected]        8
> > >   34186 0.00570 SubSampleF isoform      r     05/09/2013 04:36:47 [email protected]        8
> > >   34187 0.00570 SubSampleF isoform      r     05/09/2013 05:05:02 [email protected]        8
> > >   34188 0.00570 SubSampleF isoform      r     05/09/2013 05:42:17 [email protected]        8
> > >   34189 0.00570 SubSampleF isoform      r     05/09/2013 06:12:47 [email protected]        8
> > >   34190 0.00570 SubSampleF isoform      r     05/09/2013 06:14:17 [email protected]        8
> > >   34191 0.00570 SubSampleF isoform      r     05/09/2013 07:07:32 [email protected]        8
> > >   34192 0.00570 SubSampleF isoform      r     05/09/2013 07:24:02 [email protected]        8
> > >   34194 0.00570 SubSampleF isoform      r     05/09/2013 07:37:17 [email protected]        8
> > >   34378 1.00000 R CMD BATC paciorek     qw    05/09/2013 08:14:31                         16
> > >   34195 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34196 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34197 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34198 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34199 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34200 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34201 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34202 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34203 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34204 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34205 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34206 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34207 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34208 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34209 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34210 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >
> > > A little while later, we see that jobs 34195-34198 have slipped ahead of 
> > > 34378:
> > >
> > > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > > -----------------------------------------------------------------------------------------------------------------
> > >   33004 0.11790 tophat.sh  seqc         r     04/24/2013 07:14:20 [email protected]       32
> > >   33718 0.12398 fooSU_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33719 0.12398 fooSV_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33720 0.12398 fooWV_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33721 0.12398 fooWU_long lwtai        r     05/06/2013 17:01:58 [email protected]       1
> > >   33745 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:29:28 [email protected]        1
> > >   33758 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:30:28 [email protected]        1
> > >   33763 0.08234 toy.sh     yjhuoh       r     05/07/2013 22:33:58 [email protected]        1
> > >   33787 0.08234 toy.sh     yjhuoh       r     05/08/2013 00:15:58 [email protected]        1
> > >   34188 0.00568 SubSampleF isoform      r     05/09/2013 05:42:17 [email protected]        8
> > >   34189 0.00568 SubSampleF isoform      r     05/09/2013 06:12:47 [email protected]        8
> > >   34190 0.00568 SubSampleF isoform      r     05/09/2013 06:14:17 [email protected]        8
> > >   34191 0.00568 SubSampleF isoform      r     05/09/2013 07:07:32 [email protected]        8
> > >   34192 0.00568 SubSampleF isoform      r     05/09/2013 07:24:02 [email protected]        8
> > >   34194 0.00568 SubSampleF isoform      r     05/09/2013 07:37:17 [email protected]        8
> > >   34195 0.00568 SubSampleF isoform      r     05/09/2013 08:16:47 [email protected]        8
> > >   34196 0.00568 SubSampleF isoform      r     05/09/2013 08:47:32 [email protected]        8
> > >   34197 0.00568 SubSampleF isoform      r     05/09/2013 09:11:02 [email protected]        8
> > >   34198 0.00568 SubSampleF isoform      r     05/09/2013 09:16:32 [email protected]        8
> > >   34378 1.00000 R CMD BATC paciorek     qw    05/09/2013 08:14:31                         16
> > >   34199 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34200 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34201 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34202 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34203 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34204 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34205 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34206 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34207 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34208 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34209 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34210 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:51                          8
> > >   34211 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34212 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34213 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34214 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34215 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34216 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34217 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >   34218 0.00000 SubSampleF isoform      qw    05/08/2013 19:30:52                          8
> > >
> > > The schedule file shows that there are RESERVING statements for #34378:
> > > 34378:1:RESERVING:1369228520:25920060:P:smp:slots:16.000000
> > > 34378:1:RESERVING:1369228520:25920060:Q:[email protected]:slots:16.000000
> > >
> > > Perhaps the issue is that the reservation seems specific to the cluster 
> > > node "scf-sm02.Berkeley.EDU", and that specific node is occupied by a 
> > > long-running job (#33004). If so, is there any way to have the 
> > > reservation not tied to a node?
> > >
> > > -Chris
> > >
> > > ----------------------------------------------------------------------------------------------
> > > Chris Paciorek
> > >
> > > Statistical Computing Consultant, Associate Research Statistician, Lecturer
> > >
> > > Office: 495 Evans Hall                      Email: [email protected]
> > > Mailing Address:                            Voice: 510-842-6670
> > > Department of Statistics                    Fax:   510-642-7892
> > > 367 Evans Hall                              Skype: cjpaciorek
> > > University of California, Berkeley          WWW:   www.stat.berkeley.edu/~paciorek
> > > Berkeley, CA 94720 USA                      Permanent forward: [email protected]
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > [email protected]
> > > https://gridengine.org/mailman/listinfo/users
> >
> >
> 
> 

