Re: DUCC doesn't use all available machines

Eddie Epstein Fri, 28 Nov 2014 16:15:49 -0800

Now you are hitting a limit configured in ducc.properties:

  # Max number of work-item CASes for each job
  ducc.threads.limit = 500


62 job processes * 8 threads per process = 496 max concurrent work items;
a 63rd process would bring the total to 504, over the limit of 500.
The limit was put in to bound the memory required by the job driver (JD).
The value can probably be pushed up into the range of 700-800 before the
job driver goes OOM. There are configuration parameters to increase JD
memory:

  # Memory size in MB allocated for each JD
  ducc.jd.share.quantum = 450
  # JD max heap size. Should be smaller than the JD share quantum
  ducc.driver.jvm.args = -Xmx400M -DUimaAsCasTracking

DUCC would have to be restarted for the JD size parameters to take effect.
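
As a rough illustration of the arithmetic in this thread (not DUCC's actual
code), the number of job processes can be estimated from the work-item count,
the threads per process, and ducc.threads.limit. The function name and its
parameters are hypothetical, and the 6000-item figure is only the approximate
count reported in the quoted message:

```python
import math


def max_job_processes(work_items, threads_per_process, threads_limit=500):
    """Estimate the job process count (illustrative only, not DUCC code)."""
    # Processes needed so that every work item could have its own thread.
    needed = math.ceil(work_items / threads_per_process)
    # Processes that fit before total threads would exceed the limit.
    allowed = threads_limit // threads_per_process
    return min(needed, allowed)


# 100 work items, 8 threads per process: bounded by the work available.
print(max_job_processes(100, 8))    # 13
# About 6000 work items: bounded by the thread limit of 500.
print(max_job_processes(6000, 8))   # 62
```

Under the same assumptions, a limit of 800 would allow up to 100 processes of
8 threads each, provided the JD has enough memory.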

One of the current DUCC development items is to significantly reduce the
memory needed per work item, and raise the default limit for concurrent
work items by two or three orders of magnitude.



On Fri, Nov 28, 2014 at 6:40 PM, Simon Hafner <reactorm...@gmail.com> wrote:

> I've put the fudge to 12000, and it jumped immediately to 62 procs.
> However, it doesn't spawn more procs even though it has about 6k items
> left.
>
> 2014-11-17 15:30 GMT-06:00 Jim Challenger <chall...@gmail.com>:
> > It is also possible that RM "prediction" has decided that additional
> > processes are not needed.  It appears that there were likely 64 work
> > items dispatched, plus the 6 completed, leaving only 30 that were
> > "idle".  If these work items appeared to be completing quickly, the RM
> > would decide that scale-up would be wasteful and not do it.
> >
> > Very gory details if you're interested:
> > The time to start a new process is measured by the RM based on the
> > observed initialization time of the processes plus an estimate of how
> > long it would take to get a new process actually running.  A fudge
> > factor is added on top of this because in a large operation it is
> > wasteful to start processes (with associated preemptions) that only
> > end up doing a "few" work items.  All is subjective and configurable.
> >
> > The average time-per-work item is also reported to the RM.
> >
> > The RM then looks at the number of work items remaining, and the
> > estimated time needed to process this work based on the above, and if
> > it determines that the job will be completed before new processes can
> > be scaled up and initialized, it does not scale up.
> >
> > For short jobs, this can be a bit inaccurate, but those jobs are short :)
> >
> > For longer jobs, the time-per-work-item becomes increasingly accurate,
> > so the RM prediction tends to improve and ramp-up WILL occur if the
> > work-item time turns out to be larger than originally thought.  (Our
> > experience is that work-item times are mostly uniform with occasional
> > outliers, but the prediction seems to work well).
> >
> > Relevant configuration parameters in ducc.properties:
> > # Predict when a job will end and avoid expanding if not needed.
> > # Set to false to disable prediction.
> >    ducc.rm.prediction = true
> > # Add this fudge factor (milliseconds) to the expansion target when
> > # using prediction
> >    ducc.rm.prediction.fudge = 120000
> >
> > You can observe this in the rm log, see the example below.  I'm
> > preparing a guide to this log; for now, the net of these two log lines
> > is: the projection for the job in question (job 208927) is that 16
> > processes are needed to complete this job, even though the job could
> > use 20 processes at full expansion - the BaseCap - so a max of 16 will
> > be scheduled for it, subject to fair-share constraint.
> >
> > 17 Nov 2014 15:07:38,880  INFO RM.RmJob - getPrjCap 208927  bobuser O 2 T 343171 NTh 128 TI 143171 TR 6748.601431980907 R 1.8967e-02 QR 5043 P 6509 F 0 ST 1416254363603 return 16
> > 17 Nov 2014 15:07:38,880  INFO RM.RmJob - initJobCap 208927 bobuser O 2 Base cap: 20 Expected future cap: 16 potential cap 16 actual cap 16
> >
> > Jim
> >
> >
> > On 11/17/14, 3:44 PM, Eddie Epstein wrote:
> >>
> >> DuccRawTextSpec.job specifies that each job process (JP)
> >> run 8 analytic pipeline threads. So for this job with 100 work
> >> items, no more than 13 JPs would ever be started.
> >>
> >> After successful initialization of the first JP, DUCC begins scaling
> >> up the number of JPs using doubling. During JP scale up the
> >> scheduler monitors the work item completion rate, compares that
> >> with the JP initialization time, and stops scaling up JPs when
> >> starting more JPs will not make the job run any faster.
> >>
> >> Of course JP scale up is also limited by the job's "fair share"
> >> of resources relative to total resources available for all preemptable
> >> jobs.
> >>
> >> To see more JPs, increase the number and/or size of the input text
> >> files, or decrease the number of pipeline threads per JP.
> >>
> >> Note that it can be counterproductive to run "too many" pipeline
> >> threads per machine. Assuming analytic threads are 100% CPU bound,
> >> running more threads than real cores will often slow down the overall
> >> document processing rate.
> >>
> >>
> >> On Mon, Nov 17, 2014 at 6:48 AM, Simon Hafner <reactorm...@gmail.com>
> >> wrote:
> >>
> >>> I fired the DuccRawTextSpec.job on a cluster consisting of three
> >>> machines, with 100 documents. The scheduler only runs the processes on
> >>> two machines instead of all three. Can I mess with a few config
> >>> variables to make it use all three?
> >>>
> >>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:1
> >>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:2
> >>> id:22 state:Running total:100 done:0 error:0 retry:0 procs:4
> >>> id:22 state:Running total:100 done:1 error:0 retry:0 procs:8
> >>> id:22 state:Running total:100 done:6 error:0 retry:0 procs:8
> >>>
> >
>
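
The scale-up decision described in the quoted RM explanation can be sketched
roughly as follows. This is only a simplified reading of that description,
not the RM's implementation: the function name, its parameters, and the
sample timings are all hypothetical, and fair-share limits are ignored.

```python
def should_scale_up(remaining_items, avg_item_ms, current_threads,
                    init_ms, fudge_ms=120000):
    """Return True if starting more processes looks worthwhile (sketch)."""
    # Estimated time to finish the remaining work with the threads
    # that are already running.
    time_to_finish_ms = remaining_items * avg_item_ms / current_threads
    # Estimated time before a new process could contribute, padded by
    # the fudge factor (compare ducc.rm.prediction.fudge).
    time_to_start_ms = init_ms + fudge_ms
    return time_to_finish_ms > time_to_start_ms


# Few, fast work items left: the job would finish before new processes
# were ready, so no scale-up. (Timings here are made up.)
print(should_scale_up(30, 5000, 64, init_ms=60000))       # False
# Many slow work items left: scale-up pays off.
print(should_scale_up(6000, 60000, 496, init_ms=60000))   # True
```

In this reading, lowering the fudge factor shortens the time a new process
must "pay back", which makes scale-up more likely.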
