Hi Tam,

The limit of 500 applies per job, not per machine. It exists because the JD
(Job Driver) keeps a copy of every active "work item" CAS in memory.

This has not been a problem for our users because work items are not
individual documents. Rather, they are groups of documents (or groups of
CASes), such as all the files in a directory or a file containing many
documents. The CM (CAS Multiplier) running in each thread of each JP
(Job Process) then reads the data and creates a CAS for each document (or
input CAS) to send down the pipeline. This also allows the output for a
work item (e.g. for many documents) to be grouped into a single output
file.

See the DUCC sample apps for an example of breaking a single text file into
many documents, and grouping all the output CASes for the documents in the
file into a single zipfile.
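
As a rough illustration of that pattern only (plain Python, not the DUCC or
UIMA API): the sketch below treats one text file as a single work item,
splits it into documents, and packs every per-document result into one zip
archive. The names split_documents, process_work_item and analyze are made
up for this sketch, and splitting on blank lines is an assumption, not
necessarily what the sample app does.

```python
import io
import zipfile


def split_documents(text):
    """Split one work item's text into documents (assumed: blank-line separated)."""
    blocks = [block.strip() for block in text.split("\n\n")]
    return [block for block in blocks if block]


def process_work_item(text, analyze):
    """Analyze each document in a work item and return one zip archive as bytes.

    `analyze` is a hypothetical stand-in for the analysis pipeline: it takes a
    document string and returns the string to store for that document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, document in enumerate(split_documents(text)):
            # One archive entry per document, all in the same output file.
            archive.writestr(f"doc-{index:05d}.txt", analyze(document))
    return buffer.getvalue()
```

With this shape, the number of work items tracks the number of input files
rather than the number of documents, which is why the dispatch limit is
rarely reached in practice.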

We are working on a change that will significantly increase the max number
of dispatched CASes.

Eddie



On Wed, Nov 5, 2014 at 8:55 PM, Thanh Tam Nguyen <nthanh...@gmail.com>
wrote:

> Hi Eddie,
> I've checked the webserver. Since I have been testing on a small collection
> of documents (20 documents), there were 15 work items for the job.
>
> Did you mean 500 work items per machine?
>
> Regards,
> Tam
>
> On Thu, Nov 6, 2014 at 1:20 AM, Eddie Epstein <eaepst...@gmail.com> wrote:
>
> > Hi,
> >
> > There is a default limit of 500 work items dispatched at the same time.
> > How many dispatched work items are shown for the job?
> >
> > Eddie
> >
> >
> > On Wed, Nov 5, 2014 at 3:11 AM, Thanh Tam Nguyen <nthanh...@gmail.com>
> > wrote:
> >
> > > Hi Eddie,
> > > Thanks for your email. I followed the documentation and I was able to
> > > run DUCC jobs as a different user instead of user "ducc". But while I
> > > was watching the webserver, I found only one machine running the jobs.
> > > In the tab System>Machines, I can see that all the machine statuses are
> > > "up". What should I do to run the jobs on all machines?
> > >
> > >
> > > Regards,
> > > Tam
> > >
> > > On Fri, Oct 31, 2014 at 9:37 PM, Eddie Epstein <eaepst...@gmail.com>
> > > wrote:
> > >
> > > > Hi Tam,
> > > >
> > > > In the install documentation,
> > > > http://uima.apache.org/d/uima-ducc-1.0.0/installation.html,
> > > > the section "Multi-User Installation and Verification" describes how to
> > > > configure setuid-root for ducc_ling so that DUCC jobs are run as the
> > > > submitting user instead of user "ducc".
> > > >
> > > > The setuid-root ducc_ling should be put on every DUCC node, in the same
> > > > place, and ducc.properties updated to point at that location.
> > > >
> > > > Eddie
> > > >
> > > >
> > > > On Fri, Oct 31, 2014 at 3:54 AM, Thanh Tam Nguyen <nthanh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Eddie,
> > > > > Could you give me more details on how to set up DUCC for multiuser
> > > > > mode? FYI, I have successfully set up and run my UIMA analysis engine
> > > > > in single-user mode. I also followed the DUCCBOOK to set up ducc_ling,
> > > > > but I am not sure how to get it working on a cluster of machines.
> > > > >
> > > > > Thanks,
> > > > > Tam
> > > > >
> > > > > On Thu, Oct 30, 2014 at 11:08 PM, Eddie Epstein <eaepst...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The $DUCC_RUNTIME tree needs to be on a shared filesystem accessible
> > > > > > from all machines.
> > > > > > For single user mode ducc_ling could be referenced from there as well.
> > > > > > But for multiuser setup, ducc_ling needs setuid and should be
> > > > > > installed on the root drive.
> > > > > >
> > > > > > Eddie
> > > > > >
> > > > > > On Thu, Oct 30, 2014 at 10:08 AM, James Baker <james.d.ba...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I've been working through the installation of UIMA DUCC, and have
> > > > > > > successfully got it set up and running on a single machine. I'd now
> > > > > > > like to move to running it on a cluster of machines, but it isn't
> > > > > > > clear to me from the installation guide as to whether I need to
> > > > > > > install DUCC on each node, or whether ducc_ling is the only thing
> > > > > > > that needs installing on the non-head nodes.
> > > > > > >
> > > > > > > Could anyone shed some light on the process please?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
