Eddie,

I was working with this same scenario, experimenting by trial and error to compare it with UIMA AS in order to get a more scaled pipeline, since I think UIMA AS should be able to do this as well. But with UIMA AS I am unable to match the processing time of DUCC's default configuration that you mentioned.

Can you help me with this? I just want to do the scaling using the best configuration of UIMA AS and DUCC, which can be done using process_dd. But how?

Thanks in advance.

Reshu.

On 05/01/2015 03:28 AM, Eddie Epstein wrote:
The simplest way of vertically scaling a Job process is to specify the
analysis pipeline using core UIMA descriptors and then use
--process_thread_count to specify how many copies of the pipeline to
deploy, each in a different thread. No use of UIMA-AS at all. Please check
out the "Raw Text Processing" sample application that comes with DUCC.

On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <reshu.agar...@orkash.com>
wrote:

Oh, I misunderstood this. I thought this would scale both my aggregate and
the individual AEs.

I want to scale the aggregate as well as the individual AEs. Is there any way
to do this in UIMA AS/DUCC?



On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:

In an async aggregate you scale the individual AEs, not the aggregate as a whole.
The configuration below should do that. Are there any warnings from
dd2spring at startup with your configuration?

<analysisEngine async="true">
        <delegates>
                <analysisEngine key="ChunkerDescriptor">
                        <scaleout numberOfInstances="5" />
                </analysisEngine>
                <analysisEngine key="NEDescriptor">
                        <scaleout numberOfInstances="5" />
                </analysisEngine>
                <analysisEngine key="StemmerDescriptor">
                        <scaleout numberOfInstances="5" />
                </analysisEngine>
                <analysisEngine key="ConsumerDescriptor">
                        <scaleout numberOfInstances="5" />
                </analysisEngine>
        </delegates>
</analysisEngine>

Jerry

On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <reshu.agar...@orkash.com>
wrote:

Hi,

I was trying to scale my processing pipeline to run in the DUCC environment
with a UIMA-AS process_dd. When I tried to scale using the configuration
below, the threads that started were not what I expected:


<analysisEngineDeploymentDescription
        xmlns="http://uima.apache.org/resourceSpecifier">

        <name>Uima v3 Deployment Descriptor</name>
        <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed Flow
                Controller</description>

        <deployment protocol="jms" provider="activemq">
                <casPool numberOfCASes="5" />
                <service>
                        <inputQueue endpoint="UIMA_Queue_test"
                                brokerURL="tcp://localhost:61617?jms.useCompression=true" prefetch="0" />
                        <topDescriptor>
                                <import
                                        location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
                        </topDescriptor>
                        <analysisEngine async="true" key="FlowControllerAgg"
                                internalReplyQueueScaleout="10" inputQueueScaleout="10">
                                <scaleout numberOfInstances="5" />
                                <delegates>
                                        <analysisEngine key="ChunkerDescriptor">
                                                <scaleout numberOfInstances="5" />
                                        </analysisEngine>
                                        <analysisEngine key="NEDescriptor">
                                                <scaleout numberOfInstances="5" />
                                        </analysisEngine>
                                        <analysisEngine key="StemmerDescriptor">
                                                <scaleout numberOfInstances="5" />
                                        </analysisEngine>
                                        <analysisEngine key="ConsumerDescriptor">
                                                <scaleout numberOfInstances="5" />
                                        </analysisEngine>
                                </delegates>
                        </analysisEngine>
                </service>
        </deployment>

</analysisEngineDeploymentDescription>


There should be 5 threads of FlowControllerAgg, where each thread has
5 more threads for each of ChunkerDescriptor, NEDescriptor, StemmerDescriptor
and ConsumerDescriptor.
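In other words, assuming the delegate scaleout applies per aggregate instance, I am expecting on the order of 5 + (5 x 4 x 5) = 105 analysis threads in each job process.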

But I don't think this is actually what happens in the case of DUCC.

Thanks in advance.

Reshu.




