On 08/16/2011 11:15 AM, Eddie Epstein wrote:
The CPE runs everything in the same process. UIMA AS could deploy this
pipeline in one process and get the same performance as the CPE.

Thanks again for your reply. I thought that I was deploying the pipeline in one AS process with the first option for running it:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

It looks like one process in the output of ps. I'm just surprised that the performance is so much worse (about 16x slower).
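For what it's worth, my understanding is that the -d option deploys the service inside the same JVM as the client before processing starts, roughly the programmatic equivalent of the sketch below using the client API's deploy() call. The dd2spring.xsl and saxon8.jar locations are my assumption of the stock layout under $UIMA_HOME, and the descriptor path is from my project:

import java.util.HashMap;
import java.util.Map;

import org.apache.uima.aae.client.UimaAsynchronousEngine;
import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;

public class DeployInClientJvm {
  public static void main(String[] args) throws Exception {
    UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();

    // deploy() needs the dd2spring stylesheet and the Saxon jar from the
    // UIMA AS install; I'm assuming the stock locations under $UIMA_HOME.
    Map<String, Object> deployCtx = new HashMap<String, Object>();
    deployCtx.put(UimaAsynchronousEngine.DD2SpringXsltFilePath,
        System.getenv("UIMA_HOME") + "/bin/dd2spring.xsl");
    deployCtx.put(UimaAsynchronousEngine.SaxonClasspath,
        "file:" + System.getenv("UIMA_HOME") + "/saxon/saxon8.jar");

    // Starts the service described by the deployment descriptor in this JVM,
    // listening on the input queue named in the descriptor.
    String containerId = engine.deploy(
        "sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml",
        deployCtx);

    // ... initialize the client against that queue and send CASes here ...

    engine.undeploy(containerId);
    engine.stop();
  }
}

If that's right, it would explain why ps shows a single Java process for the pipeline (the ActiveMQ broker is still separate), even though each CAS still makes a serialized round trip through the queue.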

Scaling out to multiple processes incurs overhead, which for UIMA AS
essentially consists of CAS serialization and communication. The
deployment shown in Figure 5 on
http://uima.apache.org/doc-uimaas-what.html will have much lower
overhead for this scenario.

To create a pipeline with an architecture like Figure 5, would I use the example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of the uima_async_scaleout.pdf for 2.3.1?
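Concretely, based on my reading of that section, I'm picturing something like the following minimal sketch. The queue name DictionaryTestQueue and the CAS pool size are placeholders from my setup, not values taken from the documentation:

import java.util.HashMap;
import java.util.Map;

import org.apache.uima.UIMAFramework;
import org.apache.uima.aae.client.UimaAsynchronousEngine;
import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
import org.apache.uima.collection.CollectionReader;
import org.apache.uima.util.XMLInputSource;

public class RunSentencePipeline {
  public static void main(String[] args) throws Exception {
    UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();

    // Instantiate the collection reader from its descriptor and hand it to
    // the client, which then drives it much as the CPM would; this needs to
    // be set before initialize().
    CollectionReader reader = UIMAFramework.produceCollectionReader(
        UIMAFramework.getXMLParser().parseCollectionReaderDescription(
            new XMLInputSource(
                "sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml")));
    engine.setCollectionReader(reader);

    // Connect to the service's input queue on the local broker.
    Map<String, Object> appCtx = new HashMap<String, Object>();
    appCtx.put(UimaAsynchronousEngine.ServerUri, "tcp://localhost:61616");
    appCtx.put(UimaAsynchronousEngine.Endpoint, "DictionaryTestQueue"); // placeholder queue name
    appCtx.put(UimaAsynchronousEngine.CasPoolSize, 5); // lets several CASes be in flight at once
    engine.initialize(appCtx);

    // Reads CASes from the reader, sends them to the service, and returns
    // when collection processing is complete.
    engine.process();
    engine.stop();
  }
}

Combined with the deploy() call in the sketch above, everything except the broker would run in one JVM.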

Thanks,
Chuck

Eddie

On Tue, Aug 16, 2011 at 11:48 AM, Charles Bearden
<charles.f.bear...@uth.tmc.edu> wrote:
Thank you, Jerry & Eddie, for your responses to my previous questions. I
appreciate the opportunity to learn.

Based on a little testing, I'm starting to think that UIMA AS may be designed
not so much for performance-enhancing scale-out as for architectural clarity.
I have a CPE that has a collection reader that reads sentences from a
database, and an aggregate AE that is the cTAKES AggregatePlaintextProcessor
(using our dictionary for dictionary lookup) plus an AE that writes the
concept annotations to a database. When I put these together as a CPE and
run it against a test set of 2553 sentences, it takes about one minute,
sometimes a few seconds less. The CpmFrame GUI indicates that the CR
accounts for about 5% of the processing time, and the AE for the other 95%,
with the LVG annotator & dictionary lookup each accounting for between 35%
and 45%.

When I use the same CR & aggregate AE like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

it takes 16 minutes to process the same 2553 sentences.
Deploy_DictionaryTest.xml is the deployment descriptor; you can see its
contents here: <http://pastebin.com/6nhuaC4H>.

When I deploy the AE five times with 'deployAsyncService.sh' like this:

deployAsyncService.sh \
  sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like
this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml

it still takes 12 minutes to process the 2553 sentences. I can see from the
log files that the processing is being scaled out. Given that in the CPE the
CASes spent only 5% of their time in the CR, I'm skeptical it has become the
bottleneck, though I could be wrong. I'm just wondering if this kind of
performance difference is expected.

Thanks,
Chuck
--
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: charles.f.bear...@uth.tmc.edu
Phone: 713.500.9672
