Hi Marshall,

Thanks for your reply. I should have included details of our machine. It has four CPUs each with 8 cores; here's the /proc/cpuinfo stanza for one of the cores: <http://pastebin.com/F8YXs7XL>

The machine has 64G RAM; I was running dstat at the same time as the AS pipeline, and it indicated no paging. Here's a typical snapshot of the memory part of top during the run:

Mem:  66096700k total, 43877612k used, 22219088k free,   608092k buffers
Swap: 47321080k total,  5097644k used, 42223436k free, 22594808k cached

I'll intersperse further comments in your text.

On 08/16/2011 11:25 AM, Marshall Schor wrote:
When looking at performance, it's important to get more details around the
possibilities.  For instance, is the machine you're running on a multi-core
machine (meaning it can run multiple threads at the same time, for increased
performance), and if so, how many cores?  The Intel i7 I think supports 8
threads at once, for instance.

If you have a machine that is capable of this kind of performance, then you can
configure UIMA to take advantage of this (or not).  For instance, in UIMA-AS
you can limit the number of things a pipeline can be doing at once to just one
(so, for example, if you had 8 cores but wanted the pipeline to use just one,
leaving the other cores free for other things, you can specify that).

You actually did that, in your deployment descriptor, by limiting the scaleout
to 1 and the size of the CAS pool to 1.  So I would not expect any "scaleout"
effect from this kind of configuration :-).

Right. In this case, I was trying to compare apples to apples, i.e. a single-threaded CPE with a single-threaded AS pipeline. The AS pipeline was 16x slower than the CPE. Even an AS pipeline with five deployments of the AE was 12x slower than the single-threaded CPE. Please note that I'm definitely open to the possibility that I'm overlooking something really elementary, and I'm willing to be shown the error of my ways.
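
For anyone who doesn't want to open the pastebin, the two settings Marshall is referring to sit in the deployment descriptor roughly as in the sketch below (the queue name and descriptor path are placeholders, not the actual values from my descriptor):

  <analysisEngineDeploymentDescription
      xmlns="http://uima.apache.org/resourceSpecifier">
    <name>DictionaryTest</name>
    <deployment protocol="jms" provider="activemq">
      <!-- one CAS in flight at a time -->
      <casPool numberOfCASes="1"/>
      <service>
        <inputQueue endpoint="DictionaryTestQueue"
                    brokerURL="tcp://localhost:61616"/>
        <topDescriptor>
          <import location="desc/analysis_engine/AggregatePlaintextProcessor.xml"/>
        </topDescriptor>
        <analysisEngine>
          <!-- one instance of the AE in this service -->
          <scaleout numberOfInstances="1"/>
        </analysisEngine>
      </service>
    </deployment>
  </analysisEngineDeploymentDescription>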

Likewise, in the CPE you can specify the scaleout of the AEs, to take advantage
of multiple cores.  You didn't say what you specified here, so I don't know...

Unfortunately, some components of the AE I'm using are not thread-safe, so I can't take advantage of scaleout within a single AE instance: I deploy each instance separately so that it has its own JVM. This script shows how I deployed the five instances with one CR: <http://pastebin.com/sczHwRuD>
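
As I understand it, the scale-out across those five JVMs comes from all five services listening on the same input queue: each of the five descriptors carries the same inputQueue element (endpoint name below is illustrative), and the broker distributes CASes among whichever services are listening:

  <!-- identical in all five deployment descriptors -->
  <inputQueue endpoint="DictionaryTestQueue" brokerURL="tcp://localhost:61616"/>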

Also, CPEs can be run in "Integrated" or other kinds of modes.  In Integrated
mode, the CPE runs everything on one machine, in the same JVM, so there is no
serialization (sending / receiving the CAS over a TCP/IP connection to/from a
remote machine).  Of course, that limits the scaling you get to whatever the
one machine can supply.  Again, you didn't specify what kind of CPE deployment
you did.

Here is the CPE descriptor I used: <http://pastebin.com/ipKyCc8B> Of course, many of the parameters are hidden in the CR and AE descriptors. I'm not sure whether they have much bearing on what I'm seeing.
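
For context, as I understand it the settings Marshall asks about live on the casProcessors element of the CPE descriptor; a single-threaded, integrated (one-JVM) configuration would look roughly like this excerpt (names, paths, and the omitted error-handling details are illustrative):

  <casProcessors casPoolSize="1" processingUnitThreadCount="1">
    <casProcessor deployment="integrated" name="AggregatePlaintextProcessor">
      <descriptor>
        <include href="desc/analysis_engine/AggregatePlaintextProcessor.xml"/>
      </descriptor>
      <!-- errorHandling and checkpoint elements omitted for brevity -->
    </casProcessor>
  </casProcessors>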

Thanks, and again, I really appreciate the comments from the members of this 
list.

Chuck


(Note that you can configure UIMA-AS to also run in one JVM with no
serialization, but I don't think that was the case here.)

Finally, if in your UIMA-AS test you are running the client code, the broker,
and the service all on one machine, is that machine "paging"?  This may depend
on the size of your CASes, etc.

-Marshall


On 8/16/2011 11:48 AM, Charles Bearden wrote:
Thank you Jerry & Eddie for your responses to my previous questions. I
appreciate the opportunity to learn.

Based on a little testing, I'm starting to think that AS is not designed for
performance-enhancing scale-out, but maybe rather for architectural clarity. I
have a CPE that has a collection reader that reads sentences from a database,
and an aggregate AE that is the cTAKES AggregatePlaintextProcessor (using our
dictionary for dictionary lookup) plus an AE that writes the concept
annotations to a database. When I put these together as a CPE and run it
against a test set of 2553 sentences, it takes about one minute, sometimes a
few seconds less. The CpmFrame GUI indicates that the CR accounts for about 5%
of the processing time and the AE for the other 95%, with the LVG annotator &
dictionary lookup each accounting for between 35% and 45%.

When I use the same CR & aggregate AE like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
   -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
   -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

it takes 16 minutes to process the same 2553 sentences.
Deploy_DictionaryTest.xml is the deployment descriptor; you can see its
contents here: <http://pastebin.com/6nhuaC4H>.

When I deploy the AE five times with 'deployAsyncService.sh' like this:

deployAsyncService.sh \
   sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like
this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
   -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml

it still takes 12 minutes to process the 2553 sentences. I can see from the
log files that the processing is being scaled out. Given that in the CPE the
CASes spent only 5% of their time in the CR, I'm skeptical it has become the
bottleneck, though I could be wrong. I'm just wondering if this kind of
performance difference is expected.

Thanks,
Chuck

