Hi Marshall,

Thanks for your reply. I should have included details of our machine. It has four CPUs, each with 8 cores; here's the /proc/cpuinfo stanza for one of the cores: <http://pastebin.com/F8YXs7XL>
The machine has 64G RAM; I was running dstat at the same time as the AS pipeline, and it indicated no paging. Here's a typical snapshot of the memory part of top during the run:
Mem:  66096700k total, 43877612k used, 22219088k free,   608092k buffers
Swap: 47321080k total,  5097644k used, 42223436k free, 22594808k cached

I'll intersperse further comments in your text.

On 08/16/2011 11:25 AM, Marshall Schor wrote:
When looking at performance, it's important to get more details around the possibilities. For instance, is the machine you're running on a multi-core machine (meaning it can run multiple threads at the same time, for increased performance), and if so, how many cores? The Intel i7, I think, supports 8 threads at once, for instance. If you have a machine that is capable of this kind of performance, then you can configure UIMA to take advantage of this (or not). For instance, in UIMA-AS you could limit the number of things a pipeline could be doing at once to just one thing (so, for example, if you had 8 cores but wanted the pipeline to use just one, so the other cores could be used for other things, you can specify that). You actually did that in your deployment descriptor, by limiting the scaleout to 1 and the size of the CAS pool to 1. So I would not expect any "scaleout" effect from this kind of configuration :-).
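For reference, the knobs Marshall mentions live in the UIMA-AS deployment descriptor. Here is a minimal sketch of the relevant elements; the queue name and descriptor path are placeholders for illustration, not taken from the actual Deploy_DictionaryTest.xml:

```xml
<analysisEngineDeploymentDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>DictionaryTestDeployment</name>
  <deployment protocol="jms" provider="activemq">
    <!-- a pool of one CAS means at most one CAS in flight at a time -->
    <casPool numberOfCASes="1"/>
    <service>
      <inputQueue endpoint="DictionaryTestQueue" brokerURL="tcp://localhost:61616"/>
      <topDescriptor>
        <import location="path/to/AggregateAE.xml"/>
      </topDescriptor>
      <analysisEngine>
        <!-- one AE instance means no scaleout within this service -->
        <scaleout numberOfInstances="1"/>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
```

Raising numberOfInstances and numberOfCASes together is what would let a single service use multiple cores.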
Right. In this case, I was trying to compare apples to apples, i.e. a single-threaded CPE with a single-threaded AS pipeline. The AS pipeline was 16x slower than the CPE. Even an AS pipeline with 5 deployments of the AE was 12x slower than the single-threaded CPE. Please note that I'm definitely open to the possibility that I'm overlooking something really elementary, and I'm willing to be shown the error of my ways.
Likewise, in the CPE you can specify the scaleout of the AE's, to take advantage of multiple cores. You didn't say what you specified here, so I don't know...
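For context, in a CPE descriptor that scaleout is controlled by attributes on the <casProcessors> element. A sketch, with illustrative values:

```xml
<casProcessors casPoolSize="8" processingUnitThreadCount="8">
  <!-- casProcessor entries for the AEs go here -->
</casProcessors>
```

processingUnitThreadCount sets how many threads run the CAS processors in parallel, and casPoolSize must be at least that large so each thread can obtain a CAS.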
Unfortunately, some components of the AE I'm using are not thread-safe, so I can't take advantage of scaleout within a single AE instance: I deploy each instance separately so that it has its own JVM. This script shows how I deployed the five instances with one CR: <http://pastebin.com/sczHwRuD>
Also, CPEs can be run in "Integrated" or other kinds of modes. In Integrated, this means that the CPE runs everything on one machine, in the same JVM. So there is no serialization (sending / receiving the CAS over a TCP/IP connection to/from a remote machine). Of course, that limits the scaling you get, to whatever the 1 machine can supply. Again, you didn't specify what kind of CPE deployment you did.
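The deployment mode Marshall describes is set per CAS processor in the CPE descriptor. A sketch, with the processor name and descriptor path invented for illustration:

```xml
<casProcessor deployment="integrated" name="DictionaryAnnotator">
  <descriptor>
    <include href="desc/DictionaryAnnotator.xml"/>
  </descriptor>
</casProcessor>
```

"integrated" runs the AE inside the CPE's own JVM; "local" and "remote" run it in a separate process or on a separate machine, with the corresponding CAS serialization cost.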
Here is the CPE descriptor I used: <http://pastebin.com/ipKyCc8B> Of course, many of the parameters are hidden in the CR and AE descriptors. I'm not sure whether they have much bearing on what I'm seeing.
Thanks, and again, I really appreciate the comments from the members of this list.

Chuck
(Note that you can configure UIMA-AS to also run in one JVM with no serialization, but I don't think that was the case here.)

Finally, if in the UIMA-AS case you are running the client code, the broker, and the service all on one machine, is that machine "paging"? This may be dependent on the size of your CASes, etc.

-Marshall

On 8/16/2011 11:48 AM, Charles Bearden wrote:

Thank you Jerry & Eddie for your responses to my previous questions. I appreciate the opportunity to learn. Based on a little testing, I'm starting to think that AS is not designed for performance-enhancing scale-out, but maybe rather for architectural clarity.

I have a CPE with a collection reader that reads sentences from a database, and an aggregate AE that is the cTAKES AggregatePlaintextProcessor (using our dictionary for dictionary lookup) plus an AE that writes the concept annotations to a database. When I put these together as a CPE and run it against a test set of 2553 sentences, it takes about one minute, sometimes a few seconds less. The CpmFrame GUI indicates that the CR accounts for about 5% of the processing time and the AE for the other 95%, with the LVG annotator & dictionary lookup each accounting for between 35% and 45%.

When I use the same CR & aggregate AE like this:

  runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
    -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
    -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

it takes 16 minutes to process the same 2553 sentences. Deploy_DictionaryTest.xml is the deployment descriptor; you can see its contents here: <http://pastebin.com/6nhuaC4H>.
When I deploy the AE five times with 'deployAsyncService.sh' like this:

  deployAsyncService.sh \
    sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like this:

  runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
    -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml

it still takes 12 minutes to process the 2553 sentences. I can see from the log files that the processing is being scaled out. Given that in the CPE the CASes spent only 5% of their time in the CR, I'm skeptical that it has become the bottleneck, though I could be wrong. I'm just wondering if this kind of performance difference is expected.

Thanks,
Chuck