On 08/16/2011 11:15 AM, Eddie Epstein wrote:
The CPE runs everything in the same process. UIMA AS could deploy this pipeline in one process and get the same performance as the CPE.
Thanks again for your reply. I thought that I was deploying the pipeline in one AS process with the first option for running it:
runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

It looks like one process in the output of ps. I'm just surprised that the performance is so much slower (16x slower).
Scaling out to multiple processes incurs overhead, which for UIMA AS essentially consists of CAS serialization and communication. The architecture shown in Figure 5 on http://uima.apache.org/doc-uimaas-what.html would have much lower overhead for this scenario.
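I may well be misreading Figure 5, but if it amounts to replicating the analysis pipeline inside a single service JVM (the idea, as I understand it, being to avoid shipping CASes between processes), then I gather the change on my side is mostly the <scaleout> setting in the deployment descriptor. The fragment below is only a sketch of how I understand that part of the descriptor; the queue name, broker URL, aggregate descriptor path, and instance counts are placeholders, not the values from my actual Deploy_DictionaryTest.xml:

  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="5"/>  <!-- placeholder pool size -->
    <service>
      <!-- placeholder queue name and broker; mine are in the pastebin descriptor -->
      <inputQueue endpoint="DictionaryTestQueue" brokerURL="tcp://localhost:61616"/>
      <topDescriptor>
        <!-- placeholder path to the cTAKES aggregate AE descriptor -->
        <import location="PATH/TO/AggregatePlaintextProcessor.xml"/>
      </topDescriptor>
      <!-- replicate the whole (synchronous) aggregate as 5 instances in this one JVM -->
      <analysisEngine async="false">
        <scaleout numberOfInstances="5"/>
      </analysisEngine>
    </service>
  </deployment>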
To create a pipeline with an architecture like Figure 5, would I use the example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of uima_async_scaleout.pdf for 2.3.1?
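In case it clarifies what I'm asking, below is roughly what I would try, untested and patterned on the client API examples in that chapter. The queue name "DictionaryTestQueue" and the CAS pool size are placeholders (the real queue name is whatever my deployment descriptor declares), the descriptor paths are the ones from my commands above, and I'm assuming UIMA_HOME points at the UIMA AS installation (for dd2spring.xsl and Saxon):

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.uima.UIMAFramework;
  import org.apache.uima.aae.client.UimaAsBaseCallbackListener;
  import org.apache.uima.aae.client.UimaAsynchronousEngine;
  import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
  import org.apache.uima.cas.CAS;
  import org.apache.uima.collection.CollectionReader;
  import org.apache.uima.collection.EntityProcessStatus;
  import org.apache.uima.util.XMLInputSource;

  public class RunDictionaryTest {
    public static void main(String[] args) throws Exception {
      UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();

      // Collection reader instantiated from the descriptor used in my runs above.
      CollectionReader reader = UIMAFramework.produceCollectionReader(
          UIMAFramework.getXMLParser().parseCollectionReaderDescription(
              new XMLInputSource(
                  "sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml")));
      engine.setCollectionReader(reader);

      // Minimal listener: just report CASes that come back with an exception.
      engine.addStatusCallbackListener(new UimaAsBaseCallbackListener() {
        public void entityProcessComplete(CAS aCas, EntityProcessStatus aStatus) {
          if (aStatus != null && aStatus.isException()) {
            System.err.println("CAS failed: " + aStatus.getStatusMessage());
          }
        }
      });

      // Deploy the service into this same JVM from the deployment descriptor.
      Map<String, Object> appCtx = new HashMap<String, Object>();
      appCtx.put(UimaAsynchronousEngine.DD2SpringXsltFilePath,
          System.getenv("UIMA_HOME") + "/bin/dd2spring.xsl");
      appCtx.put(UimaAsynchronousEngine.SaxonClasspath,
          "file:" + System.getenv("UIMA_HOME") + "/saxon/saxon8.jar");
      engine.deploy("sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml",
          appCtx);

      // Connect the client side to the service's input queue and run.
      appCtx.put(UimaAsynchronousEngine.ServerUri, "tcp://localhost:61616");
      appCtx.put(UimaAsynchronousEngine.ENDPOINT, "DictionaryTestQueue"); // placeholder queue name
      appCtx.put(UimaAsynchronousEngine.CasPoolSize, 5);                  // placeholder size
      engine.initialize(appCtx);

      engine.process(); // blocks until the collection reader is exhausted
      engine.stop();
    }
  }

As I understand it, deploy() brings the service up inside the same JVM as the client, so everything would stay in one process like my first run.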
Thanks, Chuck
Eddie

On Tue, Aug 16, 2011 at 11:48 AM, Charles Bearden <charles.f.bear...@uth.tmc.edu> wrote:

Thank you Jerry & Eddie for your responses to my previous questions. I appreciate the opportunity to learn. Based on a little testing, I'm starting to think that AS is not designed for performance-enhancing scale-out, but maybe rather for architectural clarity.

I have a CPE that has a collection reader that reads sentences from a database, and an aggregate AE that is the cTAKES AggregatePlaintextProcessor (using our dictionary for dictionary lookup) plus an AE that writes the concept annotations to a database. When I put these together as a CPE and run it against a test set of 2553 sentences, it takes about one minute, sometimes a few seconds less. The CpmFrame GUI indicates that the CR accounts for about 5% of the processing time, and the AE for the other 95%, with the LVG annotator & dictionary lookup each accounting for between 35% and 45%.

When I use the same CR & aggregate AE like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

it takes 16 minutes to process the same 2553 sentences. Deploy_DictionaryTest.xml is the deployment descriptor; you can see its contents here: <http://pastebin.com/6nhuaC4H>.

When I deploy the AE five times with 'deployAsyncService.sh' like this:

deployAsyncService.sh \
  sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml

it still takes 12 minutes to process the 2553 sentences. I can see from the log files that the processing is being scaled out. Given that in the CPE the CASes spent only 5% of their time in the CR, I'm skeptical it has become the bottleneck, though I could be wrong. I'm just wondering if this kind of performance difference is expected.

Thanks,
Chuck

--
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: charles.f.bear...@uth.tmc.edu
Phone: 713.500.9672