Ok, I've included an aggregator in the splitter, as follows:

<camel:route id="pager" autoStartup="true">
  <camel:from uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
  <camel:log message="Iniciando paging" />
  <camel:setHeader headerName="start">
    <camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
  </camel:setHeader>
  <camel:split streaming="true" parallelProcessing="false">
    <camel:tokenize token="\n" />
    <!-- <camel:log message="${property.CamelSplitIndex}" /> -->
    <camel:to uri="bean:pager" />
    <camel:aggregate strategyRef="aggregatorStrategy">
      <camel:correlationExpression>
        <camel:simple>${file:name}</camel:simple>
      </camel:correlationExpression>
      <camel:completionSize>
        <camel:constant>250</camel:constant>
      </camel:completionSize>
      <camel:to uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
    </camel:aggregate>
  </camel:split>
  <camel:log message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
</camel:route>

And the AggregationStrategy:

<bean id="aggregatorStrategy" class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />

I've also added a header and some logging to calculate the elapsed time. Before adding the aggregator the elapsed time was about 30 seconds (for the 5MB test file), and now it is about half that (15 seconds). I can clearly see the improvement, but it is not as large as I expected. Any extra tips?

I've included the custom AggregationStrategy I had to create, as all I needed was to append/concatenate the body contents.

Gonzalo Vásquez Sáez
Research & Development (R&D) Manager
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasq...@altiuz.cl
http://www.altiuz.cl
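
For readers without the attachment, the effect of aggregating 250 lines before each append can be sketched in plain Java, with no Camel dependency. This is only an illustration of the batching idea, not the ConcatAggregationStrategy from the message; the class name LineBatcher and its methods are hypothetical. Note the explicit flush for the last partial batch: whether the route needs an extra completion condition for that case (Camel's aggregator offers options such as completionTimeout) is an assumption to verify, not something the thread states.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of size-based batching: lines are concatenated
 * in arrival order and emitted as one "write" per full batch.
 */
final class LineBatcher {
    private final int batchSize;
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> writes = new ArrayList<>();
    private int pending = 0;

    LineBatcher(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Buffers one processed line; emits a single write once batchSize lines are pending. */
    void add(String line) {
        buffer.append(line).append('\n');
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered; call at end of input so the last partial batch is not lost. */
    void flush() {
        if (pending == 0) {
            return;
        }
        writes.add(buffer.toString());
        buffer.setLength(0);
        pending = 0;
    }

    /** Each element stands for one append to the output file. */
    List<String> writes() {
        return writes;
    }
}
```

Feeding the 48198 lines of the test file through `new LineBatcher(250)` and calling `flush()` at the end yields 193 writes (192 full batches plus one of 198 lines) instead of one write per line, and the line order is preserved because nothing runs in parallel.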

On 09-11-2012, at 15:09, Christian Müller <christian.muel...@gmail.com> wrote:

> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
> the requirement and will end up in a much more complicated solution - IMO.
>
> Best,
> Christian
>
> On Fri, Nov 9, 2012 at 6:57 PM, <ramkumar.i...@cognizant.com> wrote:
>
>> You may also want to check out Hadoop and map reduce
>>
>> http://camel.apache.org/hdfs.html
>>
>> with respect to points a and b.
>>
>> You can have an index on the record and the "reduce" job can serialize on
>> the index.
>>
>> *From:* Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 10:16 PM
>> *To:* users@camel.apache.org
>> *Subject:* Re: Camel performance tuning
>>
>> Thanks for your answer, my comments:
>>
>> a) A 5MB file could be loaded into memory, but I have streaming enabled as
>> the file size could be in the range of GB. Notwithstanding, I'll check what
>> Hypersonic & Mongo are, as I'm not aware of them.
>>
>> b) Parallel processing is set to false, because records must preserve
>> their order in the output file.
>>
>> c) I don't see the point here.
>>
>> d) See a).
>>
>> e) What about async processing? There's no "long running process" here.
>>
>> Thanks again.-
>>
>> *Gonzalo Vásquez Sáez*
>>
>> On 09-11-2012, at 13:12, <ramkumar.i...@cognizant.com> wrote:
>>
>> I am really new to Camel but here are some options you can try:
>>
>> a) Can you load the 5 MB file into memory before splitting it? That
>> way IO will not be a problem. Probably put it in something like Hypersonic
>> or Mongo.
>>
>> b) Why is parallel processing false? Are the records related to
>> each other? If it is set to true you can take advantage of multiple cores.
>>
>> c) Is it possible to first split the files into chunks and then
>> process the chunks independently?
>>
>> d) Can you write into memory and flush at once?
>>
>> e) Sync/Async: http://camel.apache.org/async.html
>>
>> *From:* Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 8:32 PM
>> *To:* users@camel.apache.org
>> *Subject:* Camel performance tuning
>>
>> I'm running a route that basically adds a character per line to a plain
>> text file, but it's taking too long, and it seems to be due to some kind
>> of buffering issue when reading from or writing to disk.
>>
>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>> template (also attached).
>>
>> It's taking forever to process such a file. I understand I'm tokenizing
>> on line breaks, which could be the source of the problem as there are many
>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>> images/snapshot) I can see the write operation is invoked 20386 times, which
>> does not seem related to the line count. Is there an output buffer size that
>> I can configure? Or something like that?
>>
>> This is the route:
>>
>> <camel:route id="pager" autoStartup="true">
>>   <camel:from uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>>   <camel:split streaming="true" parallelProcessing="false">
>>     <camel:tokenize token="\n" />
>>     <camel:to uri="bean:pager" />
>>     <camel:to uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>>   </camel:split>
>> </camel:route>
>>
>> This is the referenced bean:
>>
>> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>>   <property name="xsltPath"
>>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>>   <property name="param" value="C.*PAG.* 1" />
>> </bean>
>>
>> The Camel version is 2.10.1, and this happens on both OS X and MS Windows,
>> so I think it isn't a platform-dependent problem, but a configuration one.
>>
>> Any ideas? Anything else that I should send?
>>
>> Thanks!
>>
>> *Gonzalo Vásquez Sáez*
>>
>> This e-mail and any files transmitted with it are for the sole use
>> of the intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful.