Ok, I've included an aggregator in the splitter, as follows: 

                <camel:route id="pager" autoStartup="true">
                        <camel:from
                                uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
                        <camel:log message="Iniciando paging" />
                        <camel:setHeader headerName="start">
                                <camel:simple>${date:now:mm}:${date:now:ss}.${date:now:SSS}</camel:simple>
                        </camel:setHeader>
                        <camel:split streaming="true" parallelProcessing="false">
                                <camel:tokenize token="\n" />
                                <!-- <camel:log message="${property.CamelSplitIndex}" /> -->
                                <camel:to uri="bean:pager" />
                                <camel:aggregate strategyRef="aggregatorStrategy">
                                        <camel:correlationExpression>
                                                <camel:simple>${file:name}</camel:simple>
                                        </camel:correlationExpression>
                                        <camel:completionSize>
                                                <camel:constant>250</camel:constant>
                                        </camel:completionSize>
                                        <camel:to
                                                uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
                                </camel:aggregate>
                        </camel:split>
                        <camel:log
                                message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
                </camel:route>


And the AggregationStrategy:

        <bean id="aggregatorStrategy" class="cl.altiuz.reports.etl.ConcatAggregationStrategy" />


I've also added some headers and logging to measure the elapsed time.

Before adding the aggregator, the elapsed time was about 30 seconds (for the 
5MB test file); it is now about half that (15 seconds). The improvement is 
clear, but smaller than I expected.

Any extra tips? I've included the custom AggregationStrategy I had to create, 
as all I needed was to append/concatenate the body contents.
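
A possible follow-up on the route above, offered as a sketch rather than a 
confirmed fix: with completionSize as the only completion condition, a group 
is written only once it holds 250 lines, so the tail of a file whose line 
count is not a multiple of 250 (48198 lines leaves 198) may never be flushed. 
One option is to also complete the group when the splitter marks its last 
line through the CamelSplitComplete exchange property, checked eagerly against 
each incoming line. The fragment assumes the completionSize and 
eagerCheckCompletion attributes and the completionPredicate element of the 
Camel 2.x aggregator, and has not been tested against this route:

```xml
<camel:aggregate strategyRef="aggregatorStrategy"
                 completionSize="250"
                 eagerCheckCompletion="true">
    <camel:correlationExpression>
        <camel:simple>${file:name}</camel:simple>
    </camel:correlationExpression>
    <!-- Assumption: the splitter sets CamelSplitComplete to true on the last
         line, so the final, partial group is completed as well -->
    <camel:completionPredicate>
        <camel:simple>${property.CamelSplitComplete} == true</camel:simple>
    </camel:completionPredicate>
    <camel:to uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
</camel:aggregate>
```

Eager checking matters here because the predicate is then evaluated on the 
incoming line rather than on the aggregated exchange, which a custom strategy 
may build from the first line of the group. A larger completionSize would 
likely reduce the number of file appends further, at the cost of memory; the 
right value would need to be measured.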



Gonzalo Vásquez Sáez
Research and Development (R&D) Manager
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasq...@altiuz.cl
http://www.altiuz.cl
 

On 09-11-2012, at 15:09, Christian Müller <christian.muel...@gmail.com> 
wrote:

> Using Hypersonic, Hadoop or Mongo for such a use case is over-engineering
> the requirement and will end up in a much more complicated solution, IMO.
> 
> Best,
> Christian
> 
> On Fri, Nov 9, 2012 at 6:57 PM, <ramkumar.i...@cognizant.com> wrote:
> 
>> You may also want to check out Hadoop and MapReduce
>> (http://camel.apache.org/hdfs.html) with respect to points a and b.
>> 
>> You can have an index on the record, and the "reduce" job can serialize on
>> the index.
>> 
>> 
>> 
>> *From:* Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 10:16 PM
>> *To:* users@camel.apache.org
>> *Subject:* Re: Camel performance tuning
>> 
>> 
>> 
>> Thanks for your answer, my comments:
>> 
>> 
>> 
>> a) A 5MB file could be loaded into memory, but I have streaming enabled
>> because file sizes could be in the GB range. Nevertheless, I'll look into
>> Hypersonic and Mongo, as I'm not familiar with them.
>> 
>> b) Parallel processing is set to false because records must preserve
>> their order in the output file.
>> 
>> c) I don't see the point here.
>> 
>> d) See a).
>> 
>> e) What about async processing? There's no "long running process" here.
>> 
>> Thanks again.
>> 
>> 
>> 
>> On 09-11-2012, at 13:12, <ramkumar.i...@cognizant.com> wrote:
>> 
>> 
>> 
>> I am really new to Camel, but here are some options you can try:
>> 
>> a) Can you load the 5 MB file into memory before splitting it? That way
>> I/O will not be a problem. Perhaps put it in something like Hypersonic
>> or Mongo.
>> 
>> b) Why is parallel processing false? Are the records related to each
>> other? If set to true, you can take advantage of multiple cores.
>> 
>> c) Is it possible to first split the file into chunks and then process
>> the chunks independently?
>> 
>> d) Can you write into memory and flush at once?
>> 
>> e) Sync/async: http://camel.apache.org/async.html
>> 
>> 
>> 
>> *From:* Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 8:32 PM
>> *To:* users@camel.apache.org
>> *Subject:* Camel performance tuning
>> 
>> 
>> 
>> I'm running a route that basically adds a character per line to a plain
>> text file, but it's taking too long, and it seems to be due to some kind
>> of buffering issue when reading from/writing to disk.
>> 
>> 
>> 
>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>> template (also attached).
>> 
>> 
>> 
>> It's taking forever to process such a file. I understand I'm tokenizing
>> on line breaks, which could be the source of the problem, as there are many
>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>> images/snapshot) I can see the write operation is invoked 20386 times, which
>> seems unrelated to the line count. Is there an output buffer size that I can
>> configure? Or something like that?
>> 
>> 
>> 
>> This is the route:
>> 
>> <camel:route id="pager" autoStartup="true">
>>   <camel:from
>>     uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>>   <camel:split streaming="true" parallelProcessing="false">
>>     <camel:tokenize token="\n" />
>>     <camel:to uri="bean:pager" />
>>     <camel:to
>>       uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>>   </camel:split>
>> </camel:route>
>> 
>> 
>> 
>> This is the referenced bean:
>> 
>> 
>> 
>> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>>   <property name="xsltPath"
>>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>>   <property name="param" value="C.*PAG.* 1" />
>> </bean>
>> 
>> 
>> 
>> The Camel version is 2.10.1, and this happens on both OS X and MS Windows,
>> so I think it isn't a platform-dependent problem but a configuration one.
>> 
>> Any ideas? Anything else that I should send?
>> 
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>>       This e-mail and any files transmitted with it are for the sole use
>> of the intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful.
>> 
>> 
>> 
> 
> 
> 