You may also want to check out Hadoop and MapReduce

http://camel.apache.org/hdfs.html

with respect to points a) and b).

You can put an index on each record, and the "reduce" job can then serialize the 
output by that index.
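
The index idea can be sketched in plain Java, without Hadoop or Camel. This is only an illustration under assumptions: the class `IndexedReorderSketch`, its `Indexed` holder, and the `transform` step (which appends a `#`) are hypothetical stand-ins for the real per-line work. Each line is tagged with its position, processed in parallel in no particular order, and sorted back by index before output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration; none of these names come from Camel or Hadoop.
class IndexedReorderSketch {

    // A processed line together with its position in the input.
    static final class Indexed {
        final int index;
        final String line;

        Indexed(int index, String line) {
            this.index = index;
            this.line = line;
        }
    }

    // Stand-in for the real per-line work (assumption: append one character).
    static String transform(String line) {
        return line + "#";
    }

    static List<String> processPreservingOrder(List<String> lines) {
        // "Map" step: tag each line with its index and process in parallel.
        // unordered() drops the ordering guarantee on purpose, to mimic
        // workers that finish in arbitrary order.
        List<Indexed> processed = IntStream.range(0, lines.size())
                .parallel()
                .unordered()
                .mapToObj(i -> new Indexed(i, transform(lines.get(i))))
                .collect(Collectors.toCollection(ArrayList::new));

        // "Reduce" step: restore the input order by sorting on the index.
        processed.sort(Comparator.comparingInt((Indexed r) -> r.index));

        List<String> result = new ArrayList<>(processed.size());
        for (Indexed r : processed) {
            result.add(r.line);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(processPreservingOrder(Arrays.asList("a", "b", "c")));
    }
}
```

The sort needs all processed records in memory, so for files in the GB range this would have to be applied per batch rather than to the whole file.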

From: Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
Sent: Friday, November 09, 2012 10:16 PM
To: users@camel.apache.org
Subject: Re: Camel performance tuning

Thanks for your answer, my comments:

a) A 5 MB file could be loaded into memory, but I have streaming enabled because 
file sizes could be in the GB range. Nevertheless, I'll check what Hypersonic 
and Mongo are, as I'm not familiar with them.
b) Parallel processing is set to false because records must preserve their order 
in the output file.
c) I don't see the point here.
d) See a).
e) What about async processing? There's no "long running process" here.

Thanks again.

Gonzalo Vásquez Sáez
Research and Development (R&D) Manager
Altiuz Soluciones Tecnológicas de Negocios Ltda.
Av. Nueva Tajamar 555 Of. 802, Las Condes
(56-2) 335 2461
gvasquez@altiuz.cl
http://www.altiuz.cl



On 09-11-2012, at 13:12, <ramkumar.i...@cognizant.com> wrote:


I am really new to Camel, but here are some options you can try:

a)      Can you load the 5 MB file into memory before splitting it? That way 
I/O will not be a problem. You could put it in something like Hypersonic or Mongo.
b)      Why is parallel processing set to false? Are the records related to 
each other? If it were set to true, you could take advantage of multiple cores.
c)      Is it possible to first split the file into chunks and then process 
the chunks independently?
d)      Can you write into memory and flush everything at once?
e)      Sync/Async: http://camel.apache.org/async.html
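
Options c) and d) can be sketched together in plain Java. This is an illustration under assumptions, not Camel API: the class `ChunkedSplitSketch`, the chunk size, and the `#` appended per line are hypothetical. Lines are grouped into fixed-size chunks, each chunk is processed into an in-memory buffer, and the buffer is handed over as one string so that it can be written with a single call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names come from Camel.
class ChunkedSplitSketch {

    // Option c): group lines into chunks of at most chunkSize lines.
    static List<List<String>> chunk(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, lines.size());
            chunks.add(new ArrayList<>(lines.subList(start, end)));
        }
        return chunks;
    }

    // Option d): build the output for one chunk in memory, so the caller
    // can write it in one operation instead of one write per line.
    static String processChunk(List<String> chunk) {
        StringBuilder out = new StringBuilder();
        for (String line : chunk) {
            // Stand-in for the real per-line work (assumption).
            out.append(line).append('#').append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> c : chunk(lines, 2)) {
            System.out.print(processChunk(c));
        }
    }
}
```

Because chunks are emitted in input order and processed one after another here, the output order is preserved. With streaming input, the same grouping would be done while reading, so only one chunk is held in memory at a time.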

From: Gonzalo Vasquez [mailto:gvasq...@altiuz.cl]
Sent: Friday, November 09, 2012 8:32 PM
To: users@camel.apache.org
Subject: Camel performance tuning

I'm running a route that basically adds one character per line to a plain text 
file, but it's taking too long, and it seems to be due to some kind of 
buffering issue when reading from or writing to disk.

I'm processing a 5MB file (attached as DC_FACCL132_0000 
MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL template 
(also attached).

It's taking forever to process such a file. I understand I'm tokenizing on 
line breaks, which could be the source of the problem as there are many lines 
in the file (48198 exactly), but when running jvisualvm (see attached 
images/snapshot) I can see the write operation is invoked 20386 times, which 
seems unrelated to the line count. Is there an output buffer size that I can 
configure, or something like that?
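
To make the buffering question concrete, here is a plain Java sketch, independent of Camel. It contrasts opening the target file in append mode once per line with opening it once behind a `BufferedWriter` of a chosen size. The class `BufferedAppendSketch` and its methods are hypothetical, and whether Camel's file endpoint behaves like the per-line variant is an assumption, not something confirmed in this thread.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names come from Camel.
class BufferedAppendSketch {

    // Opens, appends to, and closes the file once per line.
    static void appendPerLine(Path target, List<String> lines) {
        for (String line : lines) {
            try (Writer w = new OutputStreamWriter(
                    Files.newOutputStream(target, StandardOpenOption.CREATE,
                            StandardOpenOption.WRITE, StandardOpenOption.APPEND),
                    StandardCharsets.UTF_8)) {
                w.write(line);
                w.write('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Opens the file once; data reaches the stream when the buffer of
    // bufferChars characters fills up or the writer is closed.
    static void writeBuffered(Path target, List<String> lines, int bufferChars) {
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(
                        Files.newOutputStream(target, StandardOpenOption.CREATE,
                                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING),
                        StandardCharsets.UTF_8),
                bufferChars)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Checks that both strategies produce byte-identical files.
    static boolean sameOutput(List<String> lines) {
        try {
            Path perLine = Files.createTempFile("per-line", ".txt");
            Path buffered = Files.createTempFile("buffered", ".txt");
            try {
                appendPerLine(perLine, lines);
                writeBuffered(buffered, lines, 64 * 1024);
                return Arrays.equals(Files.readAllBytes(perLine), Files.readAllBytes(buffered));
            } finally {
                Files.deleteIfExists(perLine);
                Files.deleteIfExists(buffered);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameOutput(Arrays.asList("line 1", "line 2", "line 3")));
    }
}
```

Both variants produce the same file; the difference is only in how many open, write, and close operations reach the operating system, which is the kind of count a profiler such as jvisualvm reports.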

This is the route:
<camel:route id="pager" autoStartup="true">
  <camel:from
    uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
  <camel:split streaming="true" parallelProcessing="false">
    <camel:tokenize token="\n" />
    <camel:to uri="bean:pager" />
    <camel:to
      uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
  </camel:split>
</camel:route>

This is the referenced bean:

<bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
  <property name="xsltPath"
    value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
  <property name="param" value="C.*PAG.* 1" />
</bean>

The Camel version is 2.10.1, and this happens on both OS X and MS Windows, so I 
think it isn't a platform-dependent problem but a configuration one.

Any ideas? Anything else that I should send?

Thanks!

Gonzalo Vásquez Sáez


This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful.
