Here are the steps in the current version:
1. Read one line from the file, set it as the body of the exchange's
outbound message and, depending on the file type, send the exchange to an
ActiveMQ queue.
2. The exchange arrives at another service unit, whose processor creates an
input stream from that line and passes it to an XML mapper generated with
Altova MapForce 2011 (as I mentioned before, I didn't choose the mapper, but
they say it's extremely fast). The mapper returns a ByteArrayOutputStream
containing an XML string that maps some values from the line to actual XML
fields. As a basic example, the number 200 in the line is mapped to
<InitialAmount>200</InitialAmount>. The XML is set as the body of the
exchange's outbound message and the exchange is sent to another queue.
3. When the exchange with the XML is received, it is passed to another
processor with another generated mapper, which maps this XML to a second XML
format, for example <InitialAmount>200</InitialAmount> to
<Quantity>200</Quantity>. This is only a simple example; the real mappings
can be more complex. The final XML string is set as the body of the
exchange's outbound message and the exchange is sent to the final service
unit.
4. The final service unit picks up the exchange, unmarshals its body into an
actual DB value object and inserts that object into the database.
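
To make the data flow of steps 2 to 4 concrete, here is a minimal plain-Java
sketch using only the JDK. PipelineSketch, lineToXml, xmlToXml, unmarshal and
Trade are hypothetical stand-ins for the generated Altova mappers and the DB
value object; the sketch assumes a line holds a single amount and uses regular
expressions purely for brevity, which a real mapper would not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    // Hypothetical stand-in for the DB value object built in step 4.
    record Trade(long quantity) {}

    // Step 2 (hypothetical mapper): raw line -> first XML format.
    // Assumes, for illustration only, that the line holds one amount.
    static String lineToXml(String line) {
        return "<InitialAmount>" + line.trim() + "</InitialAmount>";
    }

    private static final Pattern INITIAL_AMOUNT =
            Pattern.compile("<InitialAmount>(.*?)</InitialAmount>");

    // Step 3 (hypothetical mapper): first XML format -> second XML format.
    static String xmlToXml(String xml) {
        Matcher m = INITIAL_AMOUNT.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("No InitialAmount in: " + xml);
        }
        return "<Quantity>" + m.group(1) + "</Quantity>";
    }

    private static final Pattern QUANTITY =
            Pattern.compile("<Quantity>(\\d+)</Quantity>");

    // Step 4 (hypothetical unmarshaller): final XML -> value object.
    static Trade unmarshal(String xml) {
        Matcher m = QUANTITY.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("No Quantity in: " + xml);
        }
        return new Trade(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        String first = lineToXml("200");
        String second = xmlToXml(first);
        Trade trade = unmarshal(second);
        System.out.println(first + " -> " + second + " -> " + trade);
    }
}
```

In the real setup each arrow in the printed chain is a hop through an
ActiveMQ queue rather than a direct method call.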

In the version where I get the OOME (OutOfMemoryError), I am appending each
ByteArrayOutputStream's toString() to a StringBuilder. I have to do this
because I read 500 lines from the file and map each of them to XML in a while
loop, and I have no idea how to send each XML document in its own exchange,
so I append everything and set the final result as the body of the
exchange's outbound message. If I could send each XML document right after
mapping it, instead of appending it, and then map the next one inside the
same process method, that would be perfect; it would solve my problem.
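
A minimal JDK-only sketch of "send each XML right after mapping it": the loop
hands every mapped document to a sink as soon as it is produced, so the
processor no longer holds all 500 documents in one StringBuilder.
StreamingMapper and mapEachLine are hypothetical names, and the List sink only
stands in for the queue. Inside a Camel processor the sink could, as an
assumption, be a ProducerTemplate call such as
template.sendBody("activemq:queue:mapped", xml), where the queue name is made
up; Camel's Splitter with streaming() is another way to get one exchange per
line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class StreamingMapper {

    // Maps each line and passes the result to the sink immediately instead of
    // appending it to one big StringBuilder. Returns the number of lines mapped.
    static int mapEachLine(BufferedReader reader,
                           Function<String, String> mapper,
                           Consumer<String> sink) {
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(mapper.apply(line)); // send now, do not accumulate
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("200\n300\n400"));
        // Hypothetical sink: a List stands in for the ActiveMQ queue.
        List<String> sent = new ArrayList<>();
        int n = mapEachLine(reader, l -> "<InitialAmount>" + l + "</InitialAmount>", sent::add);
        System.out.println(n + " messages sent: " + sent);
    }
}
```

Passing the sink in as a Consumer keeps the mapping loop independent of how
messages are actually delivered.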

I want to implement batch inserts/updates on the DB to improve performance.
I also want to read hundreds of lines from the text file at a time, but at a
certain point send the mapped XML documents in exchanges one by one, not all
of them at once. I think the I/O operations take a lot of time, as in
"Parsing large Files with Apache Camel" from catify.com, where the author
raised the throughput from 200 to 4000 lines read per second by reading in
batches instead of line by line.
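
The batching idea can be sketched with the JDK alone: read lines and hand
them on in chunks of a configurable size, with a smaller final chunk.
LineBatcher and readInBatches are hypothetical names, and the batch size is an
arbitrary value to tune. On the database side, each chunk could be written
with JDBC's PreparedStatement.addBatch() and executeBatch(); in Camel, an
Aggregator with a completion size plays a similar role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineBatcher {

    // Reads lines and delivers them in chunks of at most batchSize lines.
    // The last chunk may be smaller. Returns the number of chunks delivered.
    static int readInBatches(BufferedReader reader, int batchSize,
                             Consumer<List<String>> onBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int batches = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    onBatch.accept(batch);
                    batch = new ArrayList<>(batchSize); // consumer may keep the old list
                    batches++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) { // flush the partial last chunk
            onBatch.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        List<List<String>> seen = new ArrayList<>();
        int n = readInBatches(reader, 2, seen::add);
        System.out.println(n + " batches: " + seen); // prints "3 batches: [[a, b], [c, d], [e]]"
    }
}
```

Reading in chunks and sending each mapped document individually are
independent choices, so the two sketches could be combined.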



--
View this message in context: 
http://camel.465427.n5.nabble.com/Large-file-processing-with-Apache-Camel-tp5727977p5728001.html
Sent from the Camel - Users mailing list archive at Nabble.com.
