Sorry Ranx, missed your previous post.

From our experience: we ran into OOM trouble with CSV files when converting each row to a Map<String, String>. Depending on the number of columns (we have approx. 100+) this can quite easily eat up the entire memory (sure, you can always provide more, but maps are not really memory friendly). And streaming does not necessarily imply that memory is freed immediately after a row has been read; Camel needs to acknowledge that a message (containing an unmarshalled row) has been processed. We stream CSVs in parallel via SEDA with a limited number of rows in memory (fewer than 100), and everything works fast with really low memory consumption.
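Stripped of the Camel specifics, the bounded-rows-in-memory setup is just producer/consumer backpressure over a fixed-size queue. A simplified stdlib sketch (an illustration of the principle, not our actual route; names are made up):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCsvPipeline {
    // Sentinel marking end of input.
    private static final String EOF = "\u0000EOF";

    /** Streams totalRows rows through a queue of the given capacity;
     *  returns the number of rows the consumer processed. */
    static int run(int totalRows, int capacity) throws InterruptedException {
        // At most `capacity` rows buffered in memory; put() blocks when
        // full - the same backpressure SEDA's blockWhenFull gives you.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();

        Thread consumer = new Thread(() -> {
            try {
                String row;
                while (!(row = queue.take()).equals(EOF)) {
                    // "Process" the row; a real route would unmarshal it here.
                    if (!row.isEmpty()) {
                        processed.incrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: in reality this would stream lines from the CSV file.
        for (int i = 0; i < totalRows; i++) {
            queue.put("col1,col2,col3-" + i); // blocks if the consumer lags
        }
        queue.put(EOF);
        consumer.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rows processed: " + run(1000, 100));
    }
}
```

However many rows the file has, heap usage stays proportional to the queue capacity, not the file size.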

And to be honest: for each CSV row we have the raw string, the unmarshalled Camel message (maybe a Map<String, String>), and the serialized JMS representation in memory at the same time. The data is already duplicated; there is no way to prevent this.

There is one thing left that puzzles me: what is the meaning of "But now, I have an error java.lang.OutOfMemoryError: Java heap space on ActiveMQ"? This is just where the OOM happens, not necessarily the root cause. Maybe JConsole will help to track down memory usage and the number of objects next.
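Besides attaching JConsole from the outside, heap pressure can also be logged from inside the application, e.g. every N processed rows. A minimal stdlib sketch (class and method names are made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapProbe {
    /** Returns currently used heap in megabytes; cheap enough to
     *  log periodically, e.g. every 10,000 rows. */
    static long usedHeapMb() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```

Running the JVM with -XX:+HeapDumpOnOutOfMemoryError is also worth considering, so the next OOM leaves a heap dump you can inspect for the dominating object type.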

Jens

Am 29/03/16 um 20:11 schrieb Ranx:
Jens,

That's why I suggested setting the limit on the queue size.  She has
streaming turned on already so I believe that will block when the queue
(SEDA or JMS) gets full.  But 50,000 objects isn't usually that much memory
so there may be something else in the JMS settings that is actually
marshalling and unmarshalling all the data. It may not be a local in memory
queue.  That would end up duplicating everything.  And it appears that the
out of memory is associated with AMQ itself.  Without digging into that can
of worms immediately it is easy enough to switch over to SEDA just to get
the functionality working and throughput parameters established and then
take a look at what might be happening with JMS.

The SEDA queue has a "blockWhenFull" setting that should make that
relatively easy.  It's hard to say whether the overhead of JMS is necessary
in this case without knowing the actual business case and transactional
integrity requirements.
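For reference, the queue-size limit and blockWhenFull setting look roughly like this on a SEDA endpoint URI (option names from the Camel SEDA component; exact defaults depend on the Camel version):

```
seda:csvRows?size=100&blockWhenFull=true
```

With blockWhenFull=true the producer blocks once 100 exchanges are queued instead of throwing, which caps how many rows sit in memory at once.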



--
View this message in context: 
http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5779989.html
Sent from the Camel - Users mailing list archive at Nabble.com.