Sorry Ranx, missed your previous post.
From our experience: we ran into OOM trouble with CSV files when
converting each row to a Map<String, String>, for example.
Depending on the number of columns (we have approx. 100+) this can quite
easily eat up the entire memory (sure, you can always provide more), but
maps are not really memory friendly.
And streaming does not necessarily imply that memory is freed immediately
after a row has been read: Camel needs to acknowledge that a message
(containing an unmarshalled row) has been processed.
We are streaming CSVs in parallel, with SEDA and a limited
number of rows in memory (fewer than 100), and everything works fast with
really low memory consumption.
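The pattern above can be sketched with a plain JDK bounded queue; this is just an illustration of how capping in-flight rows keeps memory flat (SEDA behaves similarly when its queue size is limited), not our actual route, and all names here are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedCsvPipeline {
    // Cap the number of in-flight rows; put() blocks when the queue is
    // full, which is what keeps memory consumption low.
    static final int MAX_ROWS_IN_MEMORY = 100;
    static final String POISON = "__EOF__";

    // Streams totalRows synthetic CSV rows through the bounded queue and
    // returns how many rows the consumer parsed.
    static int process(int totalRows) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(MAX_ROWS_IN_MEMORY);
        List<String[]> parsed = Collections.synchronizedList(new ArrayList<>());

        Thread consumer = new Thread(() -> {
            try {
                String row;
                while (!(row = queue.take()).equals(POISON)) {
                    // "Unmarshal" the row; a String[] is much cheaper
                    // per row than a Map<String, String>.
                    parsed.add(row.split(","));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: emit rows one by one; blocks when the queue is full,
        // so at most MAX_ROWS_IN_MEMORY rows are ever buffered.
        for (int i = 0; i < totalRows; i++) {
            queue.put("id" + i + ",value" + i);
        }
        queue.put(POISON);
        consumer.join();
        return parsed.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(1000)); // prints 1000
    }
}
```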
And to be honest: for each CSV row we have the raw row as a String, the
unmarshalled Camel message (maybe a Map<String, String>), and the
serialized JMS representation in memory at the same time; the data is
already duplicated, and there is no way to prevent this.
There is one thing left which puzzles me: what is the meaning of "But
now, I have an error java.lang.OutOfMemoryError: Java heap space on
ActiveMQ"? That is just where the OOM happens,
but not necessarily the root cause. Maybe JConsole will be of help to
track down memory usage and the number of objects next.
Jens
Am 29/03/16 um 20:11 schrieb Ranx:
Jens,
That's why I suggested setting a limit on the queue size. She has
streaming turned on already, so I believe that will block when the queue
(SEDA or JMS) gets full. But 50,000 objects isn't usually that much memory,
so there may be something else in the JMS settings that is actually
marshalling and unmarshalling all the data. It may not be a local in-memory
queue; that would end up duplicating everything. And it appears that the
out-of-memory error is associated with AMQ itself. Without digging into that
can of worms immediately, it is easy enough to switch over to SEDA just to get
the functionality working and throughput parameters established, and then
take a look at what might be happening with JMS.
The SEDA queue has a "blockWhenFull" setting that should make that
relatively easy. It's hard to say whether the overhead of JMS is necessary
in this case without knowing the actual business case and transactional
integrity requirements.
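For reference, Camel's SEDA component exposes both the queue cap and the blocking behaviour as endpoint options (the endpoint name "rows" here is just a placeholder):

```
seda:rows?size=100&blockWhenFull=true
```

With `size` set, the queue holds at most that many exchanges, and `blockWhenFull=true` makes producers wait when the queue is full instead of throwing an exception.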
--
View this message in context:
http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5779989.html
Sent from the Camel - Users mailing list archive at Nabble.com.