Sorry Ranx, missed your previous post.
From our experience: we ran into OOM trouble with CSV files when
converting each row to a Map<String, String>, for example.
Depending on the number of columns (we have approx. 100+) this can quite
easily eat up the entire memory (sure, you can always provide more), but
maps are not really memory friendly.
And streaming does not necessarily imply that memory is freed immediately
after a row has been read: Camel needs to acknowledge that a message
(containing an unmarshalled row) has been processed.
We are streaming CSVs in parallel, with SEDA and a limited
number of rows in memory (fewer than 100), and everything works fast with
really low memory consumption.
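The pattern above can be sketched with a plain JDK bounded queue; this is just an illustration of how capping in-flight rows keeps memory flat (SEDA behaves similarly when its queue size is limited), not our actual route, and all names here are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedCsvPipeline {
    // Cap the number of in-flight rows; put() blocks when the queue is
    // full, which is what keeps memory consumption low.
    static final int MAX_ROWS_IN_MEMORY = 100;
    static final String POISON = "__EOF__";

    // Streams totalRows synthetic CSV rows through the bounded queue and
    // returns how many rows the consumer parsed.
    static int process(int totalRows) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(MAX_ROWS_IN_MEMORY);
        List<String[]> parsed = Collections.synchronizedList(new ArrayList<>());

        Thread consumer = new Thread(() -> {
            try {
                String row;
                while (!(row = queue.take()).equals(POISON)) {
                    // "Unmarshal" the row; a String[] is much cheaper
                    // per row than a Map<String, String>.
                    parsed.add(row.split(","));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: emit rows one by one; blocks when the queue is full,
        // so at most MAX_ROWS_IN_MEMORY rows are ever buffered.
        for (int i = 0; i < totalRows; i++) {
            queue.put("id" + i + ",value" + i);
        }
        queue.put(POISON);
        consumer.join();
        return parsed.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(1000)); // prints 1000
    }
}
```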
And to be honest: for each CSV row we have the raw row as a String, the
unmarshalled Camel message (maybe a Map<String, String>), and the
serialized JMS representation in memory at the same time; the data is
already duplicated, and there is no way to prevent this.
There is one thing left which puzzles me: what is the meaning of "But
now, I have an error java.lang.OutOfMemoryError: Java heap space on
ActiveMQ"? That is just where the OOM happens,
but not necessarily the root cause. Maybe JConsole will be of help to
track down memory usage and the number of objects next.
Jens
Am 29/03/16 um 20:11 schrieb Ranx:
Jens,
That's why I suggested setting a limit on the queue size. She has
streaming turned on already, so I believe that will block when the queue
(SEDA or JMS) gets full. But 50,000 objects isn't usually that much memory,
so there may be something else in the JMS settings that is actually
marshalling and unmarshalling all the data. It may not be a local in-memory
queue; that would end up duplicating everything. And it appears that the
out-of-memory error is associated with AMQ itself. Without digging into that
can of worms immediately, it is easy enough to switch over to SEDA just to get
the functionality working and throughput parameters established, and then
take a look at what might be happening with JMS.
The SEDA queue has a "blockWhenFull" setting that should make that
relatively easy. It's hard to say whether the overhead of JMS is necessary
in this case without knowing the actual business case and transactional
integrity requirements.
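For reference, Camel's SEDA component exposes both the queue cap and the blocking behaviour as endpoint options (the endpoint name "rows" here is just a placeholder):

```
seda:rows?size=100&blockWhenFull=true
```

With `size` set, the queue holds at most that many exchanges, and `blockWhenFull=true` makes producers wait when the queue is full instead of throwing an exception.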
--
View this message in context:
http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5779989.html
Sent from the Camel - Users mailing list archive at Nabble.com.