I think you're making a lot of good points there.  I'm not used to CSVs with
100 columns of data, but I can see why that could get huge.

If she starts with SEDA, sets a queue size of something like 100 or 200, and
sets blockWhenFull to true, her streaming producer will block until the queue
can accept more.  That way she won't have the whole file in memory.
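To make that concrete, a bounded SEDA endpoint would look something like
seda:rows?size=200&blockWhenFull=true (size and blockWhenFull are standard
SEDA options; the endpoint name "rows" is just for illustration).  Under the
hood that is a bounded blocking queue whose put() blocks when full.  Here's a
minimal plain-JDK sketch (all names hypothetical) showing the producer can
never get more than the queue capacity ahead of the consumer:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedQueueDemo {

    // Feed 'total' rows through a queue of 'capacity'.  put() blocks when
    // the queue is full, just like seda:...?blockWhenFull=true.
    // Returns the maximum number of rows ever buffered at once.
    static int run(int capacity, int total) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger maxBuffered = new AtomicInteger();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    queue.put("row-" + i);   // blocks when the queue is full
                    maxBuffered.accumulateAndGet(queue.size(), Math::max);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        for (int i = 0; i < total; i++) {
            queue.take();                    // slow consumer drains one at a time
        }
        producer.join();
        return maxBuffered.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Never more than 200 rows in memory, no matter how big the file is.
        System.out.println("max buffered: " + run(200, 1000));
    }
}
```

So memory stays bounded by the queue size rather than by the file size.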

She didn't post her JMS queue configuration, so it's hard to tell what caused
the OOM.  That's why I personally like to start with simple SEDA even if I
know I'm not going to end up there.  Everything is in memory, so there
shouldn't be any replication.  Threading is fairly simple, and she has a
bottleneck at her REST endpoint.  If all the marshalling and data conversion
is done before the objects are put on the queue, then she has some nice
granularity for determining exactly what her throughput at that point is.  In
other words, figuring out how many threads she needs to communicate with the
REST endpoint has no confounding factors like the threads doing processing
and Gson conversion.  It comes down to: how fast can I take a Gson model off
the queue, invoke the REST call, and get the data back?
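That isolation makes the tuning a one-variable experiment: in Camel she'd
raise concurrentConsumers on the consuming SEDA endpoint (e.g.
seda:rows?concurrentConsumers=5 — a standard SEDA option, endpoint name
illustrative) until throughput stops improving.  A rough plain-JDK sketch of
the same experiment, with a Thread.sleep standing in for the REST call (all
names and numbers hypothetical):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConsumerPoolDemo {

    // Drain 'total' pre-queued items with 'threads' workers, each "REST
    // call" taking callMillis.  Returns elapsed wall-clock milliseconds.
    static long drain(int threads, int total, long callMillis)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < total; i++) {
            queue.add(i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (queue.poll() != null) {
                    try {
                        Thread.sleep(callMillis);  // stand-in for the REST call
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 thread:  " + drain(1, 20, 10) + " ms");
        System.out.println("5 threads: " + drain(5, 20, 10) + " ms");
    }
}
```

Because the consumers do nothing but the simulated call, the thread count is
the only knob, which is exactly the granularity described above.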



--
View this message in context: 
http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5779994.html
Sent from the Camel - Users mailing list archive at Nabble.com.