I think you're hitting a lot of good points there. I'm not used to CSVs with 100 columns of data, but I can see why that could get huge.
If she starts with SEDA, sets a queue size of something like 100 or 200, and sets blockWhenFull to true, her streaming will halt whenever the queue is full and resume once it can take more. So she won't have the whole file in memory that way. She didn't post her JMS queue configuration, so it's hard to tell what caused the OOM.

Personally, that's why I like to start with simple SEDA even when I know I'm not going to end up there. Everything is in memory, so there shouldn't be any replication, and the threading is fairly simple. She has a bottleneck at her REST endpoint; if all the marshalling and data conversion is done before the objects are put on the queue, she gets nice granularity for measuring exactly what her throughput is at that point. In other words, figuring out how many threads she needs to communicate with the REST endpoint isn't confounded by the threads doing the processing and Gson conversion. It's just a question of how fast she can take a Gson model off the queue, invoke the REST call, and get the data back.

--
View this message in context: http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5779994.html
Sent from the Camel - Users mailing list archive at Nabble.com.
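For what it's worth, the back-pressure behavior that a SEDA endpoint like `seda:rows?size=200&blockWhenFull=true` gives you can be sketched with a plain `ArrayBlockingQueue`: the producer's `put()` blocks until a consumer drains the bounded queue, so only a bounded slice of the file is ever in memory. This is a minimal illustration of the mechanism, not Camel itself; the queue size, row names, and sleep are made up for the demo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Like seda:rows?size=3&blockWhenFull=true: capacity-bounded queue,
        // producer blocks instead of failing when it is full.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(3);

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    String row = queue.take();   // like the SEDA consumer thread
                    Thread.sleep(10);            // stand-in for the REST call
                    System.out.println("processed " + row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // "Streaming" producer: put() blocks while 3 rows are already queued,
        // so the producer never races ahead of the consumer.
        for (int i = 0; i < 10; i++) {
            queue.put("row-" + i);
        }
        consumer.join();
        System.out.println("done");
    }
}
```

The same idea applies in the route: once the REST-calling consumers are the only thing reading that queue, throughput there is easy to measure and tune by adjusting concurrentConsumers.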