Exactly like the example shown in Chapter 10 of Camel in Action, I need to
process a large CSV file (about 4 million lines).

Without parallel processing it takes about 30 seconds to process the file. I
had high hopes when I discovered the parallel processing feature in Camel.
However, it does not improve the processing time at all (sometimes it is
even worse). I'd like to know whether I'm doing something stupid in Camel or
whether the problem is in my own code.

Here is my route:

ExecutorService threadPool = Executors.newFixedThreadPool(10);
String token = "\r\n";
int splitSize = 1000;

from("file:myBigFile.csv")
    .routeId("route4")
    .split().tokenize(token, splitSize).streaming().executorService(threadPool)
        // myProcessor is a custom processor that creates an object
        // from a CSV line and processes it accordingly
        .process(myProcessor)
        // on the last line, start the next route to process the next file
        .filter().header("CamelSplitComplete")
            .process(new RouteStarterProcessor(context, "route5"))
    .end();

I'm processing several files in sequential order. When I know I'm processing
the last line, I use a custom processor (RouteStarterProcessor) to start the
route processing the next file.

When I profile my application, I can see the 10 threads of the pool, but they
are doing very little work (each is running for only 5% to 10% of the 30 s of
processing). The Camel thread for this route, however, is running for 100% of
the total processing time.

Looking at the profiler, a lot of time is spent on the 
org.apache.camel.processor.MulticastProcessor$AggregateOnTheFlyTask.aggregateOnTheFly()
method.

The code behind MyProcessor should be able to process the data concurrently
and fairly efficiently (it mainly adds objects to Maps and collections, and
uses concurrent collections rather than synchronised locks).
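
To make the intent concrete, here is a minimal sketch in plain Java (no Camel)
of the kind of work described above: lines are grouped into batches, each batch
is handed to a fixed thread pool, and the results go into a concurrent map. The
class name BatchedCsvSketch, the method names and the "count lines per first
column" logic are all hypothetical stand-ins, not the actual MyProcessor code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

public class BatchedCsvSketch {

    // Hypothetical stand-in for MyProcessor: counts lines per first CSV column.
    // Only concurrent structures are touched, so no explicit locking is needed.
    static void processBatch(List<String> batch,
                             ConcurrentHashMap<String, LongAdder> counts) {
        for (String line : batch) {
            String key = line.split(",", 2)[0];
            counts.computeIfAbsent(key, k -> new LongAdder()).increment();
        }
    }

    // Splits the lines into batches of batchSize and processes them on a
    // fixed pool of the given size, then waits for every batch to finish.
    static Map<String, Long> countByFirstColumn(List<String> lines,
                                                int batchSize, int threads) {
        ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += batchSize) {
                int end = Math.min(start + batchSize, lines.size());
                List<String> batch = lines.subList(start, end);
                pending.add(pool.submit(() -> processBatch(batch, counts)));
            }
            for (Future<?> f : pending) {
                f.get(); // propagate any failure from a worker thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        Map<String, Long> result = new HashMap<>();
        counts.forEach((k, v) -> result.put(k, v.sum()));
        return result;
    }

    public static void main(String[] args) {
        // Synthetic input instead of the real file: 10,000 lines, 3 keys.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            lines.add((i % 3) + ",value" + i);
        }
        System.out.println(countByFirstColumn(lines, 1000, 10));
    }
}
```

The batch size of 1000 and the pool of 10 threads mirror the values used in the
route. With per-line work this cheap, the cost of creating and handing over the
batches can easily outweigh the work done in the pool.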

It could be an issue in my code, but I'd like to know if someone can spot a
mistake in my Camel route. Am I doing something that implicitly creates a
thread barrier and prevents the tasks from being executed concurrently?

Thanks



