Hi

Would it be possible for you to put this into a .zip file as a small, ready-to-run example? Then the Camel team and others can use it to investigate.
In your use-case we may be able to optimize the AggregateOnTheFlyTask when using the splitter.

On Fri, Aug 16, 2013 at 11:25 AM, cesar.tl <cesar.tron-lo...@lombardrisk.com> wrote:
> Exactly like the example shown in Chapter 10 of Camel in Action, I need to
> process a large csv file (~4m lines).
>
> Without parallel processing it takes about 30 seconds to process the file. I
> had high hopes when I discovered the parallel processing feature in Camel.
> However, it does not improve the processing time at all (sometimes it's
> even worse). I'd like to know whether I'm doing something stupid in Camel or
> if it is a problem in my code.
>
> Here is my route:
>
> ExecutorService threadPool = Executors.newFixedThreadPool(10);
> String token = "\r\n";
> int splitSize = 1000;
>
> from("file:myBigFile.csv").
> routeId("route4").
> split().tokenize(token, splitSize).streaming().executorService(threadPool).
> process(myProcessor). //myProcessor is a custom processor that creates an object from a csv line and processes it accordingly
> filter().header("CamelSplitComplete").//on the last line
> process(new RouteStarterProcessor(context, "route5")).end();//start the next route to process the next file
>
> I'm processing several files in sequential order. When I know I'm processing
> the last line, I use a custom processor (RouteStarterProcessor) to start the
> route processing the next file.
>
> When I profile my application, I can see the 10 threads of the pool but they
> are doing very little work (running 5%~10% out of the 30s of processing).
> However, the Camel thread for this route is running 100% of the total
> processing time.
>
> Looking at the profiler, a lot of time is spent in the
> org.apache.camel.processor.MulticastProcessor$AggregateOnTheFlyTask.aggregateOnTheFly()
> method.
>
> The code behind MyProcessor should be able to process the data concurrently
> and fairly efficiently (mainly adding objects to Maps and collections, and
> using concurrent collections rather than synchronised locks).
>
> It could be an issue in my code but I'd like to know if someone can spot a
> mistake in my Camel route. Am I doing something that implicitly creates a
> thread barrier and prevents the tasks from being executed concurrently?
>
> Thanks
>
> --
> View this message in context:
> http://camel.465427.n5.nabble.com/Parallel-processing-of-big-file-tp5737386.html
> Sent from the Camel - Users mailing list archive at Nabble.com.

--
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cib...@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
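
On the question of whether the bottleneck sits in Camel or in the processor: one way to check is a plain java.util.concurrent baseline that mimics the shape of the route (groups of 1000 lines handed to a fixed pool of 10 threads) with no Camel involved. The sketch below is an illustration under assumptions, not Camel's implementation and not the poster's code. The class name ChunkedParallelBaseline, the method processInChunks, and the Consumer<String> standing in for myProcessor are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical baseline harness: plain JDK only, no Camel classes involved.
final class ChunkedParallelBaseline {

    /**
     * Groups the lines into chunks of chunkSize, processes each chunk on a
     * fixed thread pool, and returns the total number of lines processed.
     * lineProcessor is a stand-in for the per-line work and must be thread-safe.
     */
    static long processInChunks(Iterable<String> lines, int chunkSize, int threads,
                                Consumer<String> lineProcessor) {
        if (chunkSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("chunkSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            List<String> chunk = new ArrayList<>(chunkSize);
            // The calling thread only groups lines; the workers do the processing.
            for (String line : lines) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    pending.add(submit(pool, chunk, lineProcessor));
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                pending.add(submit(pool, chunk, lineProcessor)); // final partial chunk
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for the chunk and surfaces worker failures
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private static Future<Integer> submit(ExecutorService pool, List<String> chunk,
                                          Consumer<String> lineProcessor) {
        return pool.submit(() -> {
            for (String line : chunk) {
                lineProcessor.accept(line);
            }
            return chunk.size();
        });
    }
}
```

Feeding this the lines of the file with a chunk size of 1000 and 10 threads, and timing it against a single-threaded loop over the same processor, would show whether the processor scales on its own. If it does, the time seen in aggregateOnTheFly() more likely comes from the route than from the processor. Note that the sketch queues every chunk without back-pressure, so for a very large file the submission loop may need to be bounded.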