Hi

Would it be possible for you to put this into a .zip file as a small,
ready-to-run example? Then the Camel team and others can use it to
investigate.
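If shipping the 4m-line file is impractical, the example could generate its own input instead. Here is a minimal sketch, assuming one simple comma-separated record per line. The class name, the generate method and the columns are made up for illustration and are not from your application. Each line ends with "\r\n" to match the token used by the splitter in the route.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GenerateBigCsv {

    // Writes `lines` rows of synthetic CSV data to `target`.
    // Each row ends with "\r\n", the same token the route splits on.
    static Path generate(Path target, int lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < lines; i++) {
                // hypothetical columns: id, name, amount
                out.write(i + ",name" + i + "," + (i % 100));
                out.write("\r\n");
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to roughly the size mentioned in the original post (~4m lines).
        int lines = args.length > 0 ? Integer.parseInt(args[0]) : 4_000_000;
        Path file = generate(Paths.get("myBigFile.csv"), lines);
        System.out.println("Wrote " + lines + " lines to " + file.toAbsolutePath());
    }
}
```

Running `java GenerateBigCsv 4000000` once before starting the route would give an input with the same number of lines as yours.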

In your use-case we may be able to optimize the AggregateOnTheFlyTask
when using the splitter.
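Roughly, "aggregating on the fly" in a parallel splitter means that the splitting thread hands each chunk to the thread pool, and each chunk's result is folded into the combined result as soon as that chunk completes. The plain java.util.concurrent sketch below illustrates only that general pattern. It is not Camel's MulticastProcessor code, and the names and the counting "aggregation" are hypothetical. Note that in this sketch splitting, submitting and aggregating all happen on one thread. If the per-chunk work is cheap compared to that, the single thread can dominate the run time, which would be consistent with what your profiler shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AggregateOnTheFlySketch {

    // Splits `lines` into groups of `groupSize` (like tokenize(token, 1000)),
    // processes each group on `pool`, and folds each result into the total
    // as soon as that group has completed.
    static long splitAndAggregate(List<String> lines, int groupSize, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        long total = 0;
        int pending = 0;

        for (int start = 0; start < lines.size(); start += groupSize) {
            List<String> group = lines.subList(start, Math.min(start + groupSize, lines.size()));
            completion.submit(() -> processGroup(group));
            pending++;

            // Aggregate on the fly: drain any groups that have already finished.
            Future<Integer> done;
            while ((done = completion.poll()) != null) {
                total += done.get();
                pending--;
            }
        }

        // Splitting is done; wait for the groups that are still running.
        while (pending > 0) {
            total += completion.take().get();
            pending--;
        }
        return total;
    }

    // Hypothetical per-group work: counts the non-empty lines in the group.
    static int processGroup(List<String> group) {
        int count = 0;
        for (String line : group) {
            if (!line.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_500; i++) {
            lines.add(i + ",name" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try {
            System.out.println(splitAndAggregate(lines, 1000, pool)); // prints 10500
        } finally {
            pool.shutdown();
        }
    }
}
```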

On Fri, Aug 16, 2013 at 11:25 AM, cesar.tl
<cesar.tron-lo...@lombardrisk.com> wrote:
> As in the example shown in Chapter 10 of Camel in Action, I need to
> process a large CSV file (~4 million lines).
>
> Without parallel processing it takes about 30 seconds to process the file. I
> had high hopes when I discovered the parallel processing feature in Camel.
> However, it doesn't improve the processing time at all (sometimes it's
> even worse). I'd like to know whether I'm doing something stupid in Camel or
> whether it is a problem in my code.
>
> Here is my route:
>
> ExecutorService threadPool = Executors.newFixedThreadPool(10);
> String token = "\r\n";
> int splitSize = 1000;
>
> from("file:myBigFile.csv")
>     .routeId("route4")
>     .split().tokenize(token, splitSize).streaming().executorService(threadPool)
>         // myProcessor is a custom processor that creates objects
>         // from the CSV lines and processes them accordingly
>         .process(myProcessor)
>         // on the last split message, start the next route to process the next file
>         .filter().header("CamelSplitComplete")
>             .process(new RouteStarterProcessor(context, "route5"))
>         .end();
>
> I'm processing several files sequentially. When I know I'm processing
> the last line, I use a custom processor (RouteStarterProcessor) to start the
> route that processes the next file.
>
> When I profile my application, I can see the 10 threads of the pool, but they
> are doing very little work (running only 5-10% of the 30 seconds of
> processing). However, the Camel thread for this route is running for 100% of
> the total processing time.
>
> According to the profiler, a lot of time is spent in the
> org.apache.camel.processor.MulticastProcessor$AggregateOnTheFlyTask.aggregateOnTheFly()
> method.
>
> The code behind MyProcessor should be able to process the data concurrently
> in a fairly efficient way (it mainly adds objects to maps and collections,
> using concurrent collections rather than synchronised locks).
>
> It could be an issue in my code, but I'd like to know if someone can spot a
> mistake in my Camel route. Am I doing something that implicitly creates a
> thread barrier and prevents the tasks from being executed concurrently?
>
> Thanks
>
>
>
>
> --
> View this message in context: 
> http://camel.465427.n5.nabble.com/Parallel-processing-of-big-file-tp5737386.html
> Sent from the Camel - Users mailing list archive at Nabble.com.



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
Email: cib...@redhat.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
