Re: Best Strategy to process a large number of rows in File

Jens Breitenstein Fri, 15 Apr 2016 06:26:23 -0700

Hi Michele

Reading a CSV with 40k lines using Camel in streaming mode takes a few seconds. As you limit the queue size to avoid OOM, the overall performance depends on how fast you can empty the queue. How long does processing ONE message take on average? To me it looks like approximately 0.6 secs (35000/6/60/60 ≈ 1.6 messages per second). Is the process responsible for reading the queue single-threaded?
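
If the queue reader does turn out to be single-threaded, something like the following might be worth trying. This is only a sketch under assumptions: the route that consumes from the CBIKIT queue is not shown in this thread, so the route id and the direct:handleSerialNumber endpoint are hypothetical, and the value 5 is purely illustrative. The concurrentConsumers option itself is a standard option of the JMS-based ActiveMQ component.

```xml
<!-- Hypothetical consumer route; the real one is not shown in this thread -->
<route id="CbikitConsumer_Route">
    <!-- concurrentConsumers=5 is an illustrative value, to be tuned against
         what the downstream system can actually absorb -->
    <from uri="activemq:queue:CBIKIT?concurrentConsumers=5" />
    <!-- hypothetical endpoint standing in for the per-message processing -->
    <to uri="direct:handleSerialNumber" />
</route>
```

Note that with more than one consumer, messages are no longer guaranteed to be processed in queue order, so this only fits if the rows are independent of each other.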


Jens


On 15/04/16 at 14:59, Michele wrote:

Hi,

I spent a bit of time reading different topics on this issue, and I changed
my route like this, reducing memory usage by about 300 MB:

<route id="FileRetriever_Route">
    <from uri="{{uri.inbound}}?scheduler=quartz2&amp;scheduler.cron={{poll.consumer.scheduler}}&amp;scheduler.triggerId=FileRetriever&amp;scheduler.triggerGroup=IF_CBIKIT{{uri.inbound.options}}" />
    <setHeader headerName="ImportDateTime"><simple>${date:now:yyyyMMdd-HHmmss}</simple></setHeader>
    <setHeader headerName="MsgCorrelationId"><simple>CBIKIT_INBOUND_${in.header.ImportDateTime}</simple></setHeader>
    <setHeader headerName="breadcrumbId">
        <simple>Import-${in.header.CamelFileName}-${in.header.ImportDateTime}-${in.header.breadcrumbId}</simple>
    </setHeader>
    <to uri="seda:processAndStoreInQueue" />
    <log message="END - FileRetriever_Route" />
</route>

<route id="ProcessAndStoreInQueue_Route">
    <from uri="seda:processAndStoreInQueue" />
    <unmarshal>
        <bindy type="Csv" classType="com.fincons.ingenico.crt2.cbikit.inbound.model.RowData"/>
    </unmarshal>

    <split streaming="true" executorServiceRef="myThreadPoolExecutor">
        <simple>${body}</simple>
        <choice>
            <when>
                <simple></simple>
                <setHeader headerName="CamelSplitIndex"><simple>${in.header.CamelSplitIndex}</simple></setHeader>
                <process ref="BodyEnricherProcessor" />
                <to uri="dozer:transform?mappingFile=file:{{crt2.apps.home}}{{dozer.mapping.path}}&amp;targetModel=com.fincons.ingenico.crt2.cbikit.inbound.model.SerialNumber" />
                <marshal ref="Gson" />
                <to uri="activemq:queue:CBIKIT" />
            </when>
            <otherwise>
                <log message="Message discarded ${in.header.CamelSplitIndex} - ${body}" />
            </otherwise>
        </choice>
    </split>
</route>

The last test successfully processed a 35000-line CSV file in about 6h with an
average memory usage of 1400 MB. But can I improve processing performance
further?

In addition, I noticed that the queue size stays low. Why? (Is the producer
slower than the consumer?)

Thanks in advance.

Best Regards

Michele



