Hi Michele

Reading a CSV file with 40k lines in streaming mode with Camel takes a few seconds. Since you limit the queue size to avoid an OOM, the overall performance depends on how fast you can empty the queue. How long does processing ONE message take on average? To me it looks like roughly 0.6 seconds per message (6*60*60/35000), i.e. about 1.6 messages per second (35000/6/60/60). Is the process that reads from the queue single-threaded?
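
To illustrate the two points above, here is a rough sketch, not a tested configuration. It assumes Spring XML, that the file is split line by line before unmarshalling so that only one row is held per exchange, and that the queue is drained by another Camel route. The route id QueueConsumer_Route, the bean name rowHandler and the consumer counts are made up for illustration; myThreadPoolExecutor, RowData and the CBIKIT queue are taken from your route.

```xml
<routes xmlns="http://camel.apache.org/schema/spring">

  <!-- Producer side: tokenize the file lazily instead of unmarshalling it as a whole -->
  <route id="ProcessAndStoreInQueue_Route">
    <from uri="seda:processAndStoreInQueue" />
    <split streaming="true" executorServiceRef="myThreadPoolExecutor">
      <!-- one exchange per CSV line; with streaming the lines are read on demand -->
      <tokenize token="\n" />
      <!-- Bindy now sees a single line; depending on the Camel version
           the resulting body may be a list holding one RowData -->
      <unmarshal>
        <bindy type="Csv"
               classType="com.fincons.ingenico.crt2.cbikit.inbound.model.RowData" />
      </unmarshal>
      <!-- the remaining per-row steps (enrich, dozer, marshal) stay as in the original route -->
      <to uri="activemq:queue:CBIKIT" />
    </split>
  </route>

  <!-- Consumer side (hypothetical route): several threads drain the queue in parallel -->
  <route id="QueueConsumer_Route">
    <from uri="activemq:queue:CBIKIT?concurrentConsumers=5&amp;maxConcurrentConsumers=10" />
    <to uri="bean:rowHandler" />
  </route>

</routes>
```

With concurrentConsumers greater than 1, several messages are taken off the queue at the same time, so the processing order of the rows is no longer guaranteed. Whether that is acceptable depends on what the consumer does with each row.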

Jens


On 15/04/16 at 14:59, Michele wrote:
Hi,

I spent some time reading different topics on this issue and changed
my route as follows, which reduced memory usage by about 300 MB:

<route id="FileRetriever_Route">
    <from uri="{{uri.inbound}}?scheduler=quartz2&amp;scheduler.cron={{poll.consumer.scheduler}}&amp;scheduler.triggerId=FileRetriever&amp;scheduler.triggerGroup=IF_CBIKIT{{uri.inbound.options}}" />
    <setHeader headerName="ImportDateTime"><simple>${date:now:yyyyMMdd-HHmmss}</simple></setHeader>
    <setHeader headerName="MsgCorrelationId"><simple>CBIKIT_INBOUND_${in.header.ImportDateTime}</simple></setHeader>
    <setHeader headerName="breadcrumbId"><simple>Import-${in.header.CamelFileName}-${in.header.ImportDateTime}-${in.header.breadcrumbId}</simple></setHeader>
    <to uri="seda:processAndStoreInQueue" />
    <log message="END - FileRetriever_Route" />
</route>
                
<route id="ProcessAndStoreInQueue_Route">
    <from uri="seda:processAndStoreInQueue" />
    <unmarshal>
        <bindy type="Csv" classType="com.fincons.ingenico.crt2.cbikit.inbound.model.RowData" />
    </unmarshal>

    <split streaming="true" executorServiceRef="myThreadPoolExecutor">
        <simple>${body}</simple>
        <choice>
            <when>
                <simple></simple>
                <setHeader headerName="CamelSplitIndex"><simple>${in.header.CamelSplitIndex}</simple></setHeader>
                <process ref="BodyEnricherProcessor" />
                <to uri="dozer:transform?mappingFile=file:{{crt2.apps.home}}{{dozer.mapping.path}}&amp;targetModel=com.fincons.ingenico.crt2.cbikit.inbound.model.SerialNumber" />
                <marshal ref="Gson" />
                <to uri="activemq:queue:CBIKIT" />
            </when>
            <otherwise>
                <log message="Message discarded ${in.header.CamelSplitIndex} - ${body}" />
            </otherwise>
        </choice>
    </split>
</route>

The last test successfully processed a 35,000-line CSV file in about 6 hours,
with an average memory usage of 1400 MB. Can I improve the processing
performance any further?

In addition, I noticed that the queue size stays low. Why? (Is the producer
slower than the consumer?)

Thanks in advance.

Best Regards

Michele



--
View this message in context: 
http://camel.465427.n5.nabble.com/Best-Strategy-to-process-a-large-number-of-rows-in-File-tp5779856p5781168.html
Sent from the Camel - Users mailing list archive at Nabble.com.
