Ok thanks. 

But the outcome is very obvious when I analyse the heap dump.
The pool.clear() is never executed and the memory is filled with the orphaned pool.
If the pool is meant to be a placeholder for the expressions, why are there not only 10 items in the pool? After importing the records my pool is huge, roughly 10 times the number of records in the split, I would say.


/M




On Fri, Nov 12, 2021 at 07:52, Claus Ibsen <claus.ib...@gmail.com> wrote:
Hi

Mind that evaluating XPath in Java is not thread-safe, so you end up having to create an instance of XPathExpression per xpath you want to execute.
And your bean has 10 xpaths, so that is 10 per message, so you end up with 10 XPathExpression instances in memory.
And all the legacy XML support from the JDK/JVM is memory hungry (DOM, JAXB etc).
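
As an illustration only (plain javax.xml.xpath from the JDK, not Camel's internal code), this is roughly what one compiled XPathExpression looks like; such an instance must not be shared between threads while it is evaluating, which is why one is needed per xpath per message:

import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathPerMessageExample {
    public static void main(String[] args) throws Exception {
        String record = "<RefData><NtnlCcy>EUR</NtnlCcy></RefData>";

        // XPath / XPathExpression are not thread-safe, so each concurrent
        // evaluation needs its own compiled instance - one per xpath, per message.
        XPathExpression currency = XPathFactory.newInstance().newXPath().compile("//NtnlCcy");

        String value = (String) currency.evaluate(
                new InputSource(new StringReader(record)), XPathConstants.STRING);
        System.out.println(value); // prints EUR
    }
}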

And if you turn on parallel processing then you multiply this by another 10 or more, depending on the number of concurrent threads etc.
In this case you can make the argument that the pool should be able to shrink in case there was a spike of concurrent processing which later is no longer needed,
as then the pool has too many free elements.
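
For illustration only (a hypothetical route, not yours, assuming the split EIP's executorService(...) option in Camel 3.x): the number of concurrent splits, and therefore the number of live/pooled XPathExpression instances, is bounded by the thread pool you give the splitter:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.camel.builder.RouteBuilder;

public class BoundedSplitRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Hypothetical: cap the splitter at 4 concurrent threads, so at most
        // 4 x 10 XPathExpression instances are in use/pooled at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        from("file:Full")
            .streamCaching()
            .split().tokenizeXML("RefData").streaming()
                .parallelProcessing()
                .executorService(pool)
                .bean(XmlToSqlBean.class)
            .end();
    }
}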

Also, as Alex mentions, a StAX-based parser may be better, which is what you use with tokenizeXML.

You talk about a leak, but is that really a leak? The xpath instances are pooled so they can be re-used for the next message. So what you see in memory is those 10 XPathExpression instances.
If they were cleared after processing a message, you would end up having to re-create the XPathExpression instances for the next message, and then you have more CPU usage and also more pressure on the GC
to de-allocate those 10 XPathExpression instances per message.
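
A minimal sketch of the idea (not Camel's actual pool implementation, just to show the trade-off between keeping compiled expressions around and re-creating them per message):

import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

// Hypothetical pool, only to illustrate reuse vs. re-creating per message.
public class XPathExpressionPool {
    private final ConcurrentLinkedQueue<XPathExpression> pool = new ConcurrentLinkedQueue<>();
    private final String expression;

    public XPathExpressionPool(String expression) {
        this.expression = expression;
    }

    public XPathExpression acquire() throws Exception {
        XPathExpression expr = pool.poll();
        if (expr == null) {
            // Compile only when no free instance exists; this is the cost
            // you pay per message if the pool is cleared after each exchange.
            expr = XPathFactory.newInstance().newXPath().compile(expression);
        }
        return expr;
    }

    public void release(XPathExpression expr) {
        pool.offer(expr); // keep it for the next message instead of letting the GC collect it
    }
}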





On Mon, Nov 8, 2021 at 8:12 PM Mikael Andersson Wigander <mikael.andersson.wigan...@pm.me.invalid> wrote:
Hi

With the risk of being seen as a n00b (again)…

We are processing large XML files (0.5 GB / ~500,000 records).
To process them we use stream caching, split, parallel processing, XPath and a bean.

We get a lot of OutOfMemoryErrors, and after analysing we see that the call to the bean method is the villain.

The process is to split() using tokenizeXML() on a tag that makes up one record in the XML.

For each of these records we call a bean where the method uses @XPath() on the method parameters.

We see in the heap dump that these calls are never GC'd; we have 90% leftovers:
[attached heap dump screenshot: image.png]

The question is: is this related to a non-thread-safe bean/method, or what could be the reason?
The documentation states the default behaviour is a Singleton, and when used in concurrent processing it must be thread safe…

Running as a WAR under Tomcat 9 on Windows, using Camel 3.11.3 and Spring Boot 2.5.6.
The server has 32 GB of RAM…

Route:
from(file("Full"))
                .streamCaching()
                .unmarshal()
                .zipFile()
                .split()
                .tokenizeXML("RefData")
                .streaming()
                .parallelProcessing(false)
                .bean(XmlToSqlBean.class)
                .to(jdbc("default"))
                .end();

Bean:
public class XmlToSqlBean {

    public String toSql(@XPath("//FinInstrmGnlAttrbts/Id") final String isin,
                        @XPath("//NtnlCcy") final String currency,
                        @XPath("//FullNm") final String fullName,
                        @XPath("//TradgVnRltdAttrbts/Id") final String venue,
                        @XPath("//ClssfctnTp") final String classification,
                        @XPath("//TradgVnRltdAttrbts/TermntnDt") final String terminationDate,
                        @XPath("//Issr") final String issuer,
                        @XPath("//MtrtyDt") String maturityDate,
                        @XPath("//TermntdRcrd") final String termnRecord,
                        @XPath("//NewRcrd") final String newRecord) {
        …
    }
}


Thanks

/M



--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2
