All,

In the past week, I've received reports of our servers starting to incorrectly escape XML strings, with consumer errors like this:

org.xml.sax.SAXParseException: The entity "rsquo" was referenced, but not declared.

When looking at the raw text being generated, it's clear that, indeed, the text is being escaped as if it were HTML (where the &rsquo; entity is defined) instead of XML.

The code path is a little convoluted, and I'm going to try to get the smallest reproducible test case I can, but I thought I'd reach out early to see if anyone has any "aha" guidance for me before I tear out a whole lot of hair following this down the rabbit hole.

This is commons-text-1.1. I've looked at the release notes between 1.1 and 1.8 and I don't see anything that immediately looks like a bugfix.

The data is coming from a database, and the string is clearly correct, and it includes a "typographic right apostrophe", which is accurately &rsquo; in HTML.

The output is being generated by Apache Velocity, through a macro which escapes XML for us. The code in the template looks like this:

#xmlEscape($foo)

Where $foo is a string value containing this character: ’

The xmlEscape macro is defined in our global macros file, which gets evaluated on startup:

#macro (xmlEscape $text)#if($text)$!modernEscape.escapeXml10($text.toString())#end#end

$modernEscape is an instance of org.apache.commons.text.StringEscapeUtils in the global scope; it's like "application" scope for webapps, but it's in Velocity.

When we first start our web application, all seems well. After some time, this process breaks and we start emitting "&rsquo;" instead of "’".

I can find no evidence of any of the following:

1. multiple versions of the commons-text library
2. multiple versions of org.apache.commons.text.StringEscapeUtils in any library
3. any component replacing the value of $modernEscape
4. any component replacing the definition of the #xmlEscape macro

When the first report came in, we tried replicating the reporter's experience and we could see it on one server node but not others. We restarted the web application on that node and it started working properly again.

Does StringEscapeUtils.escape* keep any state associated with what it's doing? We aren't doing anything weird: just calling StringEscapeUtils.escapeXml10 ... a lot of times, probably from many threads.

Any ideas?

-chris
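
For anyone who wants to poke at this outside the Velocity code path, here is a minimal sketch of a check for the symptom described above. It uses only the JDK. The class name XmlEntityCheck and its method names are hypothetical, made up for illustration, and the escaper is passed in as a function so that any implementation can be plugged in. It relies on the fact that XML 1.0 predefines only five named entities (lt, gt, amp, apos, quot), so any other named entity in the output fails to parse unless a DTD declares it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of commons-text or Velocity.
final class XmlEntityCheck {
    // The only named entities that XML 1.0 predefines.
    private static final Set<String> XML_PREDEFINED =
            Set.of("lt", "gt", "amp", "apos", "quot");
    // Matches named references such as &rsquo; but not numeric ones such as &#8217;
    private static final Pattern NAMED_ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Returns the first named entity that XML does not predefine, or null if none. */
    static String firstUndeclaredEntity(String text) {
        Matcher m = NAMED_ENTITY.matcher(text);
        while (m.find()) {
            if (!XML_PREDEFINED.contains(m.group(1))) {
                return m.group(1);
            }
        }
        return null;
    }

    /** Parses the escaped text inside a root element; returns the parser's message, or null if well-formed. */
    static String parseError(String escapedText) {
        String doc = "<root>" + escapedText + "</root>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls the escaper from several threads and counts outputs containing an undeclared entity. */
    static long countBadOutputs(UnaryOperator<String> escaper, String input,
                                int threads, int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong bad = new AtomicLong();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < callsPerThread; i++) {
                        if (firstUndeclaredEntity(escaper.apply(input)) != null) {
                            bad.incrementAndGet();
                        }
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has finished; get() surfaces task failures.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return bad.get();
    }
}
```

With commons-text on the classpath, the escaper argument could be StringEscapeUtils::escapeXml10, for example XmlEntityCheck.countBadOutputs(StringEscapeUtils::escapeXml10, "it\u2019s", 8, 100000). A non-zero result would point at the library call itself. A zero result is only weak evidence that the problem lies elsewhere in the template path, since the failure described above appears only after some time.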
