Darren Foong created GEODE-3306:
-----------------------------------

             Summary: Parsing of cache.xml with whitespace fails with Apache Xerces
                 Key: GEODE-3306
                 URL: https://issues.apache.org/jira/browse/GEODE-3306
             Project: Geode
          Issue Type: Bug
          Components: core
            Reporter: Darren Foong
             Fix For: 1.2.0


I am using Geode 1.2.0 and Apache Xerces 2.11.0 (not the one included in the 
Oracle JDK), and I encountered the following error when I tried to 
programmatically start a cache:

{noformat}
org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.

Exception in thread "main" org.apache.geode.InternalGemFireError: Did not expected a java.lang.StringBuffer on top of the stack.
        at org.apache.geode.internal.Assert.throwError(Assert.java:94)
        at org.apache.geode.internal.Assert.assertTrue(Assert.java:117)
        at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endRegionAttributes(CacheXmlParser.java:1257)
        at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.endElement(CacheXmlParser.java:2909)
        at org.apache.geode.internal.cache.xmlcache.CacheXmlParser$DefaultHandlerDelegate.endElement(CacheXmlParser.java:3374)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
        at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.parse(CacheXmlParser.java:224)
        at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4287)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1390)
        at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1195)
        at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:758)
        at org.apache.geode.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:745)
        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:173)
        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:212)
        at server.ServerWhitespace.main(ServerWhitespace.java:8)
{noformat}

However, the error does not occur when I leave Apache Xerces off the classpath 
and rely on the Xerces version bundled with the Oracle JDK (1.8).

After getting the Geode source code and stepping through the parsing with the 
Eclipse debugger, I realised that unexpected StringBuffers had been pushed 
onto the parse stack, which is what trips the assertion in 
{{endRegionAttributes()}}.

These StringBuffers were created and pushed by the {{characters()}} method 
(https://github.com/apache/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheXmlParser.java#L3270).
 Changing the log level to {{TRACE}} and examining the parse stack showed that 
these StringBuffers contained the whitespace (including newlines) between the 
XML tags in {{cache.xml}}.

When using the Oracle JDK's version of Xerces, these StringBuffers did not 
appear on the parse stack despite the whitespace.
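
To compare how a given parser reports this whitespace, a small probe like the 
following may help. It is only a sketch: {{WhitespaceProbe}} is a hypothetical 
class, not part of Geode, and it uses a plain non-validating parser, whereas 
the stack trace above shows schema validation in the parse path, so the counts 
illustrate the mechanism rather than reproduce Geode's exact setup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic (not Geode code): counts how whitespace is reported.
class WhitespaceProbe extends DefaultHandler {
  int whitespaceOnlyCharacters; // whitespace-only chunks seen by characters()
  int ignorableWhitespace;      // chunks seen by ignorableWhitespace()
  String parserClass;           // which SAX implementation was picked up

  @Override
  public void characters(char[] ch, int start, int length) {
    if (length > 0 && new String(ch, start, length).trim().isEmpty()) {
      whitespaceOnlyCharacters++;
    }
  }

  @Override
  public void ignorableWhitespace(char[] ch, int start, int length) {
    ignorableWhitespace++;
  }

  static WhitespaceProbe probe(String xml) throws Exception {
    // newInstance() selects whichever implementation is on the classpath,
    // e.g. Apache Xerces if present, otherwise the one bundled with the JDK.
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    WhitespaceProbe handler = new WhitespaceProbe();
    handler.parserClass = parser.getClass().getName();
    parser.parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }

  public static void main(String[] args) throws Exception {
    WhitespaceProbe p = probe("<cache>\n  <region name=\"r\"/>\n</cache>");
    System.out.println(p.parserClass
        + " characters(whitespace-only)=" + p.whitespaceOnlyCharacters
        + " ignorableWhitespace=" + p.ignorableWhitespace);
  }
}
```

Running it once with and once without Apache Xerces on the classpath shows 
which callback each implementation uses for the whitespace between tags.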

I have a proof of concept on GitHub: 
https://github.com/darrenfoong/geode-parser-poc

A {{cache.xml}} file with no whitespace between the tags is parsed without 
errors by both versions of Xerces.

It may be that the JDK Xerces strips out this whitespace while Apache Xerces 
does not. If so, Geode could do the stripping itself in {{characters()}} by 
pushing a new StringBuffer in the {{else}} block only when the char array is 
not all whitespace. However, there could be other XML parsing edge cases that 
I am unaware of.
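
A minimal sketch of that idea, assuming a simplified stand-in for the parse 
stack: {{WhitespaceSkippingHandler}} and its fields are hypothetical names 
for illustration and are not Geode's actual {{CacheXmlParser}} code.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps whitespace-only text off a parse stack.
class WhitespaceSkippingHandler extends DefaultHandler {
  final Deque<Object> stack = new ArrayDeque<>(); // element names and text buffers
  final List<String> texts = new ArrayList<>();   // text collected per element

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    stack.push(qName);
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    Object top = stack.peek();
    if (top instanceof StringBuffer) {
      // Text is already being collected: keep appending, whitespace included.
      ((StringBuffer) top).append(ch, start, length);
    } else if (!isXmlWhitespace(ch, start, length)) {
      // Only start a new buffer when the chunk holds real content.
      stack.push(new StringBuffer(length).append(ch, start, length));
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    Object top = stack.pop();
    if (top instanceof StringBuffer) {
      texts.add(top.toString());
      top = stack.pop();
    }
    if (!qName.equals(top)) {
      throw new IllegalStateException("Unexpected " + top + " on top of the stack");
    }
  }

  // True when every character in the range is XML whitespace.
  private static boolean isXmlWhitespace(char[] ch, int start, int length) {
    for (int i = start; i < start + length; i++) {
      char c = ch[i];
      if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
        return false;
      }
    }
    return true;
  }

  static WhitespaceSkippingHandler parse(String xml) throws Exception {
    WhitespaceSkippingHandler handler = new WhitespaceSkippingHandler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler;
  }
}
```

One limitation of this sketch: a whitespace-only chunk that arrives before 
the first real text of an element is dropped, and an element whose content is 
intentionally all whitespace loses it, so it is a starting point rather than a 
complete fix.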

Others presumably need Apache Xerces in their projects too, so a fix would be 
appreciated.



