> System.setProperty("jdk.xml.entityExpansionLimit", "0");

https://docs.oracle.com/javase/tutorial/jaxp/limits/limits.html
"A value less than or equal to 0 indicates no limit."

    Andy
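A minimal sketch of the point made in the reply above. It uses only the JDK's built-in JAXP SAX parser, not Jena. With a positive jdk.xml.entityExpansionLimit, a scaled-down entity-expansion document is rejected instead of being expanded. The class name, the limit of 1000 and the test documents are illustrative assumptions. The example does not show whether the separate org.apache.xerces parser in the quoted stack trace reads this property.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityLimitDemo {
    // Scaled-down "billion laughs": &lol4; expands to 10^4 copies of "lol",
    // which takes 11,111 entity references in total (1 + 10 + 100 + 1000 + 10000).
    static final String LAUGHS = "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE lolz [\n"
        + " <!ENTITY lol \"lol\">\n"
        + " <!ENTITY lol1 \"&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;\">\n"
        + " <!ENTITY lol2 \"&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;\">\n"
        + " <!ENTITY lol3 \"&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;\">\n"
        + " <!ENTITY lol4 \"&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;\">\n"
        + "]>\n"
        + "<lolz>&lol4;</lolz>\n";

    // Sets the JDK-wide limit, then parses with a freshly created JDK SAX parser.
    // Returns false if the parser rejects the document for any SAX error.
    static boolean parsesWithLimit(String limit, String xml)
            throws ParserConfigurationException, IOException {
        // A positive value is a real cap; "0" or less would mean "no limit".
        System.setProperty("jdk.xml.entityExpansionLimit", limit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("laughs accepted: " + parsesWithLimit("1000", LAUGHS));
        System.out.println("plain accepted: " + parsesWithLimit("1000", "<ok/>"));
    }
}
```

The property is set before the parser is created because the JDK reads it when the parser is constructed. Setting it after the parser already exists may have no effect.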

On 16/11/2025 16:44, Martynas Jusevičius wrote:
Hi,

I want to protect my RDF/XML I/O code against the Billion Laughs attack,
external DTD loading and similar exploits. I'm using Jena 4.7.0.

The reader code looks like this:

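import java.io.InputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.ErrorHandler;
import org.apache.jena.riot.system.StreamRDFLib;

public Model read(Model model, InputStream is, Lang lang, String baseURI, ErrorHandler errorHandler)
{
     RDFParser parser = RDFParser.create().
         lang(lang).
         errorHandler(errorHandler).
         checking(true). // otherwise exceptions will not be thrown for invalid URIs!
         base(baseURI).
         source(is).
         build();

     // Parse directly into the model's underlying graph
     parser.parse(StreamRDFLib.graph(model.getGraph()));

     return model;
}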

I have a script that submits an RDF/XML Billion Laughs payload (recursive
entity expansion), and it makes the Java application run out of memory:
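For scale, here is a sketch of the arithmetic behind such a payload. The class and method names are hypothetical. The payload is the textbook nine-level form, which is not necessarily what the script sends. Each level references the previous one ten times, so a document under a kilobyte expands to 3 × 10^9 characters. That fits the heap being exhausted inside StringBuffer.append in the trace that follows.

```java
public class LaughsSize {
    // Characters produced by expanding the top entity of an N-level payload,
    // where each level references the previous one 10 times and the base
    // entity is the 3-character string "lol".
    static long expandedLength(int levels) {
        long copies = 1;
        for (int i = 0; i < levels; i++) {
            copies *= 10;
        }
        return copies * "lol".length();
    }

    // Builds such a document: lol0 = "lol", lolN = ten references to lol(N-1).
    static String payload(int levels) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<!DOCTYPE lolz [\n");
        sb.append(" <!ENTITY lol0 \"lol\">\n");
        for (int i = 1; i <= levels; i++) {
            sb.append(" <!ENTITY lol").append(i).append(" \"");
            for (int j = 0; j < 10; j++) {
                sb.append("&lol").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<lolz>&lol").append(levels).append(";</lolz>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("document size: " + payload(9).length() + " characters");
        System.out.println("expanded size: " + expandedLength(9) + " characters");
    }
}
```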

Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:740)
at java.base/java.lang.StringBuffer.append(StringBuffer.java:410)
at org.apache.jena.rdfxml.xmlinput.states.AbsWantLiteralValueOrDescription.characters(AbsWantLiteralValueOrDescription.java:62)
at org.apache.jena.rdfxml.xmlinput.states.WantLiteralValueOrDescription.characters(WantLiteralValueOrDescription.java:77)
at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.characters(XMLHandler.java:137)
at org.apache.xerces.parsers.AbstractSAXParser.characters(Unknown Source)
at org.apache.xerces.impl.dtd.XMLDTDValidator.characters(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:96)
at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118)
at org.apache.jena.riot.lang.ReaderRIOTRDFXML.parse(ReaderRIOTRDFXML.java:186)
at org.apache.jena.riot.lang.ReaderRIOTRDFXML.read(ReaderRIOTRDFXML.java:84)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:416)
at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:406)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:356)
at com.atomgraph.core.io.ModelProvider.read(ModelProvider.java:113)
at com.atomgraph.core.io.ModelProvider.read(ModelProvider.java:96)
at com.atomgraph.core.io.ModelProvider.readFrom(ModelProvider.java:90)
at com.atomgraph.core.io.ModelProvider.readFrom(ModelProvider.java:53)
at org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$TerminalReaderInterceptor.invokeReadFrom(ReaderInterceptorExecutor.java:233)
at org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$TerminalReaderInterceptor.aroundReadFrom(ReaderInterceptorExecutor.java:212)
at org.glassfish.jersey.message.internal.ReaderInterceptorExecutor.proceed(ReaderInterceptorExecutor.java:132)
at org.glassfish.jersey.message.internal.MessageBodyFactory.readFrom(MessageBodyFactory.java:1072)

I have tried the following settings (and the equivalent -D options in
CATALINA_OPTS), but none of them seems to have any effect; the exploit
still works:

System.setProperty("javax.xml.stream.isSupportingExternalEntities", "false");
System.setProperty("javax.xml.accessExternalDTD", "");
System.setProperty("javax.xml.accessExternalSchema", "");
System.setProperty("jdk.xml.entityExpansionLimit", "0");

What is the solution here?
I would hate to have to single out RDF/XML and handle it specially,
but I'll do it if it's necessary in order to solve this.
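If RDF/XML does end up being handled specially, one hypothetical option is to buffer the request body and refuse any document with a DOCTYPE declaration before it reaches Jena. Both entity expansion and external DTD loading depend on a DOCTYPE declaration. This sketch uses the JDK's built-in parser and the Xerces feature http://apache.org/xml/features/disallow-doctype-decl. The class and method names are illustrative, and this is not a Jena API. One caveat: some legitimate RDF/XML uses internal DTD entities to abbreviate namespace URIs, and this check would reject those documents too.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DoctypeGuard {
    // Feature understood by the JDK's built-in (Xerces-derived) parser.
    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Hypothetical pre-check: true if the bytes are well-formed XML with no
    // DOCTYPE declaration, so no DTD entities or external DTD can be involved.
    static boolean hasNoDoctype(byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Outside the try: an unsupported feature should fail loudly, not pass silently.
        factory.setFeature(DISALLOW_DOCTYPE, true);
        try {
            factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DOCTYPE present, or not well-formed XML
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] plain = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] withDtd = "<!DOCTYPE x [<!ENTITY a \"b\">]><x>&a;</x>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println("plain accepted: " + hasNoDoctype(plain));
        System.out.println("DTD accepted: " + hasNoDoctype(withDtd));
    }
}
```

In read(), the InputStream would first be read into a byte array (for example with is.readAllBytes()) and checked. The bytes would then be passed to Jena as a new ByteArrayInputStream. The body is parsed twice this way. A request-size limit is still needed, because buffering an unbounded body carries its own memory risk.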

Thanks.

Martynas
atomgraph.com
