Hi,

I think there are two distinct types of security vulnerability being discussed here. One is usually called a "Zip bomb", the other an "XML bomb". Both try to get you to open a malicious file that expands enormously in memory, so that your application runs out of memory, which results in a denial of service. The first attacks at the ZIP-file level, i.e. when the OOXML file is uncompressed during reading. The second attacks at the level of the XML content that resides inside the ZIP file.
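
To make the ZIP-level case a bit more concrete, here is a minimal sketch using only java.util.zip from the JDK (no Apache POI involved). It zips a block of zero bytes and reports how small the archive is compared to the data it expands to. The class and method names (ZipExpansionDemo, zippedSizeOfZeros) are made up for this illustration, and a real malicious file may well be built differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of Apache POI.
final class ZipExpansionDemo {

    /** Zips the given number of zero bytes into one entry and returns the archive size in bytes. */
    static int zippedSizeOfZeros(int uncompressedSize) {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(archive)) {
            zip.putNextEntry(new ZipEntry("zeros.bin"));
            byte[] chunk = new byte[8192]; // all zeros, i.e. extremely repetitive data
            int remaining = uncompressedSize;
            while (remaining > 0) {
                int n = Math.min(chunk.length, remaining);
                zip.write(chunk, 0, n);
                remaining -= n;
            }
            zip.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
        return archive.size();
    }

    public static void main(String[] args) {
        int uncompressed = 10 * 1024 * 1024;
        int compressed = zippedSizeOfZeros(uncompressed);
        System.out.printf("uncompressed=%d bytes, zipped=%d bytes, ratio=%.5f%n",
                uncompressed, compressed, (double) compressed / uncompressed);
    }
}
```

The point is only that a very small archive can stand for a very large amount of data once it is uncompressed, which is what the reader of the file has to guard against.
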

An "XML bomb" is a file which uses various XML features to make a small file expand to a much larger one in memory. Typically multiple levels of entity expansion are used to cause this. Apache POI protects against this by disabling the features in the XML parsers that would allow such expansion to take place at all.

However, you are actually looking at the code which protects Apache POI against a "Zip bomb", i.e. a ZIP file that is created in a way which expands to a much larger amount of memory when uncompressed. This is probably done by writing lots of similar data which compresses very well, but I have not looked into the details of this yet.

While extracting the ZIP file (all OOXML file types are actually compressed ZIP files), Apache POI counts the compressed bytes and the resulting uncompressed bytes. If the ratio of compressed to uncompressed bytes is lower than a given threshold (i.e. the compressed data expands a lot), it stops processing the file with an error to avoid this type of attack.

If you produced the file in question yourself, it is probably safe to lower the threshold somewhat via the API. If it is an external file from an untrusted source, you likely don't want to process it; only a close look at the actual ZIP data will tell for sure.

Dominik

On Mon, Mar 25, 2019, 15:26 Scott Gardner <[email protected]> wrote:

> I understand that, but specifically what is it in a .zip file that will
> cause this if statement to throw the IllegalStateException? I don't
> understand where the values of text.length() and string.length() are coming
> from.
>
> int size = text.length() + string.length();
> if(size > ZipSecureFile.getMaxTextSize()) {
>
> I'm getting this exception and I don't know what (in the .zip file) is
> causing this to be thrown.
>
> The text would exceed the max allowed overall size of extracted text. By
> default this is prevented as some documents may exhaust available memory
> and it may indicate that the file is used to inflate memory usage and thus
> could pose a security risk. You can adjust this limit via
> ZipSecureFile.setMaxTextSize() if you need to work with files which have a
> lot of text. Size: 10485785, limit: MAX_TEXT_SIZE: 10485760
>
> On 2019/03/22 18:39:06, Scott Gardner <[email protected]> wrote:
> > Can someone explain what causes IllegalStateException to be thrown in
> > POIXMLTextExtractor.java?
> >
> > In the file org/apache/poi/POIXMLTextExtractor.java is this if statement
> >
> > if(size > ZipSecureFile.getMaxTextSize()) {
> >     throw new IllegalStateException("The text would exceed the max allowed overall size of extracted text. "
> >         + "By default this is prevented as some documents may exhaust available memory and it may indicate that the file is used to inflate memory usage and thus could pose a security risk. "
> >         + "You can adjust this limit via ZipSecureFile.setMaxTextSize() if you need to work with files which have a lot of text. "
> >         + "Size: " + size + ", limit: MAX_TEXT_SIZE: " + ZipSecureFile.getMaxTextSize());
> > }
> >
> > Can someone tell me exactly what causes this message to be printed? What
> > does "The text" mean in the context of that message?
> > Can someone give me a .zip file that will cause this message to appear and
> > explain to me what it is about the contents of the .zip file
> > causes that message to be printed?
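
For reference, the idea of counting compressed against uncompressed bytes while inflating, as described in the reply at the top of this thread, could be sketched roughly as follows. This is not the Apache POI code, only an illustration with the JDK's Inflater. The names InflateRatioGuard, deflate and inflateGuarded, and the parameters minRatio and graceBytes, are assumptions made up for this sketch. In Apache POI itself the corresponding threshold is, as far as I know, adjusted via ZipSecureFile.setMinInflateRatio().

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of a ratio guard, not the Apache POI implementation.
final class InflateRatioGuard {

    /** Helper for the demo: deflates the given bytes in memory. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Inflates the data and returns the number of uncompressed bytes.
     * Aborts with IllegalStateException once more than graceBytes have been
     * produced and the compressed/uncompressed ratio drops below minRatio.
     */
    static long inflateGuarded(byte[] compressed, double minRatio, long graceBytes) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[8192]; // output is discarded, only the counts matter here
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                long in = inflater.getBytesRead();     // compressed bytes consumed so far
                long out = inflater.getBytesWritten(); // uncompressed bytes produced so far
                if (out > graceBytes && (double) in / out < minRatio) {
                    throw new IllegalStateException("Suspicious expansion: " + in
                            + " compressed bytes became " + out + " uncompressed bytes");
                }
            }
            return inflater.getBytesWritten();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] suspicious = deflate(new byte[1024 * 1024]); // 1 MiB of zeros
        try {
            inflateGuarded(suspicious, 0.01, 100 * 1024);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The values 0.01 and 100 KiB in main are only illustrative. The grace size is there so that small, harmless entries which happen to compress very well are not rejected.
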
