On 2018-03-31, kiwiwings wrote:

> Stefan Bodewig wrote
>> ... then the compressed and uncompressed sizes of each
>> ZipArchiveEntry are known before you try to read the stream. Can't you
>> simply reject reading entries whose uncompressed size is too big?
> Are those sizes taken from the central directory?
> Isn't it possible to tamper with those entries?
> If so, would you rely on those?

You are certainly correct. In that case, wrapping the InputStream in a
BoundedInputStream capped at the claimed uncompressed size would work. If
the uncompressed size hasn't been tampered with, then all is well;
otherwise the stream will simply stop reading any further.

> To prevent false positives with people who put an insane amount of
> data into Excel sheets, I've chosen the ratio way, i.e. comparing how
> many bytes of the compressed stream are read vs. the uncompressed
> size. There's another limit check which does exactly what you've
> explained, but this defaults to around 2GB. Although POI anyway has a
> problem dealing with such big files in its normal processing mode,
> there's also a SAX streaming mode which can benefit from this.

> I haven't dived into the compress code yet, but my hope is that the
> ratio logic can also be applied there.

The CompressorInputStreams provide counts of the compressed bytes read,
but at least the InputStream provided by ZipFile#getInputStream does not.
So I'm afraid there currently is no easy way to tell how many compressed
bytes have been processed at any point in time.

> For a reference I've looked through the archives to find related
> discussions [1], but maybe missed the private ones.

I'm not aware of any private discussion.

In this thread as well as in COMPRESS-445 we are only talking about ZIP,
but all the other archive formats (and compression formats) can certainly
also be used for "bombs". Compress' test resources contain tar.gz files of
about 8MB that expand to 8+GB - these are the tests for really big tars.
What I'd really want to do is provide a solution that works for all our
formats, not just for ZIP.

> So the discussions were similar to this thread, i.e. about taking the
> metadata/sizes from the central directory.
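As an aside, the capping idea can be sketched without any Commons Compress
API at all. The BoundedInput class below is a minimal stand-in for Commons
IO's BoundedInputStream (assumed not to be on the classpath), and the
"claimed" size of 1024 bytes simulates tampered metadata - the class and
method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class BoundedRead {

    /** Minimal stand-in for Commons IO's BoundedInputStream: stops after 'limit' bytes. */
    static final class BoundedInput extends FilterInputStream {
        private long remaining;

        BoundedInput(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = super.read();
            if (b != -1) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }
    }

    /** Inflates a deflated 1 MiB payload through a cap of 'claimed' bytes; returns bytes read. */
    static long readCapped(long claimed) throws IOException {
        byte[] payload = new byte[1024 * 1024]; // expands to 1 MiB ...
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed, new Deflater())) {
            dos.write(payload); // ... but deflates to a few KB
        }
        long total = 0;
        try (InputStream in = new BoundedInput(
                new InflaterInputStream(new ByteArrayInputStream(compressed.toByteArray())),
                claimed)) {
            byte[] buf = new byte[512];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pretend the central directory claimed only 1024 uncompressed bytes.
        System.out.println(readCapped(1024)); // prints 1024, not 1048576
    }
}
```

Whether the metadata was honest or not, the reader never produces more
than the claimed number of bytes.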
True, and you are absolutely correct that this is short-sighted when
somebody who crafts a malicious input may just put the wrong values in
there.

> So my approach would be to try to move to commons-compress, check if the
> ratio logic can be added and, if this is possible, we might also need to
> look at the performance trade-off (if any?) and decide if we want a
> switchable implementation ...

The first thing you'd need is a way to learn how much of the uncompressed
data has been read so far. Given an API of

    InputStream getInputStream(ZipArchiveEntry e)

there isn't much room for providing that information apart from
documenting "the InputStream will be a FooInputStream" and having
FooInputStream provide the information.

An alternative may be a progress-listener approach which would frequently
report the number of bytes processed. This has been discussed before (on
this list a few days ago and in COMPRESS-207).

I'm absolutely open to providing the means to implement the protection;
the dev list might be the better place. Right now I'm not convinced it
should be enabled by default, but we can discuss that once we get there.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org
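For what it's worth, when the caller controls the Inflater itself,
java.util.zip already exposes the two counters the ratio check needs
(Inflater#getBytesRead and Inflater#getBytesWritten). A rough sketch of a
POI-style ratio check - the threshold constants and method names here are
made up, not anything from Commons Compress or POI:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class RatioCheck {

    static final double MAX_RATIO = 100.0; // hypothetical expansion threshold
    static final long GRACE_BYTES = 4096;  // don't flag tiny streams

    /** Inflates 'compressed', failing once output/input exceeds MAX_RATIO. */
    static long inflateChecked(byte[] compressed) throws IOException {
        Inflater inf = new Inflater();
        long total = 0;
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed), inf)) {
            byte[] buf = new byte[512];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                long read = inf.getBytesRead();       // compressed bytes consumed so far
                long written = inf.getBytesWritten(); // uncompressed bytes produced so far
                if (written > GRACE_BYTES && read > 0
                        && (double) written / read > MAX_RATIO) {
                    throw new IOException(
                        "suspicious expansion ratio: " + written + "/" + read);
                }
            }
        } finally {
            inf.end(); // release native resources for the caller-supplied Inflater
        }
        return total;
    }

    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, new Deflater())) {
            dos.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bomb = deflate(new byte[4 * 1024 * 1024]); // 4 MiB of zeros: huge ratio
        try {
            inflateChecked(bomb);
            System.out.println("not detected");
        } catch (IOException e) {
            System.out.println("detected");
        }
    }
}
```

This only works because the Inflater is ours; as noted above, the stream
handed out by ZipFile#getInputStream doesn't expose such counters, which
is exactly the gap being discussed.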
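And to make the listener idea concrete: the interface and wrapper below
are invented names, not an API from Commons Compress, but they show how a
caller could be notified of progress and abort once too many bytes have
been produced:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ProgressDemo {

    /** Hypothetical callback, sketching the listener idea from COMPRESS-207. */
    interface ProgressListener {
        void bytesProcessed(long totalBytes);
    }

    /** Stream wrapper that reports the running byte count to the listener. */
    static final class ProgressInputStream extends FilterInputStream {
        private final ProgressListener listener;
        private long count;

        ProgressInputStream(InputStream in, ProgressListener listener) {
            super(in);
            this.listener = listener;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                listener.bytesProcessed(++count);
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
                listener.bytesProcessed(count);
            }
            return n;
        }
    }

    /** Drains 'dataSize' bytes, aborting once the listener sees more than 'limit'. */
    static String drain(int dataSize, long limit) throws IOException {
        try (InputStream in = new ProgressInputStream(
                new ByteArrayInputStream(new byte[dataSize]),
                total -> {
                    if (total > limit) {
                        throw new IllegalStateException("too much data: " + total);
                    }
                })) {
            byte[] buf = new byte[64];
            while (in.read(buf) != -1) {
                // just consume the stream
            }
            return "completed";
        } catch (IllegalStateException e) {
            return "aborted";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(1000, 100));   // prints "aborted"
        System.out.println(drain(1000, 10000)); // prints "completed"
    }
}
```

The attraction of this shape is that it needs no commitment to a concrete
stream class in the getInputStream contract - the listener is orthogonal
to whatever stream type is returned.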