On Mon, Jun 12, 2023 at 6:52 PM Phil Steitz <phil.ste...@gmail.com> wrote:
> > > On Mon, Jun 12, 2023 at 2:14 PM Tim Allison <talli...@apache.org> wrote: > >> All, >> >> Since the refactoring to avoid threadlocal in skipFully, we are now >> getting errors in one of our unit tests on Tika [0]. We're still on >> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0. >> >> I was able to reproduce the problem outside of Tika [1]. To reproduce >> the problem, the stream has to be wrapped in java's InflaterInputStream, >> and it has to be run multithreaded. The problem is easily reproducible >> with only two threads, but there is variation on which iteration triggers >> the problem even with the same seed (what you'd expect from a >> multi-threading bug). >> >> I don't think this is a bug in commons-io, per se, but it is mildly >> frightening from a data reliability perspective that the combination of >> IOUtils.skipFully on top of Java's InflaterInputStream can lead to >> different data. >> >> Is this a bug in Java, a bug in commons-io or something else? What >> should we do? Thank you! >> > > Interesting. Taking a quick look at the source for both IOUtils and > InflaterInputStream, it looks to me like the InflaterInputStream may be > effectively assuming that it has exclusive access to the buffer that it is > reading into. I looked at a couple of jdk sources and they both look like > this: > > public int read(byte[] b, int off, int len) throws IOException { > ensureOpen(); > if (b == null) { > throw new NullPointerException(); > } else if (off < 0 || len < 0 || len > b.length - off) { > throw new IndexOutOfBoundsException(); > } else if (len == 0) { > return 0; > } > try { > int n; > while ((n = inf.inflate(b, off, len)) == 0) { > if (inf.finished() || inf.needsDictionary()) { > reachEOF = true; > return -1; > } > if (inf.needsInput()) { > fill(); > } > } > return n; > } catch (DataFormatException e) { > String s = e.getMessage(); > throw new ZipException(s != null ? s : "Invalid ZLIB data > format"); > } > } > > I can see how zero-ing the byte array it is reading to could cause > problems here. I don't see any doc anywhere saying that either this Reader > or the interface in general has this constraint, so maybe it is a jdk bug, > but it is what it is and I think to be safe it would be better to add back > the ThreadLocal on the write only bufffer in IOUtills or change it to not > zero the array. I don't see why the zeroing is necessary, but I don't know > much about [io]. > Sorry, just eliminating the zero-ing probably wont work because concurrent writes could make buffer segments not inflatable IIUC what is going on. So I would just restore the protection. > > Phil > > > >> Best, >> >> Tim >> >> >> >> >> [0] https://issues.apache.org/jira/browse/TIKA-4065 >> [1] >> >> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java >> >