On Mon, Jun 12, 2023 at 2:14 PM Tim Allison <talli...@apache.org> wrote:

> All,
>
>   Since the refactoring to avoid threadlocal in skipFully, we are now
> getting errors in one of our unit tests on Tika [0].  We're still on
> commons-io 2.11.0, but we found this problem with 2.12.0 and 2.13.0.
>
>    I was able to reproduce the problem outside of Tika [1].  To reproduce
> the problem, the stream has to be wrapped in java's InflaterInputStream,
> and it has to be run multithreaded.  The problem is easily reproducible
> with only two threads, but there is variation on which iteration triggers
> the problem even with the same seed (what you'd expect from a
> multi-threading bug).
>
>   I don't think this is a bug in commons-io, per se, but it is mildly
> frightening from a data reliability perspective that the combination of
> IOUtils.skipFully on top of Java's InflaterInputStream can lead to
> different data.
>
>   Is this a bug in Java, a bug in commons-io or something else?  What
> should we do?  Thank you!
>

Interesting.  Taking a quick look at the source for both IOUtils and
InflaterInputStream, it looks to me like the InflaterInputStream may be
effectively assuming that it has exclusive access to the buffer that it is
reading into.   I looked at a couple of jdk sources and they both look like
this:

public int read(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        if (b == null) {
            throw new NullPointerException();
        } else if (off < 0 || len < 0 || len > b.length - off) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }
        try {
            int n;
            while ((n = inf.inflate(b, off, len)) == 0) {
                if (inf.finished() || inf.needsDictionary()) {
                    reachEOF = true;
                    return -1;
                }
                if (inf.needsInput()) {
                    fill();
                }
            }
            return n;
        } catch (DataFormatException e) {
            String s = e.getMessage();
            throw new ZipException(s != null ? s : "Invalid ZLIB data
format");
        }
    }

I can see how zero-ing the byte array it is reading to could cause problems
here.  I don't see any doc anywhere saying that either this Reader or the
interface in general has this constraint, so maybe it is a jdk bug, but it
is what it is and I think to be safe it would be better to add back the
ThreadLocal on the write only bufffer in IOUtills or change it to not zero
the array.  I don't see why the zeroing is necessary, but I don't know much
about [io].

Phil



>        Best,
>
>            Tim
>
>
>
>
> [0] https://issues.apache.org/jira/browse/TIKA-4065
> [1]
>
> https://github.com/tballison/commons-io/blob/TIKA-4065/src/test/java/org/apache/commons/io/IOUtilsMultithreadedTest.java
>

Reply via email to