I have also encountered the same issue in a simple test that tries to identify an "application/vnd.google-earth.kmz" file. I can work around the "invalid mark" problem by wrapping the InputStream passed to Tika.detect(InputStream stream, String name) in a TikaInputStream. Sadly, I still have problems with 3.2.0: the test now fails because the "application/vnd.google-earth.kmz" file is detected as plain "application/zip".
Reverting back to 3.1.0 makes the detection work with a plain InputStream.

/Pontus

On 2025/05/28 16:33:23 Craig Muchinsky via user wrote:
> I tested using the release, my dependency management tool made me aware
> that 3.2.0 was available so I decided to kick the tires and ran into this
> issue. I will have to spend some time on a reproduction scenario
>
> On Wed, May 28, 2025 at 1:33 AM Tilman Hausherr <[email protected]>
> wrote:
>
> > Did you test with the release or with the candidate or with an earlier
> > build? A bug like you mentioned was fixed just a few days ago. Please
> > share the file and some minimal code.
> >
> > Tilman
> >
> > On 5/28/2025 2:08 AM, Craig Muchinsky via user wrote:
> > > After upgrading to tika 3.2.0, I started seeing the following
> > > exception when attempting to detect the mime type for a given file,
> > > I'm wondering if something in the way input streams are handled has
> > > changed, or if this might be a regression?
> > >
> > > Caused by: java.io.IOException: Resetting to invalid mark
> > >   at [email protected]/java.io.BufferedInputStream.implReset(BufferedInputStream.java:583)
> > >   at [email protected]/java.io.BufferedInputStream.reset(BufferedInputStream.java:569)
> > >   at app//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> > >   at app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> > >   at app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> > >   at app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> > >   at app//org.apache.tika.Tika.detect(Tika.java:160)
> > >   at app//org.apache.tika.Tika.detect(Tika.java:185)
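
For anyone unfamiliar with the "Resetting to invalid mark" message in the trace above, here is a minimal JDK-only sketch of how java.io.BufferedInputStream can produce it. This is not Tika's code and is not claimed to be the cause of the regression. The class name InvalidMarkDemo, the helper tryReset, and the buffer size and mark limit are assumptions chosen only to make the behaviour easy to trigger: the stream is marked with a small read limit, more bytes are read than the buffer and the limit allow, and reset() then fails.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class InvalidMarkDemo {

    // Marks the stream, reads bytesToRead single bytes, then calls reset().
    // Returns "ok" if reset() succeeds, otherwise the IOException message.
    // All three parameters are illustrative values, not anything Tika uses.
    static String tryReset(int bufferSize, int markLimit, int bytesToRead) {
        byte[] data = new byte[16]; // hypothetical payload, contents do not matter
        try (InputStream in =
                 new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(markLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read(); // single-byte reads so every byte goes through the buffer
            }
            in.reset();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stays within the buffer, so the mark should still be valid.
        System.out.println(tryReset(4, 2, 2));
        // Reads past both the mark limit and the 4-byte buffer, so the mark is dropped.
        System.out.println(tryReset(4, 2, 5));
    }
}
```

The point of the sketch is only that reset() depends on how far the stream was read after mark(); any code that reads further than the limit it passed to mark() can hit this with a plain buffered stream.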
