I’ll take a look when I’m back at a keyboard. This is likely a bug. Separately, I’d recommend TikaInputStream.get(Path) instead of wrapping an InputStream, so that Tika can use the underlying file directly.
On Fri, Dec 27, 2024 at 6:17 AM Patrick Langer <[email protected]> wrote:
> Hi all,
>
> We are using Tika 3.0.0 with IBM Semeru Runtime Open Edition 21.0.5.11
> (build 21.0.5+11-LTS) and doing the following:
>
> try (final InputStream is = Files.newInputStream(file.toPath())) {
>     final DefaultDetector detector = new DefaultDetector();
>
>     final Metadata metadata = new Metadata();
>     metadata.add(RESOURCE_NAME_KEY, filename);
>     try (TikaInputStream doc = TikaInputStream.get(inputStream)) {
>         return detector.detect(doc, metadata);
>     }
> }
>
> We had an issue with a .ppt file which resulted in the following stack trace:
>
> IOException: Resetting to invalid mark
> • java.io.BufferedInputStream in implReset
> • java.io.BufferedInputStream in reset
> • org.apache.commons.io.input.ProxyInputStream in reset at line 293
> • org.apache.tika.io.TikaInputStream in reset at line 822
> • org.apache.tika.io.TikaInputStream in getPath at line 710
> • org.apache.tika.detect.microsoft.POIFSContainerDetector in
>   getTopLevelNames at line 566
> • org.apache.tika.detect.microsoft.POIFSContainerDetector in detect at
>   line 629
> • org.apache.tika.detect.CompositeDetector in detect at line 84
>
> Unfortunately I cannot share the file nor do I have access to it.
>
> Could you help me figure out whether this is a bug in Tika or a user error on
> my side?
>
> Kind Regards,
>
> Patrick Langer
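
For reference, the two bottom frames of the quoted trace are plain JDK behaviour: java.io.BufferedInputStream throws an IOException with the message "Resetting to invalid mark" when reset() is called after the mark has been discarded, which can happen once more bytes have been read than the mark's read limit allows. The sketch below reproduces only that JDK-level condition. It does not use Tika and says nothing about why the mark was invalid in this case. The class name InvalidMarkDemo, the method resetAfterReading, and the 8-byte buffer size are assumptions made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; JDK-only, not Tika code.
final class InvalidMarkDemo {

    /**
     * Marks a BufferedInputStream with the given read limit, reads
     * bytesToRead single bytes, then calls reset().
     * Returns the IOException message if reset() fails, or null if it succeeds.
     */
    static String resetAfterReading(int readLimit, int bytesToRead) throws IOException {
        byte[] data = new byte[64];
        // A deliberately tiny 8-byte internal buffer so the mark is overrun quickly.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 8)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return null;
            } catch (IOException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stays within the read limit, so reset() succeeds.
        System.out.println(resetAfterReading(4, 2));
        // Reads past both the read limit and the buffer, so reset() fails.
        System.out.println(resetAfterReading(4, 16));
    }
}
```

With a read limit of 4, reading 2 bytes leaves the mark intact and the method returns null. Reading 16 bytes overruns both the limit and the 8-byte buffer, so reset() fails with the same message as in the trace.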
