I'll take a look when I'm back at a keyboard. This is likely a bug.
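
For context, "Resetting to invalid mark" is the message java.io.BufferedInputStream throws when reset() is called after the mark has been dropped, for example because more bytes were read than the mark's read limit and the buffer had to be refilled. Below is a minimal plain-JDK sketch of that mechanism. It is not a reproduction of what Tika does internally; the class name, helper name, and sizes are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class InvalidMarkDemo {

    // Hypothetical helper: marks the stream, reads bytesToRead bytes, then calls reset().
    // Returns null if reset() succeeds, otherwise the IOException message.
    static String readPastMarkThenReset(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64];
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return null;
            } catch (IOException e) {
                return e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Reading 3 bytes stays within the read limit of 4, so reset() works.
        System.out.println(readPastMarkThenReset(8, 4, 3)); // prints "null"
        // Reading 9 bytes forces a refill of the 8-byte buffer, which discards the mark.
        System.out.println(readPastMarkThenReset(8, 4, 9)); // prints "Resetting to invalid mark"
    }
}
```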

Separately, I’d recommend TikaInputStream.get(Path) so that Tika can use
the underlying file directly.
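
Roughly, that would look like the sketch below. It assumes the file is available as a java.nio.file.Path and that the resource name should simply be the file name; the class and method names are placeholders, not anything from your code.

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.mime.MediaType;

public class DetectFromPath {

    // Placeholder helper: detects the media type of a file on disk.
    static MediaType detect(Path path) throws IOException {
        Detector detector = new DefaultDetector();

        Metadata metadata = new Metadata();
        // Assumption: the file name is a suitable resource name hint.
        metadata.set(TikaCoreProperties.RESOURCE_NAME_KEY, path.getFileName().toString());

        // Opening from the Path lets Tika work with the file itself,
        // rather than a wrapped InputStream.
        try (TikaInputStream tis = TikaInputStream.get(path)) {
            return detector.detect(tis, metadata);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(Path.of(args[0])));
    }
}
```

In the original snippet this would replace both the Files.newInputStream(...) call and the TikaInputStream.get(InputStream) wrapper.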

On Fri, Dec 27, 2024 at 6:17 AM Patrick Langer <
[email protected]> wrote:

> Hi all,
> We are using Tika 3.0.0 with IBM Semeru Runtime Open Edition 21.0.5.11
> (build 21.0.5+11-LTS) and doing the following:
>
> try (final InputStream is = Files.newInputStream(file.toPath())) {
>   final DefaultDetector detector = new DefaultDetector();
>
>   final Metadata metadata = new Metadata();
>   metadata.add(RESOURCE_NAME_KEY, filename);
>   try (TikaInputStream doc = TikaInputStream.get(is)) {
>     return detector.detect(doc, metadata);
>   }
> }
>
>
> We had an issue with a .ppt file which resulted in the following stack trace:
>
> IOException: Resetting to invalid mark
>     • java.io.BufferedInputStream in implReset
>     • java.io.BufferedInputStream in reset
>     • org.apache.commons.io.input.ProxyInputStream in reset at line 293
>     • org.apache.tika.io.TikaInputStream in reset at line 822
>     • org.apache.tika.io.TikaInputStream in getPath at line 710
>     • org.apache.tika.detect.microsoft.POIFSContainerDetector in getTopLevelNames at line 566
>     • org.apache.tika.detect.microsoft.POIFSContainerDetector in detect at line 629
>     • org.apache.tika.detect.CompositeDetector in detect at line 84
>
>
> Unfortunately, I cannot share the file, nor do I have access to it.
>
> Could you help me figure out whether this is a bug in Tika or a user error on 
> my side?
>
>
> Kind Regards,
>
> Patrick Langer
>
>
