Hi,
Please test with the unreleased 3.2.1:
https://dist.apache.org/repos/dist/dev/tika/3.2.1/
https://repository.apache.org/content/repositories/orgapachetika-1115/org/apache/tika
Tilman
On 6/23/2025 11:01 AM, Alvaro Nogueira via user wrote:
---------- Forwarded message ---------
From: *Alvaro Nogueira* <[email protected]>
Date: Mon, Jun 23, 2025 at 10:54 AM
Subject: InputStream consumed by Tika.detect
To: <[email protected]>
Hello,
We've been using Tika 3.1.0 to detect the MIME types of files before
uploading them to S3.
However, after upgrading to v3.2.0, we've noticed that the original
InputStream is consumed entirely for certain file extensions.
The affected extensions are mostly Microsoft Office formats, which
points us to the POIFSContainerDetector, which was changed for this
release.
Extensions that show the problem in our tests: doc, docx, odt, ppt,
pptx, xls, xlsx
Extensions that still work as before: bmp, csv, gif, jpeg, jpg, pdf,
png, rtf, svg, txt
Here's some code to reproduce the issue:
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.springframework.core.io.ClassPathResource;

class TikaBugReport {
    // affected extensions: doc, docx, odt, ppt, pptx, xls, xlsx
    public static void main(String[] args) throws IOException {
        String fileName = "Test.docx";
        InputStream inputStream = new ClassPathResource(fileName).getInputStream();
        checkFileMime(inputStream, fileName);
    }

    public static void checkFileMime(InputStream inputStream, String fileName) {
        try {
            Tika tika = new Tika();
            System.out.println("InputStream available bytes before processing: "
                    + inputStream.available());
            System.out.println("InputStream supports mark: "
                    + inputStream.markSupported());
            Metadata metadata = new Metadata();
            TikaInputStream tikaInputStream = TikaInputStream.get(inputStream);
            System.out.println("Original InputStream available bytes after TikaInputStream.get(): "
                    + inputStream.available());
            String mimeType = tika.detect(tikaInputStream, metadata);
            // Debug: check state after detection
            System.out.println("Original InputStream available bytes after tika.detect(): "
                    + inputStream.available());
            System.out.println("TikaInputStream available bytes after tika.detect(): "
                    + tikaInputStream.available());
            // With 3.2.0 this fires for the affected extensions listed above
            if (inputStream.available() == 0) {
                throw new IllegalStateException("InputStream is empty after TikaInputStream creation");
            }
        } catch (Exception e) {
            System.out.printf("Mime check exception for file '%s': [%s]%n",
                    fileName, e.getMessage());
        }
    }
}
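For comparison, here is a JDK-only sketch of the behavior we expected from detection: mark the caller's stream, read a bounded prefix, then reset it so the full content is still available afterwards. The class MarkResetSketch and the helper sniffPrefix are hypothetical names made up for this illustration and are not part of Tika's API. The sketch assumes Java 11+ (for readNBytes) and a stream that supports mark/reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetSketch {

    // Hypothetical helper (not a Tika method): reads up to `limit` bytes from
    // the current position and rewinds the stream before returning.
    public static byte[] sniffPrefix(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);                  // remember the current position
        try {
            return in.readNBytes(limit); // read at most `limit` bytes
        } finally {
            in.reset();                  // rewind so the caller sees all bytes again
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real caller would pass the file's stream instead.
        byte[] data = {'P', 'K', 3, 4, 10, 20, 30, 40};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        int before = in.available();
        byte[] prefix = sniffPrefix(in, 4);
        int after = in.available();

        System.out.println("prefix length: " + prefix.length);
        System.out.println("available before: " + before + ", after: " + after);
    }
}
```

With a stream that behaves this way, the upload step after detection would still see every byte of the file.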
--
Thank you and regards,
Álvaro Nogueira
Senior Software Engineer