Incorrect mime-type detection for ooxml
---------------------------------------
Key: TIKA-257
URL: https://issues.apache.org/jira/browse/TIKA-257
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 0.4
Reporter: Maxim Valyanskiy
MimeTypes detects docx (and other Office Open XML documents) as
'application/zip' when the file name, and therefore the extension, is not
available to the detector:
$ java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar -m /home/maxcom/download-tmp/proto.docx
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
resourceName: proto.docx
$ cat /home/maxcom/download-tmp/proto.docx | java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar -m
Content-Type: application/zip
This breaks text extraction when the file name is not known, because the
document is then handled as a generic zip archive instead of an OOXML file.
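One possible way to tell OOXML apart from a plain zip without a file name is
to look inside the archive. The sketch below is only an illustration using the
JDK's java.util.zip classes; it is not Tika's implementation, and the class
name OoxmlSniffer and its methods are hypothetical. It assumes the package
contains '[Content_Types].xml' and that the main part lives at its
conventional path (word/document.xml, xl/workbook.xml or
ppt/presentation.xml), which is common but not strictly required by the
format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika: guesses the type from zip entry names.
public class OoxmlSniffer {
    public static final String ZIP = "application/zip";
    public static final String UNKNOWN = "application/octet-stream";
    public static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    public static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    public static final String PPTX =
        "application/vnd.openxmlformats-officedocument.presentationml.presentation";

    // Reads the stream as a zip and inspects entry names only.
    public static String detect(InputStream in) throws IOException {
        boolean hasContentTypes = false;
        String mainPart = null;
        int entries = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                entries++;
                String name = entry.getName();
                if (name.equals("[Content_Types].xml")) {
                    hasContentTypes = true;
                } else if (name.equals("word/document.xml")) {
                    mainPart = DOCX;
                } else if (name.equals("xl/workbook.xml")) {
                    mainPart = XLSX;
                } else if (name.equals("ppt/presentation.xml")) {
                    mainPart = PPTX;
                }
            }
        }
        if (entries == 0) {
            return UNKNOWN;   // no zip entries could be read
        }
        if (hasContentTypes && mainPart != null) {
            return mainPart;  // looks like an OOXML package
        }
        return ZIP;           // a zip, but not recognizably OOXML
    }

    // Builds a small in-memory zip with the given entry names (demo data only).
    public static byte[] zipOf(String... names) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write('x');
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDocx = zipOf("[Content_Types].xml", "word/document.xml");
        byte[] plainZip = zipOf("readme.txt");
        System.out.println(detect(new ByteArrayInputStream(fakeDocx)));
        System.out.println(detect(new ByteArrayInputStream(plainZip)));
    }
}
```

Because the check relies on the stream content alone, it would behave the
same for the piped invocation above as for the one given a path. Reading the
whole archive can be slow for large files, so a real detector would probably
want to stop early or limit how much it reads.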