Hi all, I just created an issue https://issues.apache.org/jira/browse/TIKA-1292
In short: the Tika Detector detects a JAR file (a valid ZIP file, with proper magic bytes, etc.) as "text/html" instead of the expected "application/java-archive". The reason is clear to me (we already created a PR in the Nexus project for that), but the interesting thing that bothers me is _why_ the Detector behaves correctly when tika-parsers is on the classpath. How does the presence of tika-parsers affect MIME magic detection and, most interestingly, why does it affect it? (I am aware of the added org.apache.tika.parser.pkg.ZipContainerDetector.)

Isn't MIME magic detection based on the bundled tika-mimetypes.xml? Even the globs defined there for text/html (*.htm and *.html) do not match the JAR file above (*.jar), yet Tika still selects the HTML MIME type...

Thanks,
~t~
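
For reference, the "proper magic bytes" part can be checked independently of Tika. Below is a minimal sketch in plain Java; the class and method names are hypothetical and not part of Tika or Nexus. It only tests for the ZIP local file header signature (PK\x03\x04) that a non-empty archive starts with, not whether the file is a valid JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only; it does not use Tika.
public class ZipMagicCheck {
    // ZIP local file header signature: "PK" followed by 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns true if the stream starts with the ZIP local file header signature. */
    public static boolean startsWithZipMagic(InputStream in) throws IOException {
        byte[] head = new byte[ZIP_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until the prefix is full.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before four bytes were available
            }
            read += n;
        }
        return Arrays.equals(head, ZIP_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipMagicCheck <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(startsWithZipMagic(in));
        }
    }
}
```

Running it against the JAR from the issue would show whether the file really begins with the ZIP signature, which helps separate a magic-byte problem from a detector-ordering problem.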