Chris -

If I understand correctly, we already have what we need in MimeUtils:
    public String getType(String typeName, String url, byte[] data) { ... }
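For reference, here is one way a failsafe resolver of the kind Chris describes
could chain its checks. It is a hypothetical, self-contained sketch, not the
actual MimeUtils or Nutch code. The class name, the tiny magic and extension
tables, and the order of the checks are all assumptions for illustration. The
order tried is magic bytes, then the URL extension, then the declared type,
then application/octet-stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of failsafe MIME resolution; NOT the MimeUtils/Nutch code. */
class FailsafeTypeSketch {

    static final String DEFAULT_TYPE = "application/octet-stream";

    // Assumed, deliberately tiny signature tables for illustration only.
    static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("pdf", "application/pdf");
        EXTENSIONS.put("doc", "application/msword");
        EXTENSIONS.put("xls", "application/vnd.ms-excel");
        EXTENSIONS.put("ppt", "application/vnd.ms-powerpoint");
        EXTENSIONS.put("zip", "application/zip");
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null && data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Guess from leading bytes; null when unknown or ambiguous. */
    static String byMagic(byte[] data) {
        if (startsWith(data, PDF)) return "application/pdf";
        if (startsWith(data, ZIP)) return "application/zip";
        if (startsWith(data, OLE2)) {
            // Word, Excel and PowerPoint (pre-2007) share this OLE2 header,
            // so the header alone cannot choose between them.
            return null;
        }
        return null; // unknown signature
    }

    /** Guess from the file extension in the URL's last path segment; null if unknown. */
    static String byExtension(String url) {
        if (url == null) return null;
        String path = url.replaceAll("[?#].*$", ""); // drop query and fragment
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) return null;
        return EXTENSIONS.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Same parameters as MimeUtils.getType; the resolution order is an assumption. */
    public static String getType(String typeName, String url, byte[] data) {
        String type = byMagic(data);                   // 1. content (magic bytes)
        if (type == null) type = byExtension(url);     // 2. URL / file extension
        if (type == null && typeName != null && !typeName.trim().isEmpty()) {
            type = typeName.trim();                    // 3. declared type, e.g. Content-Type
        }
        return type != null ? type : DEFAULT_TYPE;     // 4. give up
    }

    public static void main(String[] args) {
        byte[] ole2Header = Arrays.copyOf(OLE2, 512);
        // prints application/vnd.ms-powerpoint (falls through to the extension)
        System.out.println(getType(null, "http://example.com/slides/test.ppt", ole2Header));
        // prints application/pdf (content wins over the declared text/plain)
        System.out.println(getType("text/plain", "http://example.com/report.bin",
                "%PDF-1.4".getBytes(StandardCharsets.US_ASCII)));
        // prints application/octet-stream
        System.out.println(getType(null, null, null));
    }
}
```

Whether the content, the URL, or the declared type should win when they
disagree is a policy choice. It is probably worth settling before
AutoDetectParser is changed to use a shared method.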

Jukka, should I modify AutoDetectParser to call this method instead of its
own?

However, the bigger question is whether the assessment that header-based
detection fails for certain file types is correct.  For example, it fails to
identify the type of the PowerPoint test document we provide.  Do we know
which types can and can't be detected?  If so, it would help our users and
ourselves to document that information.  I could put something together
based on my observations, but that would risk being incomplete or incorrect
because of differences between versions of the authoring software (e.g.
Word).
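One plausible explanation for the PowerPoint result is that .doc, .xls and
.ppt files are all OLE2 compound documents. All three begin with the same
8-byte header, D0 CF 11 E0 A1 B1 1A E1, so the header alone cannot tell them
apart. I have not verified that this is the cause with our test document. If
it is, the table of detectable types could be generated from the signature
list rather than written by hand. The sketch below is purely illustrative. The
class name and the small signature table are assumptions, not Tika's actual
magic data. It treats a type as header-detectable only when its signature
belongs to exactly one type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: which types can a magic-byte header check tell apart? */
class HeaderDetectabilitySketch {

    // Assumed signature table: leading bytes (hex) -> formats that start with them.
    static final Map<String, List<String>> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("25504446", Arrays.asList("application/pdf"));       // "%PDF"
        SIGNATURES.put("7B5C727466", Arrays.asList("application/rtf"));     // "{\rtf"
        SIGNATURES.put("D0CF11E0A1B11AE1", Arrays.asList(                  // OLE2
                "application/msword",
                "application/vnd.ms-excel",
                "application/vnd.ms-powerpoint"));
        SIGNATURES.put("504B0304", Arrays.asList(                           // ZIP "PK\3\4"
                "application/zip",
                "application/vnd.oasis.opendocument.text",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document"));
    }

    /** Maps each type to true if its header signature identifies it uniquely. */
    static Map<String, Boolean> detectability() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (List<String> types : SIGNATURES.values()) {
            for (String type : types) {
                result.put(type, types.size() == 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : detectability().entrySet()) {
            System.out.printf("%-75s %s%n", e.getKey(),
                    e.getValue() ? "header is enough" : "needs extension or content");
        }
    }
}
```

Types that share a signature would need a second step, such as reading the
container's internal structure or falling back to the file extension.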

- Keith


JIRA [EMAIL PROTECTED] wrote:
> 
> 
>     [
> https://issues.apache.org/jira/browse/TIKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535917
> ] 
> 
> Chris A. Mattmann commented on TIKA-79:
> ---------------------------------------
> 
> Guys:
> 
> Why don't we put a utility method in MimeUtils to handle this
> functionality? The purpose of the utility method is to try to detect a
> MIME type using all available options (URL resolution, extension ID, MIME
> magic, etc.)
> 
> There is currently code in Nutch at:
> 
> http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Content.java?view=markup
> 
> See the private String getContentType(String typeName, String url, byte[]
> data) method at the bottom of the class to see how Nutch does this sort of
> failsafe MIME resolution. Perhaps we should follow suit in Tika?
> 
> Cheers,
>  Chris
> 

-- 
View this message in context: 
http://www.nabble.com/-jira--Created%3A-%28TIKA-79%29-Mime-type-detection-from-file-header-appears-to-be-failing.-tf4644634.html#a13276570
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
