[ 
https://issues.apache.org/jira/browse/TIKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith R. Bennett updated TIKA-79:
---------------------------------

    Attachment: AutoDetectParser.patch

The attached patch file reorganizes the MIME type determination in 
AutoDetectParser so that it is easier to print out the types found by the 
various methods, and the logic for choosing the predominant result is confined 
to a smaller area (assuming I understood the intent correctly, that is).  In 
other words, I found it easier to debug.  If you like, I can commit it, minus 
the print statements.

I also found it helpful to comment out the LOG.info() call in MimeTypes.load(). 
 (Is there a better way to disable it, by setting that logger to some kind of 
null appender or someting like that?)


> Mime type detection from file header appears to be failing.
> -----------------------------------------------------------
>
>                 Key: TIKA-79
>                 URL: https://issues.apache.org/jira/browse/TIKA-79
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: AutoDetectParser.patch
>
>
> Unit tests to test the behavior of AutoDetectParser fail when byte header 
> detection is needed.  When correct names of resources and MIME types are 
> passed into the Metadata object, the values below show what was found.  Note 
> that some of the document types have null for typeFromHeader:
> typeFromContentTypeHint = application/vnd.ms-excel
> typeFromResourceName = application/vnd.ms-excel
> typeFromHeader = null
> type = application/vnd.ms-excel
> typeFromContentTypeHint = text/html
> typeFromResourceName = text/html
> typeFromHeader = text/html
> type = text/html
> typeFromContentTypeHint = application/vnd.oasis.opendocument.text
> typeFromResourceName = application/vnd.oasis.opendocument.text
> typeFromHeader = application/vnd.oasis.opendocument.text
> type = application/vnd.oasis.opendocument.text
> typeFromContentTypeHint = application/pdf
> typeFromResourceName = application/pdf
> typeFromHeader = application/pdf
> type = application/pdf
> typeFromContentTypeHint = application/vnd.ms-powerpoint
> typeFromResourceName = application/vnd.ms-powerpoint
> typeFromHeader = null
> type = application/vnd.ms-powerpoint
> log4j:WARN No appenders could be found for logger (root).
> log4j:WARN Please initialize the log4j system properly.
> typeFromContentTypeHint = application/rtf
> typeFromResourceName = application/rtf
> typeFromHeader = null
> type = application/rtf
> typeFromContentTypeHint = text/plain
> typeFromResourceName = text/plain
> typeFromHeader = null
> type = text/plain
> typeFromContentTypeHint = application/msword
> typeFromResourceName = application/msword
> typeFromHeader = null
> type = application/msword
> typeFromContentTypeHint = application/xml
> typeFromResourceName = application/xml
> typeFromHeader = null
> type = application/xml

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to