[
https://issues.apache.org/jira/browse/TIKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535919
]
Bertrand Delacretaz commented on TIKA-79:
-----------------------------------------
+1 for a utility method as proposed by Chris, that tries several detection
methods.
> Mime type detection from file header appears to be failing.
> -----------------------------------------------------------
>
> Key: TIKA-79
> URL: https://issues.apache.org/jira/browse/TIKA-79
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Fix For: 0.1-incubator
>
> Attachments: AutoDetectParser.patch
>
>
> Unit tests to test the behavior of AutoDetectParser fail when byte header
> detection is needed. When correct names of resources and MIME types are
> passed into the Metadata object, the values below show what was found. Note
> that some of the document types have null for typeFromHeader:
> typeFromContentTypeHint = application/vnd.ms-excel
> typeFromResourceName = application/vnd.ms-excel
> typeFromHeader = null
> type = application/vnd.ms-excel
> typeFromContentTypeHint = text/html
> typeFromResourceName = text/html
> typeFromHeader = text/html
> type = text/html
> typeFromContentTypeHint = application/vnd.oasis.opendocument.text
> typeFromResourceName = application/vnd.oasis.opendocument.text
> typeFromHeader = application/vnd.oasis.opendocument.text
> type = application/vnd.oasis.opendocument.text
> typeFromContentTypeHint = application/pdf
> typeFromResourceName = application/pdf
> typeFromHeader = application/pdf
> type = application/pdf
> typeFromContentTypeHint = application/vnd.ms-powerpoint
> typeFromResourceName = application/vnd.ms-powerpoint
> typeFromHeader = null
> type = application/vnd.ms-powerpoint
> log4j:WARN No appenders could be found for logger (root).
> log4j:WARN Please initialize the log4j system properly.
> typeFromContentTypeHint = application/rtf
> typeFromResourceName = application/rtf
> typeFromHeader = null
> type = application/rtf
> typeFromContentTypeHint = text/plain
> typeFromResourceName = text/plain
> typeFromHeader = null
> type = text/plain
> typeFromContentTypeHint = application/msword
> typeFromResourceName = application/msword
> typeFromHeader = null
> type = application/msword
> typeFromContentTypeHint = application/xml
> typeFromResourceName = application/xml
> typeFromHeader = null
> type = application/xml
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.