[ https://issues.apache.org/jira/browse/TIKA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-285.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.5
Assignee: Jukka Zitting
Yes, the file(1) command ships with an impressive set of magic byte
patterns. I'll file a separate issue for getting those included in Tika.
Meanwhile, I've updated the Tika type registry to contain everything
included in the mime.types and magic files in the latest Apache HTTP Server
trunk. Here is the summary:
* The media type registry in Tika was synchronized with the MIME type
configuration in the Apache HTTP Server. Tika now knows about 1274
different media types and can detect 672 of those using 927 file
extension and 280 magic byte patterns. (TIKA-285)
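
To illustrate the two detection mechanisms mentioned above (file extension
patterns and magic byte patterns), here is a minimal Python sketch. It is not
Tika's implementation: the sample data, the function names parse_mime_types
and detect, and the choice to try magic bytes before the extension are all
assumptions made for the example. The only thing taken from httpd is the
mime.types line format (a media type followed by zero or more extensions,
with '#' starting a comment).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample in the httpd mime.types format.
SAMPLE_MIME_TYPES = """\
# MIME type                 Extensions
application/pdf             pdf
image/png                   png
text/html                   html htm
application/octet-stream
"""

# Hypothetical magic table: (offset, expected bytes, media type).
SAMPLE_MAGIC: List[Tuple[int, bytes, str]] = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
]


def parse_mime_types(text: str) -> Dict[str, str]:
    """Build an extension -> media type map from mime.types-style text."""
    by_extension: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        media_type, *extensions = line.split()
        for ext in extensions:
            by_extension[ext.lower()] = media_type
    return by_extension


def detect(
    name: str,
    head: bytes,
    by_extension: Dict[str, str],
    magic: List[Tuple[int, bytes, str]],
) -> Optional[str]:
    """Return a media type for the given file name and leading bytes.

    Assumption of this sketch: magic byte matches win over the extension.
    """
    for offset, pattern, media_type in magic:
        if head[offset:offset + len(pattern)] == pattern:
            return media_type
    _, dot, ext = name.rpartition(".")
    if dot:
        return by_extension.get(ext.lower())
    return None
```

For example, detect("mislabeled.txt", b"%PDF-1.4", table, SAMPLE_MAGIC) with
table = parse_mime_types(SAMPLE_MIME_TYPES) falls through to the magic table
and reports application/pdf, while a name with no known extension and no
magic match yields None. Types listed without extensions (such as
application/octet-stream in the sample) are known to the registry but cannot
be detected by name, which is one reason the number of known types can exceed
the number of detectable ones.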
> Update media type registry to the latest httpd mime type database
> -----------------------------------------------------------------
>
> Key: TIKA-285
> URL: https://issues.apache.org/jira/browse/TIKA-285
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Reporter: Jukka Zitting
> Assignee: Jukka Zitting
> Fix For: 0.5
>
>
> The MIME type database included in the Apache HTTP Server is one of the more
> complete and accurate media type and file extension resources out there.
> Their magic byte settings don't seem to be as complete as the ones in Tika,
> but it would be good to also check those settings for extra information.
> ... and we should contribute any of the recent Tika settings back to httpd
> where they don't already know of those details.