[ 
https://issues.apache.org/jira/browse/TIKA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-285.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.5
         Assignee: Jukka Zitting

Yes, the file(1) command comes with a pretty impressive set of magic byte 
patterns. I'll file a separate issue for getting those included in Tika.

Meanwhile, I've updated the Tika type registry to contain everything 
included in the mime.types and magic files in the latest Apache HTTP Server 
trunk. The summary is pretty impressive:

 * The media type registry in Tika was synchronized with the MIME type
   configuration in the Apache HTTP Server. Tika now knows about 1274
   different media types and can detect 672 of those using 927 file
   extension and 280 magic byte patterns. (TIKA-285)
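
As a rough illustration of the two detection mechanisms counted above (file
extension patterns and magic byte patterns), here is a minimal Python sketch.
It is not Tika's implementation or API: the tables, the fallback type, and the
detect() function are hypothetical and hold only a few well-known entries.
The sketch assumes that magic bytes are checked before the file name.

```python
# Hypothetical, tiny stand-ins for a media type registry; not Tika's data.
MAGIC_PATTERNS = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

EXTENSION_PATTERNS = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".gif": "image/gif",
    ".txt": "text/plain",
}

# Fallback used when neither kind of pattern matches (an assumption here).
DEFAULT_TYPE = "application/octet-stream"


def detect(name=None, prefix=b""):
    """Guess a media type from leading bytes first, then from the file name."""
    # Magic byte patterns: compare the start of the content to known signatures.
    for magic, media_type in MAGIC_PATTERNS:
        if prefix.startswith(magic):
            return media_type
    # File extension patterns: case-insensitive suffix match on the name.
    if name:
        lowered = name.lower()
        for extension, media_type in EXTENSION_PATTERNS.items():
            if lowered.endswith(extension):
                return media_type
    return DEFAULT_TYPE
```

With these tables, a file named "x.bin" whose content starts with "%PDF-" is
still reported as application/pdf, which is why a registry benefits from
having both kinds of patterns.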


> Update media type registry to the latest httpd mime type database
> -----------------------------------------------------------------
>
>                 Key: TIKA-285
>                 URL: https://issues.apache.org/jira/browse/TIKA-285
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>
> The MIME type database included in the Apache HTTP Server is one of the more 
> complete and accurate media type and file extension resources out there.
> Their magic byte settings don't seem to be as complete as the ones in Tika, 
> but it would be good to check those settings as well for extra information.
> ... and we should contribute any of the recent Tika settings back to httpd 
> where they don't already know of those details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
