[ https://issues.apache.org/jira/browse/TIKA-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-336.
------------------------------------
Resolution: Fixed
- fixed in r884340
Yuan-Fang, please test the latest Tika trunk. I've:
* updated the test-difficult-rdf2.xml file to remove the <?xml header
* updated tika-mimetypes.xml to detect files that start with <!-- as XML
files (as a default first magic check). This forces xmlRoot detection to
run, which is where the specific XML subclass is identified (which is what
we want), and there application/rdf+xml is properly detected. Before, since
there was no magic header for <!--, the initial magic check returned null
and the MimeTypes detector ended up returning text/plain.
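To illustrate the two-stage flow described above, here is a minimal,
self-contained sketch. It is not Tika's actual code: the class
MagicThenRootSketch, its detect method, and the regex-based root lookup are
all hypothetical, and the real xmlRoot check also considers the namespace
URI, which this sketch ignores. The list of magic prefixes is passed in so
the behaviour before and after the change can be compared.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Tika implementation.
final class MagicThenRootSketch {

    // Matches the first start tag and captures its local name in group 2.
    // Tags starting with '?' or '!' (PIs, DOCTYPE) do not match.
    private static final Pattern START_TAG =
            Pattern.compile("<([A-Za-z_][\\w.-]*:)?([A-Za-z_][\\w.-]*)");

    static String detect(byte[] data, String... xmlMagic) {
        String text = new String(data, StandardCharsets.UTF_8);

        // Stage 1: magic check at offset 0. With no match, no XML type is
        // ever considered and we fall through to text/plain.
        boolean looksLikeXml = false;
        for (String magic : xmlMagic) {
            if (text.startsWith(magic)) {
                looksLikeXml = true;
                break;
            }
        }
        if (!looksLikeXml) {
            return "text/plain";
        }

        // Stage 2: root element check, reached only after an XML magic hit.
        // Comments and processing instructions are dropped first so that
        // markup inside them is not mistaken for the root element.
        String body = text.replaceAll("(?s)<!--.*?-->", "")
                          .replaceAll("(?s)<\\?.*?\\?>", "");
        Matcher m = START_TAG.matcher(body);
        if (m.find() && "RDF".equals(m.group(2))) {
            return "application/rdf+xml";
        }
        return "application/xml";
    }

    public static void main(String[] args) {
        byte[] doc = ("<!-- leading comment -->\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>")
                .getBytes(StandardCharsets.UTF_8);
        // Only <?xml registered as magic, as before the change.
        System.out.println(detect(doc, "<?xml"));
        // <!-- also registered as magic, as after the change.
        System.out.println(detect(doc, "<?xml", "<!--"));
    }
}
```

With only <?xml as magic, the comment-led document never reaches the root
element check; adding <!-- lets stage 2 run and pick the RDF subclass.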
In the future we may want to:
* make xmlRoot extraction occur on text/plain documents
* move the text/plain check to the beginning of the
o.a.tika.mime.MimeTypes#getMimeType(byte[] data) function
> More issues with RDF mime detection
> -----------------------------------
>
> Key: TIKA-336
> URL: https://issues.apache.org/jira/browse/TIKA-336
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 0.5
> Environment: several user environments as well as validated in
> Mattmann's environment.
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 0.6
>
>
> See TIKA-309 for related discussion, but there seem to be further errors in
> RDF mime detection, on the OWL file located here:
> http://www.w3.org/2002/07/owl#
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.