[
https://issues.apache.org/jira/browse/TIKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781311#action_12781311
]
Yuan-Fang Li commented on TIKA-309:
-----------------------------------
Hi Chris, Jukka,
Yes, the Tika tests are passing for me. However, my test for one of the
ontologies ("http://www.w3.org/2002/07/owl#") is still failing, and here is
why.
In test tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java,
the method testUrl(String expected, String url, String file) actually tests
the content of the file named by "file", with the URL serving only as a hint
for detection. My test, however, opens an input stream on the actual URL and
uses that to detect the mime type. For the above URL, Tika tests against the
file named "test-difficult-rdf2.xml". The only difference I can see between
this file and the actual content of the URL is the one line at the top: "<?xml
version='1.0' encoding='ISO-8859-1'?>". This line is present in the Tika test
file but not in the content served from the URL.
So, if you remove or comment out that line in "test-difficult-rdf2.xml" and
run the test with the Maven command mvn -Dtest=MimeDetectionTest test,
it will fail. Or, you could use the following test case to test against the
real URL.
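import static org.junit.Assert.assertEquals;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypes;
import org.junit.Test;

@Test
public void testRDFStreamMimeType() throws IOException {
    // Detect from the live content of the URL, not from a local test file.
    URL url = new URL("http://www.w3.org/2002/07/owl#");
    final InputStream stream = new BufferedInputStream(url.openStream());
    try {
        MimeTypes mimeTypes =
            TikaConfig.getDefaultConfig().getMimeRepository();
        Metadata metadata = new Metadata();
        String mime = mimeTypes.detect(stream, metadata).toString();
        assertEquals("application/rdf+xml", mime);
    } finally {
        stream.close();
    }
}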
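For the first option (removing the XML declaration from the test file), the
stripped content could also be produced in memory instead of editing the file
by hand. The following is only a rough sketch: the class XmlDeclarationStripper
and its method strip are hypothetical names, not part of Tika, and the regular
expression assumes the declaration sits at the very start of the content.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: removes a leading XML declaration
// such as <?xml version='1.0' encoding='ISO-8859-1'?> from a document string.
final class XmlDeclarationStripper {

    // Matches optional leading whitespace, the XML declaration, and any
    // whitespace that follows it. "\\s" after "xml" keeps other processing
    // instructions such as <?xml-stylesheet ...?> untouched.
    private static final Pattern XML_DECL =
        Pattern.compile("\\A\\s*<\\?xml\\s[^?]*\\?>\\s*");

    private XmlDeclarationStripper() {
    }

    // Returns the input without its leading XML declaration; input that has
    // no declaration is returned unchanged.
    static String strip(String xml) {
        return XML_DECL.matcher(xml).replaceFirst("");
    }

    public static void main(String[] args) {
        String sample =
            "<?xml version='1.0' encoding='ISO-8859-1'?>\n<rdf:RDF/>";
        System.out.println(strip(sample));
    }
}
```

The result could then be wrapped in a ByteArrayInputStream and passed to
mimeTypes.detect(...) in the same way as the URL stream above.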
Cheers
Yuan-Fang
> Mime type application/rdf+xml not correctly detected
> ----------------------------------------------------
>
> Key: TIKA-309
> URL: https://issues.apache.org/jira/browse/TIKA-309
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 0.5
> Reporter: Yuan-Fang Li
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 0.5
>
>
> Mime type detector using AutoDetectParser and Metadata returns
> "application/xml" for the URL http://www.w3.org/2002/07/owl#, where it should
> be "application/rdf+xml". The correct mime type is also suggested here:
> http://www.w3.org/TR/owl-ref/#MIMEType.
> P.S., Tika was downloaded from svn and built with Maven last week.