incorrect mime type detection when Metadata.RESOURCE_NAME_KEY set
-----------------------------------------------------------------
Key: TIKA-384
URL: https://issues.apache.org/jira/browse/TIKA-384
Project: Tika
Issue Type: Bug
Components: mime
Affects Versions: 0.6
Environment: Java: 1.6.0_17; Java HotSpot(TM) Client VM 14.3-b01
System: Windows XP version 5.1 running on x86; Cp1252; en_GB (nb)
Reporter: Jim Kay
When Metadata.RESOURCE_NAME_KEY is set, as in:
    metadata.set(Metadata.RESOURCE_NAME_KEY, f.getCanonicalPath());
an incorrect MIME type is detected.
I was trying to add .csv files as a type by editing the XML MIME type
definitions. When I ran a .csv file (and, for comparison, a .css file) through
TikaGUI, both were parsed successfully as text.
In my AutoDetectParser example I had set the RESOURCE_NAME_KEY to
f.getCanonicalPath() (this code was copied - I don't know what it does). In
this example .css and .csv were NOT identified as text/plain.
The issue is in MimeTypes, in the following code:
    String resourceName = metadata.get(Metadata.RESOURCE_NAME_KEY);
    if (resourceName != null) {
        String name = null;
        ...
        ...
        if (name != null) {
            MimeType hint = getMimeType(name);
            if (hint.isDescendantOf(type)) {
                type = hint;
            }
        }
    }
If RESOURCE_NAME_KEY is not null, the code ultimately resets type to hint
whenever hint is a descendant of the detected type. Here hint is text/css, so
the correct identification of type as text/plain is overwritten.
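The override described above can be reproduced without Tika using a small stand-in model. This is only a sketch under stated assumptions: the class HintOverrideSketch, the PARENT and BY_EXTENSION tables, and the helpers typeForName and detect are all hypothetical names invented for illustration, and the parent links (text/css and text/csv under text/plain) are assumed rather than taken from Tika's configuration. It is not the MimeTypes implementation; it only mirrors the "replace type with hint if hint is a descendant" step quoted above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HintOverrideSketch {
    // Hypothetical parent links (child type -> parent type); assumed, not read from Tika.
    static final Map<String, String> PARENT = new HashMap<>();
    // Hypothetical extension table standing in for the name-based lookup.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        PARENT.put("text/css", "text/plain");
        PARENT.put("text/csv", "text/plain");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("csv", "text/csv");
    }

    // True if 'ancestor' appears somewhere above 'type' in the parent chain.
    static boolean isDescendantOf(String type, String ancestor) {
        for (String t = PARENT.get(type); t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Name-based hint: looks only at the text after the last dot.
    static String typeForName(String name) {
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    // Mirrors the quoted step: a more specific name hint replaces the content-based type.
    static String detect(String contentType, String resourceName) {
        String type = contentType;
        if (resourceName != null) {
            String hint = typeForName(resourceName);
            if (isDescendantOf(hint, type)) {
                type = hint;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(detect("text/plain", null));                  // text/plain
        System.out.println(detect("text/plain", "C:\\docs\\styles.css")); // text/css
        System.out.println(detect("text/plain", "C:\\docs\\data.csv"));   // text/csv
    }
}
```

With no resource name the content-based result (text/plain) is kept; as soon as a name is supplied, the more specific hint replaces it, which matches the difference seen between TikaGUI and the AutoDetectParser example.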