incorrect mime type detection when Metadata.RESOURCE_NAME_KEY set
-----------------------------------------------------------------

                 Key: TIKA-384
                 URL: https://issues.apache.org/jira/browse/TIKA-384
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 0.6
         Environment: Java: 1.6.0_17; Java HotSpot(TM) Client VM 14.3-b01
System: Windows XP version 5.1 running on x86; Cp1252; en_GB (nb)
            Reporter: Jim Kay


When Metadata.RESOURCE_NAME_KEY is set, as in:
metadata.set(Metadata.RESOURCE_NAME_KEY, f.getCanonicalPath());
the wrong mime type is detected.

I was trying to add .csv files as a type by editing the XML mime types. When I 
ran a .csv file (and, for comparison, a .css file) through TikaGUI, both were 
parsed successfully as text.
In my AutoDetectParser example I had set RESOURCE_NAME_KEY to 
f.getCanonicalPath() (this code was copied; I don't know what it does). In 
this example .css and .csv were NOT identified as text/plain.

The issue is in MimeTypes with the following code:
        String resourceName = metadata.get(Metadata.RESOURCE_NAME_KEY);
        if (resourceName != null) {
            String name = null;
...
...
            if (name != null) {
                MimeType hint = getMimeType(name);
                if (hint.isDescendantOf(type)) {
                    type = hint;
                }
            }
        }

If RESOURCE_NAME_KEY is not null, the code ultimately resets type to hint, 
but hint is text/css. So the correct identification of type as text/plain 
is overwritten.
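
The override described above can be reproduced in isolation. The following is 
only a minimal sketch, not Tika's actual code: NameHintSketch, its two lookup 
tables and the detect() method are hypothetical stand-ins, and it assumes that 
text/css and text/csv are both registered as subtypes of text/plain. It shows 
how a name-based hint that is a descendant of the content-based type replaces 
that type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the logic quoted above; none of this is Tika's real API.
final class NameHintSketch {

    // Child type -> parent type (assumed hierarchy, for illustration only).
    private static final Map<String, String> PARENT = new HashMap<>();
    // File extension -> type suggested by the resource name (assumed mapping).
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        PARENT.put("text/plain", "application/octet-stream");
        PARENT.put("text/css", "text/plain");
        PARENT.put("text/csv", "text/plain");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("csv", "text/csv");
    }

    // True if 'ancestor' appears somewhere above 'type' in the hierarchy.
    static boolean isDescendantOf(String type, String ancestor) {
        for (String p = PARENT.get(type); p != null; p = PARENT.get(p)) {
            if (p.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // 'contentType' is the result of content sniffing; 'resourceName' may be null.
    static String detect(String contentType, String resourceName) {
        String type = contentType;
        if (resourceName != null) {
            int dot = resourceName.lastIndexOf('.');
            String hint = dot < 0
                    ? null
                    : BY_EXTENSION.get(resourceName.substring(dot + 1).toLowerCase(Locale.ROOT));
            if (hint != null && isDescendantOf(hint, type)) {
                type = hint; // the name-based hint replaces the content-based type
            }
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(detect("text/plain", null));                  // text/plain
        System.out.println(detect("text/plain", "C:\\docs\\style.css")); // text/css
        System.out.println(detect("text/plain", "C:\\docs\\data.csv"));  // text/csv
    }
}
```

Without a resource name the content-based result text/plain survives; as soon 
as a file name is supplied, the more specific subtype wins, which matches the 
difference seen between TikaGUI and the AutoDetectParser example.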

