[ 
https://issues.apache.org/jira/browse/TIKA-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534160
 ] 

Keith R. Bennett commented on TIKA-56:
--------------------------------------

Chris -

I don't know of any such cases, but then we've reached the limits of my 
knowledge of MIME types. ;)

However, if we have a utility that determines the MIME type from an extension, 
my sense is that it is reasonable to make the extension comparisons case 
insensitive.  Especially in the Windows world, there are huge numbers of files 
out there with upper case extensions.  To me, it makes sense for the default to 
be to consider "PDF" equal to "pdf"; otherwise, we will get lots of "bugs" 
reported. ;)
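
A minimal sketch of such a case-insensitive lookup follows. The class name 
ExtensionMimeLookup and its small hard-coded table are hypothetical and only 
for illustration; this is not Tika's actual API or its MIME type registry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: maps a file name to a MIME type
// by its extension, ignoring the case of the extension.
final class ExtensionMimeLookup {

    // Illustrative table only; keys are stored in lower case.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("pdf", "application/pdf");
        TYPES.put("doc", "application/msword");
        TYPES.put("txt", "text/plain");
        TYPES.put("html", "text/html");
        TYPES.put("htm", "text/html");
    }

    // Returns the MIME type for the name's extension, or null if the name
    // has no extension or the extension is not in the table.
    static String lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension present
        }
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.get(ext);
    }
}
```

With this approach "x.PDF", "x.pdf" and "x.Pdf" all resolve to the same entry, 
because the extension is normalized once before the table lookup.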

If there are any obscure cases where case matters, I think it may be reasonable 
to require the user to use other means of determining the MIME type (have the 
user determine it himself, or use "magic"?).  
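
As a rough illustration of the "magic" route, the sketch below checks whether 
the leading bytes of a file match the "%PDF-" header that PDF files start with. 
The class and method names are hypothetical and are not taken from Tika.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: content-based ("magic") check
// that does not depend on the file name or its extension at all.
final class PdfMagic {

    // PDF files begin with the ASCII bytes "%PDF-".
    private static final byte[] PDF_HEADER =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes start with the PDF header.
    static boolean looksLikePdf(byte[] prefix) {
        if (prefix == null || prefix.length < PDF_HEADER.length) {
            return false;
        }
        for (int i = 0; i < PDF_HEADER.length; i++) {
            if (prefix[i] != PDF_HEADER[i]) {
                return false;
            }
        }
        return true;
    }
}
```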

- Keith



> Mime type detection fails with upper case file extensions such as "PDF".
> ------------------------------------------------------------------------
>
>                 Key: TIKA-56
>                 URL: https://issues.apache.org/jira/browse/TIKA-56
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Priority: Critical
>             Fix For: 0.1-incubator
>
>
> Mime type detection only seems to work when the file extension is lower case. 
>  Both PDF and DOC extensions failed.
> To test this, add the following method to TestParsers:
>     public void testGetParsers() throws TikaException, MalformedURLException {
>         assertNotNull(ParseUtils.getParser(new URL("file:x.pdf"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.PDF"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.doc"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.DOC"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.txt"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.TXT"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.html"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HTML"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HtMl"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.htm"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HTM"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.ppt"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.PPT"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.xls"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.XLS"), tc));
>         // more?
>     }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
