Jukka & All -

It looks like the current getType() relies solely on magic header matching only in the case where the header actually yields a type. Assuming the header match returns null, and not the DEFAULT type, when it cannot recognize the header, I think this is how it works:

If a type can be determined from the byte[] header, it is used.
Else, if a type can be determined from the type hint parameter, and that type is consistent with the URL, it is used.
Else, if a type can be determined from the URL, it is used.

Is this the correct logic?

I've modified the documentation and some conditionals in the method so that it is (IMHO) a little clearer. I've attached a patch and a .txt file with the method intact. (Shall I commit this?)

http://www.nabble.com/file/p13278818/MimeUtils.patch MimeUtils.patch
http://www.nabble.com/file/p13278818/MimeUtils.getMimeType.txt MimeUtils.getMimeType.txt

- Keith

Jukka Zitting wrote:
>
> Hi,
>
> On 10/18/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>> If I understand correctly, we already have what we need in MimeUtils:
>> public String getType(String typeName, String url, byte[] data) { ...
>> }
>
> The current MimeUtils.getType relies only on magic header matching,
> and should be fixed.
>
> The main reason why I decided to implement my own version of the code
> based on MimeTypes in AutoDetectParser was that I was somewhat
> confused about the separation of concerns across MimeTypes and
> MimeUtils. The MimeTypes class already has a number of utility methods
> like getMimeType(String, byte[]) and getMimeType(URL), so I'm not sure
> why we need MimeUtils.
>
>> Jukka, should I modify AutoDetectParser to call this method instead of
>> its own?
>
> OK once the method has been fixed.
>
>> However, the bigger issue is, is the assessment that header based
>> detection fails with certain file types correct?
>
> Magic detection can never be 100% correct or complete, but there's a
> lot that we could still do to improve the current status in Tika.
>
> BR,
>
> Jukka Zitting
>

--
View this message in context: http://www.nabble.com/Mime-type-detection-%28Was%3A--jira--Commented%3A-%28TIKA-79%29-Mime-type-detection-from-file-header-appears-to-be-failing.%29-tf4647810.html#a13278818
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
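
For reference, the fallback order described at the top of this message could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patch: typeFromHeader and typeFromUrl are hypothetical stand-ins (not the MimeUtils or MimeTypes API), the extension table is made up for the example, and "consistent with the URL" is read here as "the URL yields the same type, or no type at all".

```java
import java.util.Locale;
import java.util.Map;

final class TypeResolutionSketch {

    // Illustrative extension table; the real lookup would use the MIME registry.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "html", "text/html");

    // Hypothetical magic-header check. Assumed to return null (not a default
    // type) when the header is not recognized.
    public static String typeFromHeader(byte[] data) {
        if (data != null && data.length >= 4
                && data[0] == '%' && data[1] == 'P'
                && data[2] == 'D' && data[3] == 'F') {
            return "application/pdf";
        }
        return null;
    }

    // Hypothetical URL-based lookup using only the extension of the last
    // path segment. Query strings and fragments are not handled here.
    public static String typeFromUrl(String url) {
        if (url == null) {
            return null;
        }
        String name = url.substring(url.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // The three-step fallback: header, then hint (if consistent with the
    // URL), then URL. Returns null if nothing can be determined.
    public static String getType(String typeName, String url, byte[] data) {
        String fromHeader = typeFromHeader(data);
        if (fromHeader != null) {
            return fromHeader;
        }
        String fromUrl = typeFromUrl(url);
        // Assumption: the hint is "consistent" if the URL gives the same
        // type or gives no type at all.
        if (typeName != null && (fromUrl == null || fromUrl.equals(typeName))) {
            return typeName;
        }
        return fromUrl;
    }

    public static void main(String[] args) {
        byte[] pdf = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        System.out.println(getType(null, "http://example.com/a.txt", pdf));
        System.out.println(getType("text/html", "http://example.com/a.txt", null));
    }
}
```

Under this reading the hint only changes the result when the URL yields no type; a stricter notion of "consistent" (for example, comparing only the top-level media type) would behave differently.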
