Jukka & All -

It looks like the current getType() relies on magic header matching alone
only when the header actually yields a type.  Assuming the header lookup
returns null, and not the DEFAULT type, when it cannot recognize the
header, I think this is how it works:

If a type can be determined from the byte[] header, it is used.

Else, if a type can be determined from the type hint parameter, and that
type is consistent with the URL, it is used.

Else, if a type can be determined from the URL, it is used.

Is this the correct logic?
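
To make the question concrete, here is a rough sketch of that fallback order.
It is not the actual MimeUtils code: typeFromHeader() and typeFromUrl() are
made-up stand-ins for the real magic and URL lookups, and I am assuming that
"consistent with the URL" means the URL either yields no type or yields the
same type as the hint.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class TypeResolutionSketch {

    // Hypothetical stand-in for magic header matching (not Tika's API).
    // Returns null, not a default type, when the header is not recognized.
    static String typeFromHeader(byte[] header) {
        if (header == null) {
            return null;
        }
        int n = Math.min(header.length, 5);
        String start = new String(header, 0, n, StandardCharsets.US_ASCII);
        return start.startsWith("%PDF-") ? "application/pdf" : null;
    }

    // Hypothetical stand-in for URL (file extension) based lookup.
    static String typeFromUrl(String url) {
        if (url == null) {
            return null;
        }
        String lower = url.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".txt")) return "text/plain";
        if (lower.endsWith(".html")) return "text/html";
        return null;
    }

    // The three steps: header first, then a hint that is consistent
    // with the URL, then the URL alone.
    static String resolve(String typeHint, String url, byte[] header) {
        String fromHeader = typeFromHeader(header);
        if (fromHeader != null) {
            return fromHeader;
        }
        String fromUrl = typeFromUrl(url);
        // Assumption: "consistent" = URL gives no type, or the same type.
        if (typeHint != null && (fromUrl == null || fromUrl.equals(typeHint))) {
            return typeHint;
        }
        return fromUrl;
    }
}
```

Under that reading, the hint only changes the result when the URL yields no
type at all; if "consistent" means something stricter, the second step would
need a different test.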

I've modified the documentation and some conditionals in the method so that
it is (IMHO) a little clearer.  I've attached a patch and a .txt file with
the method intact.  (Shall I commit this?)

MimeUtils.patch:
http://www.nabble.com/file/p13278818/MimeUtils.patch
MimeUtils.getMimeType.txt:
http://www.nabble.com/file/p13278818/MimeUtils.getMimeType.txt

- Keith


Jukka Zitting wrote:
> 
> Hi,
> 
> On 10/18/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
>> If I understand correctly, we already have what we need in MimeUtils:
>>     public String getType(String typeName, String url, byte[] data) { ... }
> 
> The current MimeUtils.getType relies only on magic header matching,
> and should be fixed.
> 
> The main reason why I decided to implement my own version of the code
> based on MimeTypes in AutoDetectParser was that I was somewhat
> confused about the separation of concerns across MimeTypes and
> MimeUtils. The MimeTypes class already has a number of utility methods
> like getMimeType(String, byte[]) and getMimeType(URL), so I'm not sure
> why we need MimeUtils.
> 
>> Jukka, should I modify AutoDetectParser to call this method instead of
>> its own?
> 
> OK once the method has been fixed.
> 
>> However, the bigger issue is, is the assessment that header based
>> detection fails with certain file types correct?
> 
> Magic detection can never be 100% correct or complete, but there's a
> lot that we could still do to improve the current status in Tika.
> 
> BR,
> 
> Jukka Zitting
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Mime-type-detection-%28Was%3A--jira--Commented%3A-%28TIKA-79%29-Mime-type-detection-from-file-header-appears-to-be-failing.%29-tf4647810.html#a13278818
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
