Chris -

Thanks for offering to look into this.  

I recently added unit tests to the AutoDetectParserTest class that reproduce
these failures.  I commented out the failing tests but left them in place so
that they can be re-enabled once they pass (please see
assertAutoDetect(String resource, String type, String content)).  If you
uncomment them, the test will fail and you will see the behavior I'm
describing.

The patch I provided in the previous message calls the MimeUtils and the
AutoDetectParser methods that determine MIME type to illustrate that it is
not just an AutoDetectParser problem.
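
For anyone who has not opened the patch, the header-first ordering of the
alternate method can be sketched roughly as follows.  This is a standalone
illustration, not the patch code: the class name, the method names, and the
tiny signature and extension tables are all made up for the example, and
nothing here calls Tika's actual API.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: every name here is invented for illustration only.
class HeaderFirstDetection {

    // "%PDF" in ASCII: the signature at the start of a PDF file.
    private static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    // "PK\003\004": the local file header signature that starts most ZIP files.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // True if data begins with the given prefix bytes.
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Returns a type based on the leading bytes, or null if nothing matches.
    static String fromHeader(byte[] header) {
        if (startsWith(header, PDF_MAGIC)) return "application/pdf";
        if (startsWith(header, ZIP_MAGIC)) return "application/zip";
        return null;
    }

    // Returns a type based on the file extension, or null if it is unknown.
    static String fromName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) return "application/msword";
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".txt")) return "text/plain";
        return null;
    }

    // Header first: the name is consulted only when the header gives no answer.
    static String detect(byte[] header, String fileName) {
        String type = fromHeader(header);
        if (type != null) return type;
        type = fromName(fileName);
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdfBytes = {0x25, 0x50, 0x44, 0x46, 0x2D};  // "%PDF-"
        // The header wins even though the name says ".doc".
        System.out.println(detect(pdfBytes, "misnamed.doc"));   // application/pdf
        // No recognizable header, so the extension decides.
        System.out.println(detect(new byte[0], "letter.doc"));  // application/msword
    }
}
```

The point of this ordering is that a match on the byte header ends the
search, so the name-based lookup never gets a chance to override it.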

Here is one way to approach this:

1) Apply the patch in my previous message to a fresh copy of Tika.

2) Remove the "//" from the println calls in getMimeType2() if you'd like to
get debug output.

3) Run the unit test; in IntelliJ IDEA, I just right-click inside the test
method (AutoDetectParserTest.testWord()) and select "Run".  You can also
just run "mvn test" on the command line.

Feel free to get in touch anytime if there's anything else I can do to
clarify or help.

Regards,
Keith


Chris Mattmann wrote:
> 
> Keith:
> 
> Where exactly is it failing? In the unit tests? Or in your code? Could you
> be more specific so that I (or someone else) can track it down?
> 
> Thanks,
>  Chris
> 
> 
> 
> On 10/22/07 2:00 PM, "Keith R. Bennett" <[EMAIL PROTECTED]> wrote:
> 
>> 
>> All -
>> 
>> We're still having the problem that MIME type detection from byte headers
>> is failing.  I'll try to look into it, but if anyone else could also take a
>> look, that would be great.
>> 
>> I'm attaching a patch that:
>> 
>> * adds an alternate method for determining the MIME type that calls the
>> new MimeUtils.getMimeType() method.
>> * calls the regular method and the alternate method and asserts that they
>> return the same result
>> * enables only one type of test so that the output is more manageable
>> * the alternate method turns the original one upside down; it's simpler
>> to me because if a type is found via the byte header, the other methods
>> are not attempted, etc.; let me know what you think.
>> 
>> This patch is not intended to ever be committed, but is just for review.
>> 
>> Thanks,
>> - Keith
>> 
>> 
>> 
>> http://www.nabble.com/file/p13352486/diag.patch diag.patch
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/MIME-Type-Detection-from-Byte-Header-Failing-tf4673629.html#a13352854
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
