Re: MediaTypeRegistry normalize query

2014-09-08 Thread Tom Barber
Hey Lewis, thanks for the reply. I think the charset stuff is related to https://issues.apache.org/jira/browse/TIKA-431. Regarding your single-argument point, I'm certainly not a Tika expert and was just updating some of the OODT code based on what was recommended there. I shall investigate an a

Re: MediaTypeRegistry normalize query

2014-09-08 Thread Lewis John Mcgibbney
Hi Tom, On Sun, Sep 7, 2014 at 1:28 PM, wrote: > > now when parsing HTML files these days Tika adds the charset attribute to > the string. > Is this behaviour consistent with other MimeTypes? > > I would have thought the normalize call was designed to remove this > because tika-mimetypes.xml