Re: MediaTypeRegistry normalize query

2014-09-08 Thread Lewis John Mcgibbney
Hi Tom, On Sun, Sep 7, 2014 at 1:28 PM, dev-digest-h...@tika.apache.org wrote: now when parsing HTML files these days Tika adds the charset attribute to the string. Is this behaviour consistent with other MimeTypes? I would have thought the normalize call was designed to remove this

Re: MediaTypeRegistry normalize query

2014-09-08 Thread Tom Barber
Hey Lewis, thanks for the reply. I think the charset stuff is related to https://issues.apache.org/jira/browse/TIKA-431. Regarding your single-argument point, I'm certainly not a Tika expert and was just updating some of the OODT code from what was recommended on there; I shall investigate an

MediaTypeRegistry normalize query

2014-09-07 Thread Tom Barber
Hey guys, I was doing some stuff related to MimeTypes.getRegisteredMimeType, and within that method it calls registry.normalize(type). Now, when parsing HTML files these days, Tika adds the charset attribute to the string. I would have thought the normalize call was designed to remove this
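
For anyone hitting the same thing, here is a minimal sketch of stripping the charset parameter by hand before doing a lookup, in case normalize turns out to keep parameters. It is plain Java and does not call Tika at all; the class name MediaTypeStrings and the method baseType are hypothetical names made up for this illustration. Tika's own MediaType class has a getBaseType() method that should serve the same purpose, but check it against the Tika version you are running.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Tika or OODT.
public final class MediaTypeStrings {
    private MediaTypeStrings() {}

    /** Returns the lower-cased "type/subtype" part, dropping any parameters. */
    public static String baseType(String mediaType) {
        // Parameters such as "charset=UTF-8" follow the first semicolon.
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html; charset=UTF-8")); // text/html
        System.out.println(baseType("application/pdf"));          // application/pdf
    }
}
```

The result can then be passed to whatever lookup expects a bare type string, so the charset attribute no longer affects the match.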