Hey Lewis,
Thanks for the reply.
I think the charset stuff is related to
https://issues.apache.org/jira/browse/TIKA-431
Regarding your single-argument point, I'm certainly not a Tika expert and
was just updating some of the OODT code from what was recommended on
there, I shall investigate an a
Hi Tom,
On Sun, Sep 7, 2014 at 1:28 PM, wrote:
>
> now when parsing HTML files these days Tika adds the charset attribute to
> the string.
>
Is this behaviour consistent with other MimeTypes?
>
> I would have thought the normalize call was designed to remove this
> because tika-mimetypes.xml