Hi Tom,
On Sun, Sep 7, 2014 at 1:28 PM, dev-digest-h...@tika.apache.org wrote:
now when parsing HTML files these days Tika adds the charset attribute to
the string.
Is this behaviour consistent with other MimeTypes?
I would have thought the normalize call was designed to remove this
Hey Lewis
Thanks for the reply.
I think the charset stuff is related to
https://issues.apache.org/jira/browse/TIKA-431
Regarding your single-argument point, I'm certainly not a Tika expert and
was just updating some of the OODT code from what was recommended on
there, I shall investigate an
Hey guys
I was doing some stuff related to MimeTypes.getRegisteredMimeType and
within that method it calls
registry.normalize(type)
Now, when parsing HTML files these days, Tika adds the charset attribute
to the string.
I would have thought the normalize call was designed to remove this
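
For illustration only, here is a minimal plain-Java sketch of the kind of workaround the question implies: dropping any parameters (such as "; charset=UTF-8") from the type string before it is used for a lookup. The class MediaTypeStrings and the method stripParameters are hypothetical names made up for this example and are not part of Tika; whether registry.normalize(type) is meant to do this itself is exactly the open question above.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Tika API.
class MediaTypeStrings {

    // Returns the "type/subtype" part of a media type string,
    // discarding any parameters that follow the first ';'.
    static String stripParameters(String type) {
        if (type == null) {
            return null;
        }
        int semicolon = type.indexOf(';');
        String base = semicolon >= 0 ? type.substring(0, semicolon) : type;
        // Media type names are case-insensitive, so lower-case the result.
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints "text/html"
        System.out.println(stripParameters("text/html; charset=UTF-8"));
    }
}
```

The stripped string could then be passed to the lookup in place of the original, assuming the registered name carries no parameters.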