HtmlParser's http-equiv code needs to be more flexible
------------------------------------------------------
Key: TIKA-349
URL: https://issues.apache.org/jira/browse/TIKA-349
Project: Tika
Issue Type: Improvement
Affects Versions: 0.6
Reporter: Ken Krugler
Priority: Minor
Some http-equiv meta tags in HTML documents have content attributes whose
charset parameter currently isn't handled properly.
For example, <meta http-equiv="content-type" content="text/html; charset=utf-8; charset=UTF-8">
(note the duplicated charset parameter),
or where content="text/html;; charset=utf-8" (note the double semicolons).
The parsing code needs to be more flexible to handle these edge cases.
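
One possible way to tolerate both cases is to search the content value for the first charset=... pair instead of splitting strictly on single semicolons. The following is only a minimal sketch under that assumption, not Tika's actual HtmlParser code; the class name LenientCharset and the method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Tika.
final class LenientCharset {
    // Case-insensitive "charset", optional whitespace around "=",
    // an optional opening quote, then the name up to the next
    // whitespace, semicolon, or quote.
    private static final Pattern CHARSET = Pattern.compile(
            "(?i)charset\\s*=\\s*[\"']?([^\\s;\"']+)");

    /** Returns the first charset name found in the content value, or null if none. */
    static String extract(String content) {
        if (content == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(content);
        return m.find() ? m.group(1) : null;
    }
}
```

With this sketch, LenientCharset.extract("text/html; charset=utf-8; charset=UTF-8") and LenientCharset.extract("text/html;; charset=utf-8") both return "utf-8". Taking the first match when the parameter is duplicated is a design choice made here for simplicity; the returned name would still need to be validated against the supported charsets.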