Increase buffer size for meta tag sniffing
------------------------------------------
Key: TIKA-357
URL: https://issues.apache.org/jira/browse/TIKA-357
Project: Tika
Issue Type: Improvement
Affects Versions: 0.6
Reporter: Ken Krugler
Assignee: Ken Krugler
Priority: Minor
Fix For: 0.6
Attachments: makler.html
Some web pages (such as makler.su, see attached) have a large amount of script
data before the body of the HTML.
When this happens, the sniffing code fails to find the charset info in the meta
tag, because it currently examines only the first 4K of the document.
Raising the limit to 8K would cover all of the cases that I (Ken) have seen
during a test crawl.
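
To illustrate the effect of the sniff limit described above, here is a minimal sketch in Java. It is not Tika's actual parser code: the class name MetaCharsetSniffer, the sniff method, and the regular expression are all hypothetical, chosen only to show how a charset declaration that sits behind roughly 5K of script data is missed with a 4K window and found with an 8K window.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Tika implementation.
public class MetaCharsetSniffer {
    // Assumed pattern: a charset=... token anywhere inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * Reads at most `limit` bytes from the stream and returns the charset
     * name declared in a meta tag within that prefix, or null if none is found.
     */
    public static String sniff(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int total = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        // Decode as ASCII: the markup we look for is ASCII in common encodings.
        String head = new String(buf, 0, total, StandardCharsets.US_ASCII);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Builds a test page whose meta tag follows about 5000 bytes of script. */
    public static byte[] samplePage() {
        StringBuilder sb = new StringBuilder("<html><head><script>");
        while (sb.length() < 5000) {
            sb.append("var x = 1; ");
        }
        sb.append("</script><meta http-equiv=\"Content-Type\" ")
          .append("content=\"text/html; charset=windows-1251\"></head>");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = samplePage();
        // 4K window: the meta tag lies beyond the sniffed prefix.
        System.out.println(sniff(new ByteArrayInputStream(page), 4096)); // prints null
        // 8K window: the meta tag is inside the sniffed prefix.
        System.out.println(sniff(new ByteArrayInputStream(page), 8192)); // prints windows-1251
    }
}
```

The limit is passed as a parameter here so the two window sizes can be compared side by side; a larger window costs a slightly bigger buffer per document.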