Improve encoding detection speed and accuracy
---------------------------------------------
Key: TIKA-322
URL: https://issues.apache.org/jira/browse/TIKA-322
Project: Tika
Issue Type: Improvement
Components: mime
Reporter: Jukka Zitting
Priority: Minor
The encoding detection code we took from ICU4J is not very efficient, and it
sometimes produces odd results when more than one encoding matches the given
input data. It would be good to refactor the code so that easy-to-detect
encodings are identified quickly and better heuristics are used to choose
among multiple matches.
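
As a rough illustration of the two ideas above, the following sketch checks
for cheap, unambiguous signals (a byte order mark, pure ASCII, strictly valid
UTF-8) before any statistical detector would need to run, and applies a simple
tie-break when several candidates score almost equally. It is not Tika or
ICU4J code: the class FastPathCharsetSketch, the methods detectFast and
pickBest, the Candidate type, the 0-100 confidence scale, the margin parameter
and the preference order are all hypothetical names and values chosen for this
example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from Tika or ICU4J. */
final class FastPathCharsetSketch {

    /** Returns a charset name for easy cases, or null if a full detector is needed. */
    static String detectFast(byte[] data) {
        if (data.length == 0) return null;
        // Byte order marks. The UTF-32LE mark (FF FE 00 00) begins with the
        // UTF-16LE mark (FF FE), so the longer one has to be tested first.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        // Pure 7-bit input: Java bytes are signed, so a set high bit is negative.
        boolean ascii = true;
        for (byte b : data) {
            if (b < 0) { ascii = false; break; }
        }
        if (ascii) return "US-ASCII";
        // Non-ASCII bytes that decode strictly as UTF-8.
        if (isValidUtf8(data)) return "UTF-8";
        return null; // ambiguous: fall back to the slower statistical detection
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    private static boolean isValidUtf8(byte[] data) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            // A sample cut off in the middle of a multi-byte sequence is
            // rejected as well, which a real implementation may want to allow.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** A detector result: charset name plus a confidence on an assumed 0-100 scale. */
    static final class Candidate {
        final String charset;
        final int confidence;
        Candidate(String charset, int confidence) {
            this.charset = charset;
            this.confidence = confidence;
        }
    }

    // Assumed preference order for near-ties; earlier entries win.
    private static final List<String> PREFERENCE =
            Arrays.asList("UTF-8", "windows-1252", "ISO-8859-1");

    /** Picks the top candidate, preferring listed charsets within 'margin' of the top score. */
    static String pickBest(List<Candidate> candidates, int margin) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || c.confidence > best.confidence) best = c;
        }
        if (best == null) return null;
        String chosen = best.charset;
        int chosenRank = rank(chosen);
        for (Candidate c : candidates) {
            int r = rank(c.charset);
            if (best.confidence - c.confidence <= margin && r < chosenRank) {
                chosen = c.charset;
                chosenRank = r;
            }
        }
        return chosen;
    }

    private static int rank(String charset) {
        int i = PREFERENCE.indexOf(charset);
        return i < 0 ? Integer.MAX_VALUE : i;
    }
}
```

A caller would try detectFast first and run the full detector only when it
returns null, then pass that detector's matches through pickBest. The fixed
preference list is only one possible heuristic; language hints or declared
metadata could serve the same purpose.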