[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405785#comment-15405785 ]
Tim Allison commented on TIKA-721:
--
While working on TIKA-2038, I found that ICU4J is now correctly
[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119018#comment-13119018 ]
Nick Burch commented on TIKA-721:
-
I'd suggest we check for invalid UTF-16 sequences (see
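Nick Burch's suggestion above is cut off, so his exact check is not known. As a minimal sketch, assuming "invalid UTF-16 sequences" means unpaired surrogates, the hypothetical helper `countInvalidSurrogates` below reads raw bytes as UTF-16 in a chosen byte order and counts high surrogates not followed by a low surrogate, plus stray low surrogates. A nonzero count argues against that byte order; a zero count proves little on its own, since most byte sequences contain no surrogate code units at all.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Check {

    // Hypothetical helper (not Tika or ICU4J API): counts unpaired surrogates
    // when the bytes are interpreted as UTF-16 in the given byte order.
    static int countInvalidSurrogates(byte[] data, boolean bigEndian) {
        int invalid = 0;
        int n = data.length / 2; // complete 16-bit code units; a trailing odd byte is ignored
        for (int i = 0; i < n; i++) {
            char c = unit(data, i, bigEndian);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < n && Character.isLowSurrogate(unit(data, i + 1, bigEndian))) {
                    i++; // well-formed surrogate pair: skip the low half
                } else {
                    invalid++; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                invalid++; // low surrogate with no high surrogate before it
            }
        }
        return invalid;
    }

    // Assembles code unit i from two bytes in the requested byte order.
    static char unit(byte[] data, int i, boolean bigEndian) {
        int b0 = data[2 * i] & 0xFF;
        int b1 = data[2 * i + 1] & 0xFF;
        return (char) (bigEndian ? (b0 << 8) | b1 : (b1 << 8) | b0);
    }

    public static void main(String[] args) {
        // Valid text (including a surrogate pair for U+1F600), encoded without a BOM.
        byte[] valid = "h\u00E9llo \uD83D\uDE00".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(countInvalidSurrogates(valid, false)); // prints 0

        // Little-endian U+D800 followed by 'A': a lone high surrogate.
        byte[] lone = {0x00, (byte) 0xD8, 0x41, 0x00};
        System.out.println(countInvalidSurrogates(lone, false)); // prints 1
        System.out.println(countInvalidSurrogates(lone, true));  // prints 0 (reads as U+00D8, U+4100)
    }
}
```

Comparing the counts for both byte orders gives one signal for choosing between UTF-16LE and UTF-16BE; in practice it would need to be combined with other evidence.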
[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119035#comment-13119035 ]
Michael McCandless commented on TIKA-721:
-
bq. I'd suggest we check for invalid
[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119038#comment-13119038 ]
Robert Muir commented on TIKA-721:
--
{quote}
Finally, for the valid code points, I count how
[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13119044#comment-13119044 ]
Michael McCandless commented on TIKA-721:
-
{quote}
bq. Finally, for the valid code
[ https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107969#comment-13107969 ]
Nick Burch commented on TIKA-721:
-
In CharsetRecog_Unicode on line 69 (inside