[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2038:
--
Attachment: comparisons_20160803b.xlsx
Full results; fixed spurious extra rows in output
> A more
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2038:
--
Attachment: (was: comparisons_20160803.xlsx)
> A more accurate facility for detecting Charset
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406408#comment-15406408
]
Tim Allison edited comment on TIKA-2038 at 8/3/16 6:51 PM:
---
I wrote a markup
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2038:
--
Attachment: comparisons_20160803.xlsx
I wrote a markup stripper that ignores content in tags, comments,
[
https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405785#comment-15405785
]
Tim Allison edited comment on TIKA-721 at 8/3/16 12:03 PM:
---
While working on
[
https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405785#comment-15405785
]
Tim Allison commented on TIKA-721:
--
While working on TIKA-2038, I found that ICU4J is now correctly