[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: fix-for-TIKA2592-contributed-by-Andreas-Meier.patch > HTML with charset unicode handled a

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: TestHTMLCharsetCP1256.html, TestHTMLCharsetArabicCP1256.html > HTML with c

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-2592: Issue Type: Improvement (was: Bug) > HTML with charset unicode handled as utf-16 instead utf-8

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-2592: Priority: Minor (was: Major) > HTML with charset unicode handled as utf-16 instead utf-8

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-02 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-2592: Attachment: IANA Charset names.txt > HTML with charset unicode handled as utf-16 instead utf-8

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-03-05 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: StandardCharsets_unsupported_by_IANA.txt > HTML with charset unicode handled as utf-16 in