[jira] [Created] (TIKA-2594) Mail detected as application/xhtml+xml

2018-02-28 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2594: Summary: Mail detected as application/xhtml+xml Key: TIKA-2594 URL: https://issues.apache.org/jira/browse/TIKA-2594 Project: Tika Issue Type: Bug Affects

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380874#comment-16380874 ] Ken Krugler commented on TIKA-2592: Hi [~AndreasMeier] - actually "unicode" is a supported charset name

[jira] [Created] (TIKA-2593) docx with track change producing incorrect output

2018-02-28 Thread Md (JIRA)
Md created TIKA-2593: Summary: docx with track change producing incorrect output Key: TIKA-2593 URL: https://issues.apache.org/jira/browse/TIKA-2593 Project: Tika Issue Type: Bug Components:

[jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2018-02-28 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380773#comment-16380773 ] Md commented on TIKA-207: By the way I am using AutoDetectParser() > MS word doc containing tracked changes

[jira] [Commented] (TIKA-2585) TikaInputStream support for resetting via a factory of InputStreams

2018-02-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380765#comment-16380765 ] Luis Filipe Nassif commented on TIKA-2585: Hi [~gagravarr], I don't know. I think we can create

[jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2018-02-28 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380752#comment-16380752 ] Md commented on TIKA-207: I am using tika 1.17 but still it's getting deleted text from track revised files. Is

[jira] [Commented] (TIKA-2576) Add application/zstd detection and parser

2018-02-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380705#comment-16380705 ] Markus Jelsma commented on TIKA-2576: I don't know if it is documented but that config file will fix

[jira] [Commented] (TIKA-2591) Some tiffs (Big Endian with fax compression) are showing up as x-tarr

2018-02-28 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380159#comment-16380159 ] Luis Filipe Nassif commented on TIKA-2591: Hum sorry. The higher the number, higher the priority.

[jira] [Commented] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380106#comment-16380106 ] Andreas Meier commented on TIKA-2592: Attached a sample patch to set UTF-8 as default for "unicode"

[jira] [Updated] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Meier updated TIKA-2592: Attachment: fix-for-TIKA2592-contributed-by-Andreas-Meier.patch > HTML with charset unicode handled

[jira] [Created] (TIKA-2592) HTML with charset unicode handled as utf-16 instead utf-8

2018-02-28 Thread Andreas Meier (JIRA)
Andreas Meier created TIKA-2592: Summary: HTML with charset unicode handled as utf-16 instead utf-8 Key: TIKA-2592 URL: https://issues.apache.org/jira/browse/TIKA-2592 Project: Tika Issue

Re: Unnecessary WARNING Logging?

2018-02-28 Thread Nick Burch
On Tue, 27 Feb 2018, lewis john mcgibbney wrote: I don't know when it was introduced, but I see the following, rather annoying WARNING messages in many logs now. IIRC we're changing those to ignore in Tika 2.x, but as we always warned for missing parsers / missing parser classes in 1.x we