[
https://issues.apache.org/jira/browse/TIKA-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-719:
Assignee: Ken Krugler
Concurrent usage of HtmlParser causes infinite loop in HashMap
EBCDIC encoding not detected
Key: TIKA-720
URL: https://issues.apache.org/jira/browse/TIKA-720
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Michael McCandless
[
https://issues.apache.org/jira/browse/TIKA-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-720:
Attachment: English_EBCDIC.txt
EBCDIC encoding not detected
UTF16-LE not detected
Key: TIKA-721
URL: https://issues.apache.org/jira/browse/TIKA-721
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Michael McCandless
Priority: Minor
[
https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-721:
Attachment: Chinese_Simplified_utf16.txt
UTF16-LE not detected
[
https://issues.apache.org/jira/browse/TIKA-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107959#comment-13107959
]
Nick Burch commented on TIKA-705:
Initial workaround committed in r1172690.
The proper fix
[
https://issues.apache.org/jira/browse/TIKA-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107969#comment-13107969
]
Nick Burch commented on TIKA-721:
In CharsetRecog_Unicode on line 69 (inside
[
https://issues.apache.org/jira/browse/TIKA-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107977#comment-13107977
]
Nick Burch commented on TIKA-720:
A few IBM specific encodings are supported already in
[
https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated TIKA-722:
Attachment: metadata.png
I checked this file: That's exactly the type of file I am talking about, here
[
https://issues.apache.org/jira/browse/TIKA-722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated TIKA-722:
Attachment: JUFO96.PDF
Here is a non-Persian example (which is actually a very, very early writeup from
[
https://issues.apache.org/jira/browse/TIKA-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-724:
Attachment: extraSpaces.pdf
PDF text sometimes has extra space between letters
[
https://issues.apache.org/jira/browse/TIKA-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108034#comment-13108034
]
Michael McCandless commented on TIKA-720:
Thanks Nick! That actually sounds