Reinhard Schwab created TIKA-1500:
-
Summary: FeedParser extracts XML markup with BodyContentHandler
Key: TIKA-1500
URL: https://issues.apache.org/jira/browse/TIKA-1500
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reinhard Schwab updated TIKA-1500:
--
Attachment: TIKA-1500.patch
Patch containing the trivial fix.
FeedParser extracts XML
[
https://issues.apache.org/jira/browse/TIKA-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reinhard Schwab updated TIKA-548:
-
Attachment: test.pdf
This is a sample PDF document to reproduce the regression.
PDF content
[
https://issues.apache.org/jira/browse/TIKA-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964592#action_12964592
]
Reinhard Schwab commented on TIKA-548:
--
I generated this document with OpenOffice.
[
https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925288#action_12925288
]
Reinhard Schwab commented on TIKA-539:
--
Hi Ken,
In other words:
it trusts the server
Encoding detection is too biased by encoding in meta tag
Key: TIKA-539
URL: https://issues.apache.org/jira/browse/TIKA-539
Project: Tika
Issue Type: Bug
Affects Versions: 0.8
[
https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reinhard Schwab updated TIKA-539:
-
Attachment: TIKA-539.patch
Encoding detection is too biased by encoding in meta tag
[
https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reinhard Schwab updated TIKA-539:
-
Attachment: TIKA-539_2.patch
Ignore my first version of the patch.
The encoding detection in the