I've been reading through some of the referenced emails, and it looks like the
problem might be in the client-side code.
In one of the emails from May 2013, the client-side code tries to write the
entire file to Tika, and then to read the extracted text back. I had a similar
problem with
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845571#comment-13845571 ]
Mane edited comment on TIKA-1121 at 12/11/13 5:43 PM:
--
Also worth to m
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845569#comment-13845569 ]
Mane commented on TIKA-1121:
I've tested with the Tika JAX-RS server; it works in cases where it
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845571#comment-13845571 ]
Mane commented on TIKA-1121:
Also worth mentioning: I have this gibberish.txt file that has garb
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845429#comment-13845429 ]
Tim Allison commented on TIKA-1205:
---
Thank you for your feedback! TIKA-456 is the existi
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845398#comment-13845398 ]
Hong-Thai Nguyen commented on TIKA-1205:
Just a (newbie) question, why limit only o
Tim Allison created TIKA-1205:
-
Summary: Allow PDFParser to fallback to other parser if there is an exception
Key: TIKA-1205
URL: https://issues.apache.org/jira/browse/TIKA-1205
Project: Tika
Is
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1205:
--
Description:
With TIKA-1201, there is now an option to use PDFBox's NonSequentialPDFParser
instead of t
Ref: https://issues.apache.org/jira/browse/TIKA-715
I'm using Tika-app-1.4 (in server mode) in a stand-alone document
processing pipeline, and have discovered that a lot of the XHTML from Tika
is invalid. Subsequently, I found TIKA-715, which appears to cover exactly
this.
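
For anyone who wants to spot this kind of output in their own pipeline, here is
a minimal sketch of a well-formedness check using only the JDK's built-in XML
parser. It assumes that "invalid" at least includes "not well-formed"; it does
not validate against an XHTML DTD or schema. The class name XhtmlCheck and the
method isWellFormed are made up for illustration and are not part of Tika.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: checks well-formedness only.
public class XhtmlCheck {

    /** Returns true if the given markup parses as well-formed XML. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Apply the parser's secure-processing limits to untrusted input.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler throws on fatal errors and ignores warnings,
            // and setting it stops the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for unclosed tags, bad nesting, and similar problems.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        System.out.println(isWellFormed("<html><body><p>broken</body></html>"));
    }
}
```

Running each document's XHTML through a check like this before the next
pipeline stage would at least separate the well-formed output from the rest.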
Because of this issue,