Re: [jira] [Comment Edited] (TIKA-1121) Socket server text parsing error on large text files

2013-12-11 Thread Raymond Wiker
I've been reading through some of the emails referenced, and it looks like the problem might be in the code on the client side. In one of the emails from May 2013, the client-side code tries to write the entire file to Tika, and then to read the extracted text back. I had a similar problem with
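A minimal sketch of one way to avoid the write-everything-then-read pattern described above: read the reply on a separate thread while the input is still being sent. It assumes the server starts streaming output before it has consumed all of its input, which the thread does not confirm. The function name `exchange`, the host, the port, and the chunk size are hypothetical, and this is plain TCP rather than Tika's own client code.

```python
import socket
import threading


def exchange(host: str, port: int, payload: bytes, chunk_size: int = 65536) -> bytes:
    """Send payload to a streaming server while reading its reply concurrently."""
    received = []

    with socket.create_connection((host, port)) as sock:
        def drain() -> None:
            # Keep reading until the server closes its side of the connection.
            while True:
                chunk = sock.recv(chunk_size)
                if not chunk:
                    break
                received.append(chunk)

        reader = threading.Thread(target=drain)
        reader.start()

        # Write in chunks; the reader thread keeps the server's output
        # from backing up while input is still being sent.
        for offset in range(0, len(payload), chunk_size):
            sock.sendall(payload[offset:offset + chunk_size])

        # Half-close: signal end of input but keep the read side open.
        sock.shutdown(socket.SHUT_WR)
        reader.join()

    return b"".join(received)
```

If the client instead sends the whole file before its first read, both sides can end up blocked on full socket buffers once the file is large enough, which would match the "large text files" symptom in the issue title.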

[jira] [Comment Edited] (TIKA-1121) Socket server text parsing error on large text files

2013-12-11 Thread Mane (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845571#comment-13845571 ] Mane edited comment on TIKA-1121 at 12/11/13 5:43 PM: Also worth to m

[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files

2013-12-11 Thread Mane (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845569#comment-13845569 ] Mane commented on TIKA-1121: I've tested with the Tika JAX-RS server; it works with cases where it

[jira] [Commented] (TIKA-1121) Socket server text parsing error on large text files

2013-12-11 Thread Mane (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845571#comment-13845571 ] Mane commented on TIKA-1121: Also worth mentioning: I have this gibberish.txt file that has garb

[jira] [Commented] (TIKA-1205) Allow PDFParser to fallback to other parser if there is an exception

2013-12-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845429#comment-13845429 ] Tim Allison commented on TIKA-1205: Thank you for your feedback! TIKA-456 is the existi

[jira] [Commented] (TIKA-1205) Allow PDFParser to fallback to other parser if there is an exception

2013-12-11 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845398#comment-13845398 ] Hong-Thai Nguyen commented on TIKA-1205: Just a (newbie) question, why limit only o

[jira] [Created] (TIKA-1205) Allow PDFParser to fallback to other parser if there is an exception

2013-12-11 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1205: Summary: Allow PDFParser to fallback to other parser if there is an exception Key: TIKA-1205 URL: https://issues.apache.org/jira/browse/TIKA-1205 Project: Tika Is

[jira] [Updated] (TIKA-1205) Allow PDFParser to fallback to other parser if there is an exception

2013-12-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1205: Description: With TIKA-1201, there is now an option to use PDFBox's NonSequentialPDFParser instead of t

Tika 715 (invalid xhtml output)

2013-12-11 Thread Raymond Wiker
Ref: https://issues.apache.org/jira/browse/TIKA-715 I'm using Tika-app-1.4 (in server mode) in a stand-alone document processing pipeline, and have discovered that a lot of the XHTML from Tika is invalid. Subsequently, I found TIKA-715, which appears to cover exactly this. Because of this issue,