[jira] [Commented] (TIKA-2848) This file consumes an inordinate amount of memory when parsed by Tika

2019-04-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812870#comment-16812870 ] Tim Allison commented on TIKA-2848: Hmmm... I'm able to extract text from both with str

Re: Tika 1.21?

2019-04-08 Thread Oleg Tikhonov
Great! +1. Thanks, Oleg
On Mon, Apr 8, 2019, 21:11 Tim Allison wrote:
> All,
> PDFBox will be out in a few days, and POI should be out soon as
> well. I _think_ I'd like to get in a first draft of "auto" mode for
> OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing to run a
> relea

Tika 1.21?

2019-04-08 Thread Tim Allison
All, PDFBox will be out in a few days, and POI should be out soon as well. I _think_ I'd like to get in a first draft of "auto" mode for OCR'ing PDFs (TIKA-2749), but other than that, I'd be willing to run a release of 1.21 in the next few weeks. WDYT? Best, Tim

[jira] [Commented] (TIKA-2848) This file consumes an inordinate amount of memory when parsed by Tika

2019-04-08 Thread Tim Barrett (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812498#comment-16812498 ] Tim Barrett commented on TIKA-2848: used pdfbox-app-2.0.15-20190407.115658-123.jar, the

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812492#comment-16812492 ] Ken Krugler commented on TIKA-2849: Hi [~boris-petrov] - two things here. First, do you

[jira] [Commented] (TIKA-2848) This file consumes an inordinate amount of memory when parsed by Tika

2019-04-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812472#comment-16812472 ] Tim Allison commented on TIKA-2848: Try: https://builds.apache.org/view/P/view/PDFBox/

[jira] [Commented] (TIKA-2848) This file consumes an inordinate amount of memory when parsed by Tika

2019-04-08 Thread Tim Barrett (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812295#comment-16812295 ] Tim Barrett commented on TIKA-2848: I tried this with version 2.0.14 of PDFBox - same i

[jira] [Created] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-08 Thread Boris Petrov (JIRA)
Boris Petrov created TIKA-2849:
Summary: TikaInputStream copies the input stream locally
Key: TIKA-2849
URL: https://issues.apache.org/jira/browse/TIKA-2849
Project: Tika
Issue Type: Bug
Af