[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419998#comment-16419998 ] Tim Allison commented on TIKA-2582: --- Y, all my fault. Sorry, and thank you! > Tesseract 4.0 includes a

[jira] [Comment Edited] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419998#comment-16419998 ] Tim Allison edited comment on TIKA-2582 at 3/30/18 12:21 AM: - Y, all my fault.

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419836#comment-16419836 ] Hudson commented on TIKA-2621: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #15 (See

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419798#comment-16419798 ] Ewan Mellor commented on TIKA-2582: --- Build failures were not this change; they were from TIKA-2621 which

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419784#comment-16419784 ] Hudson commented on TIKA-2620: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1463 (See

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419781#comment-16419781 ] Hudson commented on TIKA-2582: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1463 (See

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419783#comment-16419783 ] Hudson commented on TIKA-2613: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1463 (See

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419782#comment-16419782 ] Hudson commented on TIKA-2584: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1463 (See

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419770#comment-16419770 ] Hudson commented on TIKA-2620: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #227 (See

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419752#comment-16419752 ] Ewan Mellor commented on TIKA-2613: --- Build failures were not this change; they were from TIKA-2621 which

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419682#comment-16419682 ] Hudson commented on TIKA-2621: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1462 (See

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419685#comment-16419685 ] Hudson commented on TIKA-2613: -- FAILURE: Integrated in Jenkins build tika-branch-1x #14 (See

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419683#comment-16419683 ] Hudson commented on TIKA-2582: -- FAILURE: Integrated in Jenkins build tika-branch-1x #14 (See

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419684#comment-16419684 ] Hudson commented on TIKA-2584: -- FAILURE: Integrated in Jenkins build tika-branch-1x #14 (See

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419686#comment-16419686 ] Hudson commented on TIKA-2621: -- FAILURE: Integrated in Jenkins build tika-branch-1x #14 (See

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419687#comment-16419687 ] Hudson commented on TIKA-2620: -- FAILURE: Integrated in Jenkins build tika-branch-1x #14 (See

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419670#comment-16419670 ] Hudson commented on TIKA-2613: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #226 (See

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419669#comment-16419669 ] Hudson commented on TIKA-2584: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #226 (See

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419668#comment-16419668 ] Hudson commented on TIKA-2582: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #226 (See

[jira] [Resolved] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2584. --- Resolution: Fixed Fix Version/s: 2.0.0 1.18 Thank you, [~ewanmellor-2]! >

[jira] [Resolved] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2613. --- Resolution: Fixed Fix Version/s: 2.0.0 1.18 Thank you, [~ewanmellor-2]! >

[jira] [Resolved] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2582. --- Resolution: Fixed Fix Version/s: 2.0.0 1.18 Thank you [~ewanmellor-2]! >

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419603#comment-16419603 ] Hudson commented on TIKA-2621: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #225 (See

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

2018-03-29 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419582#comment-16419582 ] ASF GitHub Bot commented on TIKA-2613: -- tballison closed pull request #230: Fix for TIKA-2613

[jira] [Commented] (TIKA-2582) Tesseract 4.0 includes a FF character by default, breaking parsers

2018-03-29 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419580#comment-16419580 ] ASF GitHub Bot commented on TIKA-2582: -- tballison closed pull request #222: Fix for TIKA-2582

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

2018-03-29 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419581#comment-16419581 ] ASF GitHub Bot commented on TIKA-2584: -- tballison closed pull request #224: Fix for TIKA-2584

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Ewan Mellor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419501#comment-16419501 ] Ewan Mellor commented on TIKA-2620: --- [https://bugs.openjdk.java.net/browse/JDK-8041125] This showed a

[jira] [Comment Edited] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487 ] Tilman Hausherr edited comment on TIKA-2620 at 3/29/18 5:53 PM:

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419487#comment-16419487 ] Tilman Hausherr commented on TIKA-2620: --- [~gagravarr] KCMS is the legacy setting. It is much faster.
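
For readers following TIKA-2620, here is a minimal sketch of what "setting the sys property" for the legacy KCMS color management module can look like. The snippet above is truncated and does not show the property, so the name `sun.java2d.cmm` and the value `sun.java2d.cmm.kcms.KcmsServiceProvider` are assumptions taken from the PDFBox rendering guidance, not from this thread. The class and method names are hypothetical. The KCMS provider may be absent on newer or non-Oracle JDKs, so treat this as an illustration only.

```java
// Hypothetical helper: not Tika's actual implementation.
public final class LegacyCmmSketch {
    // Assumed property name and value (from PDFBox guidance, not from this thread).
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Sets the property only if the user has not chosen a value already.
    // Returns true when this call changed the setting.
    static boolean enableLegacyCmm() {
        if (System.getProperty(CMM_PROPERTY) != null) {
            return false; // respect an explicit user choice
        }
        System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
        return true;
    }

    public static void main(String[] args) {
        boolean changed = enableLegacyCmm();
        System.out.println(CMM_PROPERTY + "=" + System.getProperty(CMM_PROPERTY)
                + " (changed: " + changed + ")");
    }
}
```

The property would need to be set before any color conversion runs. Leaving an existing value untouched is a design choice here, so that a default never overrides a user's own `-D` setting.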

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419481#comment-16419481 ] Tim Allison commented on TIKA-2621: --- I've pushed a fix to trunk. This includes avoiding double-detection

Re: tsdb extraction

2018-03-29 Thread Oleg Tikhonov
ok. time to read the spec :-) On Thu, Mar 29, 2018 at 4:02 PM, Allison, Timothy B. wrote: > Sorry...not aware of anything... > > -Original Message- > From: olegtikho...@gmail.com [mailto:olegtikho...@gmail.com] On Behalf Of > Oleg Tikhonov > Sent: Thursday, March 29,

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419278#comment-16419278 ] Tim Allison commented on TIKA-2621: --- Now I remember: https://github.com/google/brotli/issues/298 There's

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419234#comment-16419234 ] Tim Allison commented on TIKA-2621: --- bq. For now, if you add the jar, do you have success? You won't

[jira] [Comment Edited] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419222#comment-16419222 ] Tim Allison edited comment on TIKA-2621 at 3/29/18 3:44 PM: The brotli

[jira] [Commented] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419222#comment-16419222 ] Tim Allison commented on TIKA-2621: --- The brotli dependency is optional, and we don't include it...can't

[jira] [Assigned] (TIKA-2621) Brotli support

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2621: - Assignee: Tim Allison > Brotli support > -- > > Key: TIKA-2621 >

[jira] [Commented] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419134#comment-16419134 ] Tim Allison commented on TIKA-2619: --- Thanks to [~msahyoun]'s pointer, I confirmed that the dev version of

[jira] [Created] (TIKA-2622) Upgrade to PDFBox 2.0.10 when available

2018-03-29 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2622: - Summary: Upgrade to PDFBox 2.0.10 when available Key: TIKA-2622 URL: https://issues.apache.org/jira/browse/TIKA-2622 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-2621) Brotli support

2018-03-29 Thread Dawid Wolski (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wolski updated TIKA-2621: --- Description: Hi, I've got some documents [Brotli|https://en.wikipedia.org/wiki/Brotli] encoded. Tika

[jira] [Created] (TIKA-2621) Brotli support

2018-03-29 Thread Dawid Wolski (JIRA)
Dawid Wolski created TIKA-2621: -- Summary: Brotli support Key: TIKA-2621 URL: https://issues.apache.org/jira/browse/TIKA-2621 Project: Tika Issue Type: Wish Components: parser

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418968#comment-16418968 ] Tim Allison commented on TIKA-2620: --- [~tilman], any recommendations? We do have an option to render

[jira] [Commented] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418961#comment-16418961 ] Nick Burch commented on TIKA-2620: -- Do you know why Oracle haven't set that by default? If there's no

RE: tsdb extraction

2018-03-29 Thread Allison, Timothy B.
Sorry...not aware of anything... -Original Message- From: olegtikho...@gmail.com [mailto:olegtikho...@gmail.com] On Behalf Of Oleg Tikhonov Sent: Thursday, March 29, 2018 1:46 AM To: tika-...@lucene.apache.org Subject: tsdb extraction Hi guys, I am wondering if we have a parser which

[jira] [Created] (TIKA-2620) Set sys property to get better rendering speed by default

2018-03-29 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2620: - Summary: Set sys property to get better rendering speed by default Key: TIKA-2620 URL: https://issues.apache.org/jira/browse/TIKA-2620 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418928#comment-16418928 ] Tim Allison commented on TIKA-2619: --- Y, can confirm OOM with straight PDFBox 2.0.9 app's ExtractText.

[jira] [Commented] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Andreas Meier (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418926#comment-16418926 ] Andreas Meier commented on TIKA-2619: - Can confirm this OutOfMemoryError in Version 1.17 Tried to

[jira] [Updated] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Felix Dürrwanger (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Dürrwanger updated TIKA-2619: --- Environment: Linux 4.13.0-37 / JDK 1.8.0_152 (was: {noformat} fd@804F9H2:~/TIKA$ uname -a

[jira] [Created] (TIKA-2619) Memory leak: PDF meta data detection fails with OutOfMemoryError

2018-03-29 Thread Felix Dürrwanger (JIRA)
Felix Dürrwanger created TIKA-2619: -- Summary: Memory leak: PDF meta data detection fails with OutOfMemoryError Key: TIKA-2619 URL: https://issues.apache.org/jira/browse/TIKA-2619 Project: Tika