[jira] [Commented] (TIKA-2847) OutOfMemoryError - tika1.19.1.jar

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809316#comment-16809316 ] Tim Allison commented on TIKA-2847: The main {{document.xml}} decompresses to ~100MB...which is not

[jira] [Created] (TIKA-2847) OutOfMemoryError - tika1.19.1.jar

2019-04-03 Thread Ashish Tiwari (JIRA)
Ashish Tiwari created TIKA-2847: Summary: OutOfMemoryError - tika1.19.1.jar Key: TIKA-2847 URL: https://issues.apache.org/jira/browse/TIKA-2847 Project: Tika Issue Type: Bug Affects

[jira] [Commented] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808996#comment-16808996 ] Hudson commented on TIKA-2846: SUCCESS: Integrated in Jenkins build Tika-trunk #1639 (See

[jira] [Commented] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808966#comment-16808966 ] Hudson commented on TIKA-2846: UNSTABLE: Integrated in Jenkins build tika-2.x-windows #395 (See

[jira] [Commented] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808939#comment-16808939 ] Hudson commented on TIKA-2846: SUCCESS: Integrated in Jenkins build tika-branch-1x #176 (See

[jira] [Commented] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808893#comment-16808893 ] Tim Allison commented on TIKA-2846: Thank you, again, [~tilman]! > Add per page unicode mapping stats

[jira] [Resolved] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2846. Resolution: Fixed Assignee: Tim Allison Fix Version/s: 1.21 > Add per page unicode

[jira] [Updated] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2846: Description: As part of TIKA-2749, it would be useful to gather stats on characters that did not have

[jira] [Updated] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2846: Description: As part of TIKA-2749, it would be useful to gather stats on characters that did not have

[jira] [Created] (TIKA-2846) Add per page unicode mapping stats to the metadata in the PDFParser

2019-04-03 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2846: Summary: Add per page unicode mapping stats to the metadata in the PDFParser Key: TIKA-2846 URL: https://issues.apache.org/jira/browse/TIKA-2846 Project: Tika

[jira] [Commented] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808821#comment-16808821 ] Hudson commented on TIKA-2845: SUCCESS: Integrated in Jenkins build tika-branch-1x #175 (See

[jira] [Commented] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808818#comment-16808818 ] Hudson commented on TIKA-2845: SUCCESS: Integrated in Jenkins build Tika-trunk #1638 (See

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808791#comment-16808791 ] Tim Allison commented on TIKA-2749: Thank you, [~tilman]! > OCR on PDFs should "just work" out of the

[jira] [Commented] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808787#comment-16808787 ] Hudson commented on TIKA-2845: UNSTABLE: Integrated in Jenkins build tika-2.x-windows #394 (See

[jira] [Resolved] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2845. Resolution: Fixed Assignee: Tim Allison Fix Version/s: 1.21 > Override ProcessPages

[jira] [Updated] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2845: Description: On the PDFBox user list, [~lehmi] confirmed (and [~tilman] clarified) that

[jira] [Comment Edited] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808708#comment-16808708 ] Tim Allison edited comment on TIKA-2845 at 4/3/19 1:17 PM: The attached file

[jira] [Commented] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808708#comment-16808708 ] Tim Allison commented on TIKA-2845: The attached file opens in Adobe, has no "contents" element but

[jira] [Updated] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2845: Attachment: testPDFFileEmbInAnnotation_noContents.pdf > Override ProcessPages in PDFTextStripper >

[jira] [Created] (TIKA-2845) Override ProcessPages in PDFTextStripper

2019-04-03 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2845: Summary: Override ProcessPages in PDFTextStripper Key: TIKA-2845 URL: https://issues.apache.org/jira/browse/TIKA-2845 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808400#comment-16808400 ] Tilman Hausherr commented on TIKA-2749: See the accepted answer here: