[jira] [Commented] (PDFBOX-2391) Use an enum for RenderingIntent

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179624#comment-14179624 ] Tilman Hausherr commented on PDFBOX-2391: - Oops... I see I accidentally committed

Re: download link broken

2014-10-22 Thread Tilman Hausherr
Now it works. Tilman On 21.10.2014 at 21:10, Tilman Hausherr wrote: https://pdfbox.apache.org/download.cgi shows this: #!/bin/sh # Wrapper script around mirrors.cgi script # (we must change to that directory in order for python to pick up the # python includes correctly) cd

[jira] [Resolved] (PDFBOX-2391) Use an enum for RenderingIntent

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson resolved PDFBOX-2391. - Resolution: Fixed Use an enum for RenderingIntent ---

[jira] [Commented] (PDFBOX-2391) Use an enum for RenderingIntent

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179630#comment-14179630 ] John Hewson commented on PDFBOX-2391: - Yes, that's fine. Thanks. Use an enum for

Re: download link broken

2014-10-22 Thread Andreas Lehmkühler
Hi, I didn't do anything, either to break the download link or to repair it. ;-) I guess there was some hiccup somewhere in the infrastructure. BR Andreas Lehmkühler Tilman Hausherr thaush...@t-online.de wrote on 22 October 2014 at 08:14: Now it works. Tilman On

RE: 2.0

2014-10-22 Thread Andreas Lehmkühler
Hi Tim, first of all, thanks for the offer; it is highly appreciated! I already have a first fix for PDFBOX-2441, but there is another issue. I hope to fix it soon. I'm just curious: do you run those comparisons manually, or do you plan to implement a more or less automatic test which can be

[jira] [Resolved] (PDFBOX-2293) NonSequential parser gives an error

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-2293. - Resolution: Fixed I'm setting this one to resolved (because I made a slight change in

[jira] [Closed] (PDFBOX-1462) Use file backed buffer for FlateFilter?

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-1462. --- Resolution: Won't Fix Yes, I'm closing this - because of what you mention - because of what

[jira] [Commented] (PDFBOX-2333) Overhaul the apperance generation for PDF forms

2014-10-22 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179690#comment-14179690 ] ASF subversion and git services commented on PDFBOX-2333: - Commit

AcroForms appearance generation and 1.8

2014-10-22 Thread Maruan Sahyoun
Hi, I started making some adjustments to how the appearance is calculated for various field types for PDFBOX-2333. Although some of this could be made available to 1.8 if it doesn’t break the public API, I’m not planning to do so. WDYT? BR Maruan

[jira] [Commented] (PDFBOX-1462) Use file backed buffer for FlateFilter?

2014-10-22 Thread Edoardo Causarano (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179711#comment-14179711 ] Edoardo Causarano commented on PDFBOX-1462: --- Hi Tilman, seriously I opened

[jira] [Commented] (PDFBOX-1907) Out of memory - COSDocument (RandomAccessBuffer)

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179752#comment-14179752 ] Michael Goddard commented on PDFBOX-1907: - On a project using Apache Tika 1.6, we

Re: AcroForms appearance generation and 1.8

2014-10-22 Thread Andreas Lehmkühler
Hi, Maruan Sahyoun sahy...@fileaffairs.de wrote on 22 October 2014 at 10:03: Hi, I started making some adjustments to how the appearance is calculated for various field types for PDFBOX-2333. Although some of this could be made available to 1.8 if it doesn’t break the public

[jira] [Commented] (PDFBOX-1907) Out of memory - COSDocument (RandomAccessBuffer)

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179767#comment-14179767 ] Michael Goddard commented on PDFBOX-1907: - I couldn't find the upload button

[jira] [Created] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Maruan Sahyoun (JIRA)
Maruan Sahyoun created PDFBOX-2445: -- Summary: Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf Key: PDFBOX-2445 URL: https://issues.apache.org/jira/browse/PDFBOX-2445 Project: PDFBox

[jira] [Commented] (PDFBOX-1907) Out of memory - COSDocument (RandomAccessBuffer)

2014-10-22 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179772#comment-14179772 ] Maruan Sahyoun commented on PDFBOX-1907: [~mgoddard] I’ve created a separate

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179774#comment-14179774 ] Michael Goddard commented on PDFBOX-2445: - On a project using Apache Tika 1.6, we

[jira] [Commented] (PDFBOX-1907) Out of memory - COSDocument (RandomAccessBuffer)

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179775#comment-14179775 ] Michael Goddard commented on PDFBOX-1907: - Sure thing. Thanks -- just added the

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179784#comment-14179784 ] Maruan Sahyoun commented on PDFBOX-2445: The file uses a lot of small images

[jira] [Updated] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maruan Sahyoun updated PDFBOX-2445: --- Component/s: PDModel Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179878#comment-14179878 ] Michael Goddard commented on PDFBOX-2445: - BTW, here's my app's stack trace:

[jira] [Commented] (PDFBOX-2409) got the wrong result from Arabic text extraction

2014-10-22 Thread EugenePig (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179883#comment-14179883 ] EugenePig commented on PDFBOX-2409: --- I am sure I run ExtractText with “-sort -encoding

[jira] [Updated] (PDFBOX-2409) got the wrong result from Arabic text extraction

2014-10-22 Thread EugenePig (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] EugenePig updated PDFBOX-2409: -- Attachment: THESSALONIANS.txt.mac.jpg It was captured on the Mac. got the wrong result from Arabic

[jira] [Issue Comment Deleted] (PDFBOX-2409) got the wrong result from Arabic text extraction

2014-10-22 Thread EugenePig (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] EugenePig updated PDFBOX-2409: -- Comment: was deleted (was: It was captured on the Mac.) got the wrong result from Arabic text

[jira] [Created] (PDFBOX-2446) Create Validator for PDF/A-2b

2014-10-22 Thread Ralf Hauser (JIRA)
Ralf Hauser created PDFBOX-2446: --- Summary: Create Validator for PDF/A-2b Key: PDFBOX-2446 URL: https://issues.apache.org/jira/browse/PDFBOX-2446 Project: PDFBox Issue Type: New Feature

[jira] [Updated] (PDFBOX-2446) Create Validator for PDF/A-2b

2014-10-22 Thread Ralf Hauser (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ralf Hauser updated PDFBOX-2446: Attachment: PDF-A2b_testCasesKOSTgplV3.zip Attached some GPLv3 licensed test pdfs Create

[jira] [Commented] (PDFBOX-1462) Use file backed buffer for FlateFilter?

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179971#comment-14179971 ] Tilman Hausherr commented on PDFBOX-1462: - Hello [~ecausarano], This wasn't meant

[jira] [Commented] (PDFBOX-2441) Improve XRef self healing mechanism when more than one xref table

2014-10-22 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180185#comment-14180185 ] ASF subversion and git services commented on PDFBOX-2441: - Commit

[jira] [Commented] (PDFBOX-2441) Improve XRef self healing mechanism when more than one xref table

2014-10-22 Thread Andreas Lehmkühler (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180199#comment-14180199 ] Andreas Lehmkühler commented on PDFBOX-2441: The simple algorithm wasn't the

[jira] [Commented] (PDFBOX-2441) Improve XRef self healing mechanism when more than one xref table

2014-10-22 Thread ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180208#comment-14180208 ] ASF subversion and git services commented on PDFBOX-2441: - Commit

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Andreas Lehmkühler (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180237#comment-14180237 ] Andreas Lehmkühler commented on PDFBOX-2445: I can't confirm the issue.

[jira] [Commented] (PDFBOX-2441) Improve XRef self healing mechanism when more than one xref table

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180256#comment-14180256 ] Tilman Hausherr commented on PDFBOX-2441: - {code} I got a DataFormatException

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180329#comment-14180329 ] Michael Goddard commented on PDFBOX-2445: - Here's where I ran the PDFBox app JAR

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180361#comment-14180361 ] Tilman Hausherr commented on PDFBOX-2445: - I can confirm what [~lehmi] wrote - It

[jira] [Updated] (PDFBOX-2409) got the wrong result from Arabic text extraction

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-2409: Attachment: jahewson.mac.png Here's a screenshot from my Mac of the .txt file which you uploaded.

[jira] [Commented] (PDFBOX-2446) Create Validator for PDF/A-2b

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180383#comment-14180383 ] John Hewson commented on PDFBOX-2446: - It would, but is there anybody who will

[jira] [Comment Edited] (PDFBOX-2446) Create Validator for PDF/A-2b

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180383#comment-14180383 ] John Hewson edited comment on PDFBOX-2446 at 10/22/14 7:27 PM:

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180386#comment-14180386 ] John Hewson commented on PDFBOX-2445: - {quote} John Hewson couldn’t we probably

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180390#comment-14180390 ] John Hewson commented on PDFBOX-2445: - {quote} images are decoded even when only text

[jira] [Updated] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-2445: Affects Version/s: (was: 2.0.0) Out of Memory - Extract text for

[jira] [Updated] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-2445: Component/s: (was: Parsing) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180392#comment-14180392 ] John Hewson commented on PDFBOX-2445: - {quote} I couldn't find the upload button

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180391#comment-14180391 ] Michael Goddard commented on PDFBOX-2445: - Hi Guys, are you saying that if I use

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180400#comment-14180400 ] Michael Goddard commented on PDFBOX-2445: - Thanks. Just to clarify, when I

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Maruan Sahyoun (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180454#comment-14180454 ] Maruan Sahyoun commented on PDFBOX-2445: Seems to be dependent on the VM. I tried

[jira] [Updated] (PDFBOX-2421) Poor text extraction and rendering of file with non embedded type1 font

2014-10-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2421: Component/s: FontBox Poor text extraction and rendering of file with non embedded type1

[jira] [Assigned] (PDFBOX-2421) Poor text extraction and rendering of file with non embedded type1 font

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson reassigned PDFBOX-2421: --- Assignee: John Hewson Poor text extraction and rendering of file with non embedded type1

[jira] [Updated] (PDFBOX-2421) Poor text extraction and rendering of file with non embedded type1 font

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-2421: Component/s: (was: Text extraction) Poor text extraction and rendering of file with non

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread John Hewson (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180765#comment-14180765 ] John Hewson commented on PDFBOX-2445: - [~mgoddard], what's your {{java -version}}?

[jira] [Commented] (PDFBOX-2445) Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf

2014-10-22 Thread Michael Goddard (JIRA)
[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180777#comment-14180777 ] Michael Goddard commented on PDFBOX-2445: - Yep, I'm running the 64 bit JVM. Now