PDFBox 3.0.1 renderer fails on certain files

2023-12-15 Thread John Lussmyer
I have a customer that uses a LOT of PDF files.  They currently have 2 files that are failing when we try to render them. The same files can be viewed with Acrobat Reader or Foxit PDF with no errors reported. From Acrobat Reader file info: PDF Producer: PDFOut V3.8 – build 201 – Oct 28 2022

Re: Odd OCG error

2023-11-22 Thread John Lussmyer
OSName.OCG)) { dict.setItem(COSName.TYPE, COSName.OCG); } PDOptionalContentGroup grp = new PDOptionalContentGroup(dict); On 11/21/2023 10:52 PM, Andreas Lehmkühler wrote: On 21.11.23 at 21:26, John Lussmyer wrote: Ugh, formatting mess. For more info, this is the "addOCGs:OCG

Re: Odd OCG error

2023-11-21 Thread John Lussmyer
3 10:56 AM, John Lussmyer wrote: I'm using PDFBox 3.0.0 to combine some PDF files. One of the files uses an Optional Content Group. Note that this code has been working just fine for many other files, both with and without OCGs. For this file, I get this exception: java.lang.IllegalArgumen

Odd OCG error

2023-11-21 Thread John Lussmyer
I'm using PDFBox 3.0.0 to combine some PDF files. One of the files uses an Optional Content Group. Note that this code has been working just fine for many other files, both with and without OCGs. For this file, I get this exception: java.lang.IllegalArgumentException: Provided dictionary is

Re: PDF 2.0, PDF/A-4 support

2023-11-08 Thread John Lussmyer
On 11/8/2023 5:28 PM, Peter Wyatt wrote: I would think supporting the following PDF 2.0 features are highly relevant, given that other implementations are already generating PDF 2.0 files today (see https://pdfa.org/supporting-pdf20/) A bunch of useful suggestions elided.. What I REALLY

Re: Looking for a Debugger that can show which incremental save an object belongs to

2023-10-06 Thread John Lussmyer
I doubt there is a way. It's most likely that the signing code makes a MD5 checksum (or similar) of the file when it is signed. If the file is changed, checking the signing will re-calculate the checksum and find that it is different.  There isn't any info on what changed, just that SOMETHING
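
The checksum idea described above can be illustrated with a minimal sketch. It only demonstrates the poster's point that a digest reveals that something changed but not what changed; it is not how PDFBox or any signing tool is implemented. The class and method names are hypothetical, and SHA-256 is an assumption used here in place of the MD5 mentioned in the message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration: detect that bytes changed by comparing digests.
public class DigestSketch {

    /** Returns the SHA-256 digest of the given bytes. */
    static byte[] digest(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** True if the data still hashes to the digest recorded earlier. */
    static boolean unchanged(byte[] data, byte[] recorded) {
        return MessageDigest.isEqual(digest(data), recorded);
    }

    public static void main(String[] args) {
        byte[] original = "%PDF-1.7 example bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] recorded = digest(original); // digest taken "at signing time"

        byte[] modified = Arrays.copyOf(original, original.length);
        modified[5] ^= 1; // flip a single bit somewhere in the data

        System.out.println("original verifies: " + unchanged(original, recorded));
        System.out.println("modified verifies: " + unchanged(modified, recorded));
    }
}
```

The comparison yields only a yes/no answer, which matches the observation in the message: the digest carries no information about where the modification happened.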

Re: how to replace MemoryUsageSetting.setupMixed(100mb) ?

2023-10-05 Thread John Lussmyer
ile(MemoryUsageSetting.setupMixed(100))); (I use it with tempFileOnly, but the rest are the same) On Thu, Oct 5, 2023 at 9:50 PM John Lussmyer wrote: I'm trying to update to the latest PDFBox 3.0.0. The code was using a call to loadPDF(file, MemoryUsageSetting.setupMixed(MB100)); // 100 MB I see that no lon

how to replace MemoryUsageSetting.setupMixed(100mb) ?

2023-10-05 Thread John Lussmyer
I'm trying to update to the latest PDFBox 3.0.0. The code was using a call to loadPDF(file, MemoryUsageSetting.setupMixed(MB100)); // 100 MB I see that no longer exists, but the only mention of it doesn't seem to provide any info on how to configure an equivalent replacement? Any

RE: Optional Content Groups

2023-01-04 Thread John Lussmyer
[EXTERNAL] On 04.01.2023 19:22, John Lussmyer wrote: I have a pdf with several Optional Content groups. I can find their definitions in the Page/Resources/Properties dictionary, but I don't see how they are enabled or disabled. Where is that controlled? This is below the document root, use P

Optional Content Groups

2023-01-04 Thread John Lussmyer
I have a pdf with several Optional Content groups. I can find their definitions in the Page/Resources/Properties dictionary, but I don't see how they are enabled or disabled. Where is that controlled? Confidentiality notice: This message may contain confidential information. It is intended only

Re: Possible bug with FunctionType3?

2022-06-16 Thread John Lussmyer
I was able to get ahold of the customer's PDF file - but it (of course) works just FINE for me on my system. I have logs showing multiple identical failures for the customer - and lots of other files succeeding. I'd really like to test your possible fix - but first I have to figure out how to

Possible bug with FunctionType3?

2022-06-14 Thread John Lussmyer
We are using PDFBox to render various PDF files in our product. One customer is having issues due to PDFBox throwing a NullPointerException when certain files are rendered. (No, I don't have copies of the files - yet) Any ideas on what could cause this? java.lang.NullPointerException: null

Possible PDFBox bug?

2022-03-17 Thread John Lussmyer
We have an app that can generate multi-page PDF Files. We recently ran into a problem where the library we were using would keep ALL the pages in memory. For a quick workaround we have it write out single-page PDF files, then use PDFBox to combine them. We recently found a bug in the way

Problem with text extraction

2022-01-23 Thread John Lussmyer
On Sun Jan 23 10:02:08 PST 2022 rc...@pobox.com said: >I am using PDFBox's PDFTextStripper.getText() for a particular kind of >PDF file generated by a government agency, and the text I'm getting does >not match that displayed by Acrobat Reader for the same files. The >getText() calls occasionally

Re: memory requirements when merging PDF files?

2022-01-07 Thread John Lussmyer
On Fri Jan 07 08:55:38 PST 2022 ke...@trumpetinc.com said: >If you use the temporary file memory storage, it should be possible to work >with very large files. Thanks, I was hoping there was some way to deal with this case. I just ran a quick test, generating a 2000 page PDF by placing a 1 page

memory requirements when merging PDF files?

2022-01-06 Thread John Lussmyer
I have a need to merge a couple thousand PDFs into one humongous PDF. The old tool we use for PDF manipulation runs out of memory as it builds the result PDF in memory, and only writes it out when done. Can PDFBox do something more like streaming the output as it's built? or even not load all

Re: Rending text in thumbnail images

2021-09-13 Thread John Lussmyer
On Thu Sep 09 10:10:52 PDT 2021 thaush...@t-online.de said: >In theory one could make separate rendering hints for fonts and for >ordinary vectors, but that would be messy and hard to understand. (And >who knows whether it will work for your file) > >I recommend that you try doing this yourself by

Re: Rending text in thumbnail images

2021-09-09 Thread John Lussmyer
On Wed Sep 08 20:31:47 PDT 2021 thaush...@t-online.de said: >Ooops, you didn't mention that you turned antialiasing off. The image >looks as if interpolation was also turned off. If you set rendering >hints you always have to set all the hints you need. Here's the default: > >    private

Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
said: >On Wed Sep 08 12:20:59 PDT 2021 thaush...@t-online.de said: >>On 08.09.2021 at 21:16, John Lussmyer wrote: >>> Ok, just tried that - no change. >>> >>> We are currently trying PDFBox 3.0.0-RC1 - is that a problem? >> >>No, this is excell

Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
On Wed Sep 08 12:20:59 PDT 2021 thaush...@t-online.de said: >On 08.09.2021 at 21:16, John Lussmyer wrote: >> Ok, just tried that - no change. >> >> We are currently trying PDFBox 3.0.0-RC1 - is that a problem? > >No, this is excellent; there will be a new release of

Re: Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
Ok, just tried that - no change. We are currently trying PDFBox 3.0.0-RC1 - is that a problem? On Wed Sep 08 11:55:56 PDT 2021 thaush...@t-online.de said: >The default rendering is high quality oder speed, although there is one >obscure option you could try,

Rending text in thumbnail images

2021-09-08 Thread John Lussmyer
We are trying to switch to using PDFBox to create the thumbnail images of PDF Pages in our application. (The older product we currently use fails on OS 11). I'm running into a problem if there is text on the page, the thumbnail image makes it hard to make any sense at all of the text. (yes,

Re: Parsing huge PDF (400Mb, 2700 pages)

2019-11-14 Thread John Lussmyer
On Thu Nov 14 08:32:20 PST 2019 sahy...@fileaffairs.de said: >well - PDF is not really easily streamable as > >- it's organized as a random access format >- the reference table about the objects forming the PDF is at the end of the >file so you have to read the last parts first and >then move
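
The point quoted above, that a reader has to start from the end of the file, can be sketched as follows. This is a simplified illustration under stated assumptions, not PDFBox's parser: the class and method names are hypothetical, the 1024-byte tail window is an arbitrary choice, and the sample bytes are a made-up fragment rather than a valid PDF. It looks for the last `startxref` keyword and reads the offset that follows it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: locate the cross-reference offset from the file tail.
public class TailXrefSketch {

    private static final String KEYWORD = "startxref";
    private static final int TAIL_WINDOW = 1024; // assumed window size

    /** Returns the offset written after the last "startxref", or -1 if none is found. */
    static long findStartXref(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so indices stay aligned.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);

        int kw = tail.lastIndexOf(KEYWORD);
        if (kw < 0) {
            return -1;
        }
        int i = kw + KEYWORD.length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++; // skip the line break after the keyword
        }
        int start = i;
        while (i < tail.length() && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
            i++; // collect the decimal offset
        }
        return i == start ? -1 : Long.parseLong(tail.substring(start, i));
    }

    public static void main(String[] args) {
        byte[] sample = ("%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 1\ntrailer\n<<>>\n"
                + "startxref\n29\n%%EOF\n").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("xref offset: " + findStartXref(sample));
    }
}
```

Only after this offset is known can a reader jump back to the cross-reference data and then to individual objects, which is why the format does not lend itself to front-to-back streaming.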

Re: Exact PDF text - add it back as an annotation

2019-11-01 Thread John Lussmyer
On Tue Oct 29 21:59:57 PDT 2019 thaush...@t-online.de said: >IIRC tesseract can do this. Not as annotation, but as invisible font. As far as I can tell, it does it the same way that other programs do. It's added to the content stream, mixed with all the commands for positioning, font size,

Exact PDF text - add it back as an annotation

2019-10-29 Thread John Lussmyer
I have a bunch of PDF files that have had an OCR package run against them. The problem is that it adds the text to the normal Page content, and tries to position the recognized text at the location in the image it was found. So the text is mixed with lots of positioning, etc.. information. I'd

Re: PDFRendering

2016-06-27 Thread John Lussmyer
On Mon Jun 27 14:34:03 PDT 2016 j...@jahewson.com said: >Right, and if it was a leak then system.gc would not have fixed it. That is only SOMETIMES true. I've run into "memory leaks" where the leak was uncleared references to objects. So the old objects just hung around forever. -- Bobcats

Re: Call java file from PDF

2016-02-14 Thread John Lussmyer
On Sun Feb 14 12:15:12 PST 2016 bigal...@gmail.com said: >Thank you for both your answers. > >The html is very appealing, but what I did not mention is in working >within a rather rigid IT environment. > >I won't be able to install a html server. So back to Java executable (which >I can use)

Re: Call java file from PDF

2016-02-14 Thread John Lussmyer
> Olaf >> >> > On 14.02.2016, at 21:40, Al Grant <bigal...@gmail.com> wrote: >> > >> > I would not have the permission rights to install a web server :( >> > On 15/02/2016 9:27 am, "John Lussmyer" <cou...@casadelgato.com> wro

Re: Creating a page from a block of CCITTG42D data?

2015-02-19 Thread John Lussmyer
On Wed Feb 18 23:34:09 PST 2015 thaush...@t-online.de said: Assuming you are using 1.8.8, put the ccitt stream into a PDStream object, then call the PDCcitt constructor with that PDStream. PDStream pd = new PDStream(doc, new ByteArrayInputStream(data), true); Thanks, that worked!

Re: Creating a page from a block of CCITTG42D data?

2015-02-19 Thread John Lussmyer
unique for you? I'm just wondering if I should add such code to the 2.0 or 2.1 version. Tilman On 19.02.2015 at 19:09, John Lussmyer wrote: On Wed Feb 18 23:34:09 PST 2015 thaush...@t-online.de said: Assuming you are using 1.8.8, put the ccitt stream into a PDStream object, then call the PDCcitt

Creating a page from a block of CCITTG42D data?

2015-02-18 Thread John Lussmyer
So, I have a block of data (byte[]) that represents a scanned image, compressed using CCITTG4. I'm new to PDFBox. (of course) So far, I haven't been able to figure out how I can create a page that consists of just that image. All the examples want to read the image from a file, and decompress