Hi,
This can happen when the resources are stored in a parent of the page (you
can see this in PDFDebugger). You can work around it by keeping a set of
the images you have already handled. From the source code of ExtractImages:
if (seen.contains(xobject.getCOSObject()))
{
    // skip duplicate image
    return;
}
seen.add(xobject.getCOSObject());
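
Applied to your loop over the pages, a minimal sketch could look like the
following. It assumes PDFBox 3.x (Loader.loadPDF); the file names and the
compression step itself are placeholders you would replace with your own code:

import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class CompressSharedImages
{
    public static void main(String[] args) throws IOException
    {
        // remembers the underlying image streams that were already processed,
        // so shared images are compressed only once
        Set<COSStream> seen = new HashSet<>();

        try (PDDocument doc = Loader.loadPDF(new File("input.pdf")))
        {
            for (PDPage page : doc.getPages())
            {
                PDResources res = page.getResources();
                if (res == null)
                {
                    continue;
                }
                for (COSName name : res.getXObjectNames())
                {
                    PDXObject xobject = res.getXObject(name);
                    if (!(xobject instanceof PDImageXObject))
                    {
                        continue;
                    }
                    if (seen.contains(xobject.getCOSObject()))
                    {
                        // same image stream as on another page, skip it
                        continue;
                    }
                    seen.add(xobject.getCOSObject());

                    // ... compress / re-encode the image and replace the
                    // XObject here (your existing compression code) ...
                }
            }
            doc.save(new File("output.pdf"));
        }
    }
}

With this check the work drops from 281 * 281 image visits to 281, since each
shared stream is only touched the first time it is encountered.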
Tilman
On 22.07.2025 at 15:24, Richard Kwasnicki wrote:
Hey,
I have a PDF file with 281 pages; each page is basically just one big image.
When I load it with PDFBox, my aim is to compress the images to make them
smaller.
My approach is to load the document, iterate over every page, and check for
each resource on it whether it is of type PDImageXObject. Then I do some
compression.
The crazy thing is, my file somehow has on every page a reference to every
resource. It seems that all the images are somehow shared... So my program now
does 281 * 281 compressions, which is really slow.
I'm not sure what the best way is to detect shared resources; is there some
easy way? Also, if you see other approaches serving the same purpose of
compressing large images, I would be interested...
Best, Richard