Hi,

On 10.12.2012 17:02, pradeep kumar wrote:
Hi Team,

We have been using the PDFBox API for data extraction from PDFs and recently
upgraded from PDFBox 1.7.0 to 1.7.1. Since the upgrade to 1.7.1 we have been
facing high memory consumption.

The job we run is a Spring chunk-oriented processing job that processes each
PDF document and saves it to the Oracle database.

After running for about 30 minutes the heap grows to 3 GB and doesn't seem to
release any resources.

On analysis of the heap dump we noticed the class
org.apache.pdfbox.pdmodel.font.PDFont retaining 3,165,708,216 bytes of heap.

Inline image 1
Your image didn't make it due to some restrictions on the mailing list, but I
saw it as I'm one of the moderators. The memory is consumed by a
SynchronizedMap, most likely the cmap cache of the PDFont class.


Our code with the same load works perfectly fine with PDFBox 1.7.0. Would you
be able to confirm whether this is a bug in the API, so that we can revert to
version 1.7.0?
I checked the changes and couldn't find any hint of a possible cause. I haven't
run any tests yet, so I can't confirm anything. But as a workaround you might
call the PDFont#clearResources method from time to time. That should clear the
cache and reduce the memory consumption.
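
A rough sketch of what "from time to time" could look like in a batch job: run
a cleanup action after every N processed documents. The class name
PeriodicCacheClearer and the interval parameter are hypothetical and not part
of PDFBox. The PDFBox call itself appears only in a comment so the sketch
compiles without the library, and it assumes PDFont.clearResources() is a
static method in the 1.7.x line.

```java
/**
 * Hypothetical helper (not part of PDFBox): runs a cleanup action once
 * every `interval` processed documents.
 */
class PeriodicCacheClearer {
    private final int interval;
    private final Runnable cleanup;
    private int processed = 0;

    PeriodicCacheClearer(int interval, Runnable cleanup) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.cleanup = cleanup;
    }

    /**
     * Call once after each document has been processed and closed.
     * Returns true if the cleanup action ran on this call.
     */
    synchronized boolean documentProcessed() {
        processed++;
        if (processed % interval == 0) {
            // In the real job the action would be the workaround above,
            // assumed here to be: PDFont.clearResources();
            cleanup.run();
            return true;
        }
        return false;
    }
}
```

In the job, one shared instance would be created with a Runnable that calls
PDFont.clearResources(), and documentProcessed() would be invoked after each
PDDocument is closed. The interval trades cache reuse against heap growth, so
a suitable value would have to be found by measurement.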

Regards,
Pradeep


BR
Andreas Lehmkühler
