Re: Getting Out of Memory Error when trying to parse and extract text of 8 MB PDF Document

VIGNESH S Tue, 05 Feb 2013 06:21:02 -0800

I think the non-sequential PDF parser also loads every object into the
object pool.

The difference, I think, is that the non-sequential parser reads the
xref table referenced from the trailer to learn the PDF structure,
instead of linearly traversing the document.


Correct me if I am wrong.
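
To illustrate the xref point: a PDF file ends with a "startxref" keyword
followed by the byte offset of the cross-reference section, so a parser
can jump straight there instead of scanning from the top. Below is a
minimal sketch in plain Java. It is not PDFBox code; the class name
StartXrefLocator, the method findStartXref, and the 1024-byte tail
window are my own assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: locates the xref offset
// by looking only at the tail of the file.
public class StartXrefLocator {

    // Assumed size of the tail window to search; an illustrative choice.
    private static final int TAIL_WINDOW = 1024;

    /**
     * Returns the byte offset written after the last "startxref"
     * keyword, or -1 if the keyword or its number is missing.
     */
    public static long findStartXref(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so indexes line up.
        String tail = new String(pdf, from, pdf.length - from,
                StandardCharsets.ISO_8859_1);

        int idx = tail.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }

        // Skip whitespace between the keyword and the offset.
        int i = idx + "startxref".length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++;
        }

        // Collect the decimal digits of the offset.
        int start = i;
        while (i < tail.length()
                && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1;
        }
        return Long.parseLong(tail.substring(start, i));
    }

    public static void main(String[] args) {
        // Tiny hand-made sample; the "xref" keyword starts at byte 9.
        byte[] sample = ("%PDF-1.4\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(sample));
    }
}
```

Knowing this offset only tells the parser where the object index is.
Whether each object is then kept in memory after being read is a
separate question, which is the part I am unsure about above.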


On Sat, Feb 2, 2013 at 11:58 AM, Maruan Sahyoun <[email protected]> wrote:
> Hi,
>
> did you try the non-sequential parser, PDDocument.loadNonSeq()?
>
> Maruan Sahyoun
>
> On 02.02.2013 at 07:09, VIGNESH S <[email protected]> wrote:
>
>> Hi Andreas,
>>
>> Do you have any suggestions?
>>
>> On Thu, Jan 31, 2013 at 6:52 PM, Andreas Lehmkühler <[email protected]> wrote:
>>> Hi,
>>>
>>> On 28.01.13 at 15:45, VIGNESH S wrote:
>>>
>>>> Hi,
>>>>
>>>> I tried extracting text from an 8 MB PDF document. It takes more than
>>>> 64 MB of heap and runs out of memory when tested on Android phones.
>>>>
>>>> My understanding is that PDFBox initially loads all objects into the
>>>> object pool, so heap usage grows with the number of objects in the
>>>> PDF document, which looks like the DOM way of doing things.
>>>>
>>>> Is there a memory-efficient, SAX-like way of extracting text in PDFBox?
>>>
>>> Try the new non-sequential parser using loadNonSeq() instead of load().
>>>
>>> BR
>>> Andreas Lehmkühler
>>
>>
>>
>> --
>> Thanks and Regards
>> Vignesh Srinivasan
>> 9739135640



-- 
Thanks and Regards
Vignesh Srinivasan
9739135640
