It would be interesting to know whether the issue can be reproduced with PDFBox alone, i.e. by just loading the file (or rather the input stream, so it seems) in the Tomcat servlet.

If it can be reproduced, would it be possible to set up a non-AWS Tomcat with the same problem? And if so, what are the settings?

All of this should be tested on the same Java version. (Which one is being used?)
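
To compare the two machines, it may help to log what each JVM actually reports at runtime. The following is only a sketch using standard JDK calls; the class name `JvmReport` is made up for illustration, and it would have to run inside the same Tomcat process (e.g. called from the servlet) to reflect the settings that CATALINA_OPTS really produced.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmReport {

    /** Builds a short report of the JVM properties that influence heap size. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // Flags the JVM was started with (e.g. -Xmx...), as seen by the JVM itself.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.name=").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("os.arch=").append(System.getProperty("os.arch")).append('\n');
        // Upper bound on the heap that this JVM will try to use.
        sb.append("maxHeapMB=").append(rt.maxMemory() / mb).append('\n');
        sb.append("jvmArgs=").append(jvmArgs).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

If the maximum heap reported on the Windows server is much smaller than on the iMac, or the expected `-Xmx` flag is missing from the argument list, the memory setting is probably not reaching the Tomcat process.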

Tilman



On 30.01.2019 at 17:13, Tim Allison wrote:
forwarding to the correct pdfbox address... sorry for the noise...

---------- Forwarded message ---------
From: Tim Allison <[email protected]>
Date: Wed, Jan 30, 2019 at 10:29 AM
Subject: Re: Memory Errors with PDFBOX
To: <[email protected]>, Jim <[email protected]>, <[email protected]>


@PDFBox colleagues,
   Any thoughts/recommendations?

On Wed, Jan 30, 2019 at 9:43 AM Jim <[email protected]> wrote:
I have a simple Tika REST service that accepts a Base64-encoded string (which,
for testing, is a PDF file in this case).

The REST service Base64-decodes the received string and passes the resulting
binary PDF content to Tika for text extraction.
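
For reference, the decode step described above could be sketched as follows. This is not the actual service code: the class and method names are hypothetical, and it only assumes the JDK's `java.util.Base64`. Wrapping the encoded input in a decoding stream avoids building a second, fully decoded byte array before the parser sees the data, which matters for a 150 MB file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64StreamDecode {

    /** Wraps a stream of Base64 text so that reads return the decoded bytes. */
    static InputStream decoding(InputStream base64Text) {
        return Base64.getDecoder().wrap(base64Text);
    }

    /** Drains a stream into memory; used here only to show the decoded result. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the request body: the Base64 form of a PDF header.
        byte[] encoded = Base64.getEncoder()
                .encode("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        InputStream pdf = decoding(new ByteArrayInputStream(encoded));
        System.out.println(new String(readAll(pdf), StandardCharsets.US_ASCII));
    }
}
```

In the real service, the stream returned by `decoding(...)` would be handed to the parser directly instead of being drained with `readAll`.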

Locally, on an iMac with 16 GB, all this works fine. Even with a PDF that's 150 
MB!  No errors at all.

Yet, using an AWS Windows 2008 server also with 16 GB RAM (t3.xlarge), I get 
the error stack below.

I've tried increasing the memory available to Tomcat (via the CATALINA_OPTS
environment variable in Windows on AWS), whereas locally on the iMac I don't
need to do anything special for it all to work. Both the working iMac and the
Windows server run the same version of the service with the Tika 1.20 libs.

Would appreciate any advice or suggestions.

Thanks very much.

ERROR STACK:

"java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:115)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:949)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:632)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:876)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:152)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:279)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:864)
at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:88)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:993)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:879)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:793)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:753)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1200)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1173)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at com.alias.ws.service.TextExtractionService.extractText(TextExtractionService.java:40)
at com.alias.ws.controllers.TextExtractionController.extractText(TextExtractionController.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:877)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:783)





---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


