Hi Mouthgalya,
  We fixed that NPE in https://issues.apache.org/jira/browse/TIKA-1605, and the 
fix will be available in Tika 1.9, which should be out within a week.
  As for memory issues, we worked around a memory leak in PDFBox with static 
caching of fonts for Tika 1.7 (may have been 1.8), but there may be others.  
One potential memory hog is the processing of inline images within PDFs. Have 
you configured Tika to pull those out (the default is to skip them)?  Other than 
that, I'd recommend dropping a note to the PDFBox users list to get help in 
diagnosing memory consumption with PDFBox.  Have you tried any memory profiling?
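
On the memory-profiling question, here is a minimal sketch of a first step before reaching for a full profiler: log the JVM's used heap before and after each parse, so that a steady climb across files stands out. It uses only the JDK's java.lang.management API. The names HeapProbe and measure, and the stand-in Callable, are hypothetical; the real call to pdfparser.parse(...) would go inside call(). An anonymous class is used instead of a lambda so the sketch also compiles on older JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.Callable;

public class HeapProbe {

    /** Bytes of heap currently in use, as reported by the JVM. */
    public static long usedHeapBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    /**
     * Runs one unit of work and prints used heap before and after it.
     * The delta can be negative if a garbage collection ran in between.
     */
    public static <T> T measure(String label, Callable<T> work) throws Exception {
        long before = usedHeapBytes();
        try {
            return work.call();
        } finally {
            long after = usedHeapBytes();
            System.out.printf("%s: heap used %d KB -> %d KB (delta %d KB)%n",
                    label, before / 1024, after / 1024, (after - before) / 1024);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real parse call; it only allocates
        // a string so the example runs without Tika on the classpath.
        String body = measure("demo-report", new Callable<String>() {
            public String call() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100000; i++) {
                    sb.append("text ");
                }
                return sb.toString();
            }
        });
        System.out.println("extracted " + body.length() + " characters");
    }
}
```

Used heap fluctuates with garbage collection, so a single delta means little; what matters is the trend across many files. If the "before" figure keeps rising, a heap dump (for example with the HotSpot option -XX:+HeapDumpOnOutOfMemoryError) would show what is being retained.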

          Best,

                    Tim

From: Mouthgalya Ganapathy [mailto:mouthgalya.ganapa...@fitchratings.com]
Sent: Wednesday, June 03, 2015 3:25 PM
To: talli...@apache.org
Subject: Memory issues with PDF parser

Hi all,
I am trying to use Apache Tika 1.8 to extract content from PDFs, using the 
code below. It works well for a few files, but when I read many 
files, I see an out-of-memory exception.
I also see a NullPointerException in the PDF parser. I think the 
NullPointerException is caused by the out-of-memory condition.
Any suggestions?

Tika version:
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-server</artifactId>
    <version>1.8</version>
  </dependency>

I am running it as a part of J2EE APP in JBoss 1.7

Code:-

//Parse the PDF content using Apache Tika
            InputStream is = null;
            try {
              is = new BufferedInputStream(new FileInputStream(input));
              //Disable write limit.
              contenthandler = new BodyContentHandler(-1);
               metadata = new Metadata();
              pdfparser = new PDFParser();
              context = new ParseContext();
              pdfparser.parse(is, contenthandler, metadata, context);
              docBody=contenthandler.toString();
              //System.out.println(contenthandler.toString());
            }
            catch (Exception e) {
               System.out.println("Exception in updating docbody for report ==> " + report.getDocID());
               if(is==null)
                 System.out.println("The input stream is a null object");
               e.printStackTrace();
              logger.log(Level.SEVERE, e.getMessage(), e);
            }
            finally {
                if (is != null) is.close();
                contenthandler=null;
                metadata=null;
                pdfparser=null;
                context =null;
            }


Exception:-
I am just including the null pointer exception in the parser below.

10:53:11,696 INFO  [stdout] (Thread-11 
(HornetQ-client-global-threads-1619682129)) Exception in updating docbody for 
report ==> RPT_764268
10:53:12,218 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129)) java.lang.NullPointerException
10:53:12,219 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:158)
10:53:12,219 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.researchapi.dao.ResearchReportMDAO.updateDocBody(ResearchReportMDAO.java:881)
10:53:12,219 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.researchapi.dao.ResearchReportMDAO.loadFile_NEW(ResearchReportMDAO.java:965)
10:53:12,220 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.researchapi.dao.ResearchReportMDAO.upsert_NEW(ResearchReportMDAO.java:676)
10:53:12,220 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.research.ejb.ResearchReportManagerBean.processResearchReport(ResearchReportManagerBean.java:70)
10:53:12,221 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
10:53:12,221 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,222 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,222 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,223 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,223 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,223 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,224 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,224 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.jpa.interceptor.SBInvocationInterceptor.processInvocation(SBInvocationInterceptor.java:47)
10:53:12,225 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,225 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,226 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,226 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,227 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,227 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,228 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,228 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,229 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,229 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,229 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,230 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,230 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,231 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,231 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,231 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,232 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,232 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:32)
10:53:12,233 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,233 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,234 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,234 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,235 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,235 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,235 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,236 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,236 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72)
10:53:12,236 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.research.ejb.ResearchReportManagerBeanLocal$$$view4.processResearchReport(Unknown Source)
10:53:12,868 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
com.fitch.research.ejb.mdb.ResearchQueueManagerMDB.onMessage(ResearchQueueManagerMDB.java:150)
10:53:12,868 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
10:53:12,869 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
10:53:12,869 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
java.lang.reflect.Method.invoke(Method.java:597)
10:53:12,870 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
10:53:12,870 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,871 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53)
10:53:12,871 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36)
10:53:12,872 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,872 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
10:53:12,872 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,873 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,873 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53)
10:53:12,874 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,874 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:51)
10:53:12,874 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,875 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:202)
10:53:12,875 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.required(CMTTxInterceptor.java:306)
10:53:12,876 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:190)
10:53:12,876 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,876 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41)
10:53:12,877 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,877 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59)
10:53:12,878 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,878 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
10:53:12,878 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,879 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.interceptors.AdditionalSetupInterceptor.processInvocation(AdditionalSetupInterceptor.java:43)
10:53:12,879 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,880 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ejb3.component.messagedriven.MessageDrivenComponentDescription$5$1.processInvocation(MessageDrivenComponentDescription.java:184)
10:53:12,880 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45)
10:53:12,881 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)
10:53:12,881 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
10:53:12,882 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165)
10:53:12,883 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173)
10:53:12,883 ERROR [stderr] (Thread-11 
(HornetQ-client-global-threads-1619682129))    at 
org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288)

Thanks,
MG
Product Development Team



