If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get a large String and may need a 1G heap.
If, however, you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText
Thanks.
I am using the getText(PDDocument) method of the PDFTextStripper. I will
try the other suggestion.
suba suresh.
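For anyone following along, here is a rough sketch of why writing to a Writer tends to need less heap than building one big String. It does not call PDFBox at all: the class name, the method names and the list of "pages" are made up for illustration, and the page list just stands in for whatever text the stripper produces. In PDFBox itself, the corresponding call would presumably be the writeText method linked above, used with a FileWriter or similar.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not PDFBox code: compares buffering extracted
// text in one String with streaming it to a Writer.
public class StreamingExtractSketch {

    // Buffering approach (like getText): every page is appended to one
    // in-memory buffer, so heap use grows with the whole document.
    public static String extractToString(Iterable<String> pages) {
        StringBuilder all = new StringBuilder();
        for (String page : pages) {
            all.append(page).append('\n');
        }
        return all.toString();
    }

    // Streaming approach (like writeText): each page is handed to the
    // Writer as soon as it is available, so nothing accumulates here.
    // Returns the number of characters written.
    public static int extractToWriter(Iterable<String> pages, Writer out)
            throws IOException {
        int chars = 0;
        for (String page : pages) {
            out.write(page);
            out.write('\n');
            chars += page.length() + 1;
        }
        out.flush();
        return chars;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: two short strings play the role of extracted pages.
        List<String> pages = Arrays.asList("page one text", "page two text");
        File outFile = File.createTempFile("extracted", ".txt");
        try (Writer out = new BufferedWriter(new FileWriter(outFile))) {
            int written = extractToWriter(pages, out);
            System.out.println("Wrote " + written + " chars to " + outFile);
        }
    }
}
```

With the streaming version the text can go straight to a file (or to a Reader-based Lucene field via a temp file), so the full document text never has to sit in the heap as a single String.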
Rob Staveley (Tom) wrote:
If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get
By 300MG I assume you mean 300MB.
You can also try extracting the text outside of Lucene by using a
PDFBox command line app:
java org.pdfbox.ExtractText pdffile
You may need to increase the JRE memory like this:
java -Xmx512m org.pdfbox.ExtractText pdffile
OR
java -Xmx1024m
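If it is unclear whether the -Xmx setting is actually reaching the JVM, a small check like the one below may help. It is only a sketch: the class name HeapCheck and the helper toMiB are made up for illustration, and the figure the JVM reports is usually close to, but not exactly, the -Xmx value.

```java
// Hypothetical helper, not part of PDFBox or Lucene: reports the
// maximum heap the running JVM is willing to use.
public class HeapCheck {

    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reflects the heap limit, e.g. as set with -Xmx.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MB");
    }
}
```

Run it with the same flag as the extraction, for example java -Xmx512m HeapCheck, and compare the printed figure with what you asked for.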
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error
Thanks.
I am using the getText(PDDocument) method of the PDFTextStripper. I will try
the other suggestion.
suba suresh.
Rob Staveley (Tom) wrote:
If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util