RE: Out of memory error

2006-07-13 Thread Rob Staveley (Tom)
If you are using http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument), you are going to get a large String and may need a 1G heap. If, however, you are using http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText
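A minimal sketch of the writeText approach, assuming the Apache PDFBox 2.x API (package org.apache.pdfbox, the successor to the org.pdfbox packages in the Javadoc links; in the 2006-era library the classes were org.pdfbox.pdmodel.PDDocument and org.pdfbox.util.PDFTextStripper). The PdfToText class and its extract helper are hypothetical names. Writing through a Writer avoids holding the whole document's text as one String, which is what getText(PDDocument) returns; the parsed PDDocument itself still has to fit in the heap.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; assumes Apache PDFBox 2.x on the classpath.
public class PdfToText {

    // Sends the extracted text of `pdf` to the file `out` through a Writer
    // via PDFTextStripper.writeText, instead of building one large String
    // the way getText(PDDocument) does.
    public static void extract(File pdf, File out) throws IOException {
        try (PDDocument document = PDDocument.load(pdf);
             Writer writer = Files.newBufferedWriter(out.toPath(), StandardCharsets.UTF_8)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.writeText(document, writer);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PdfToText <input.pdf> <output.txt>");
            System.exit(1);
        }
        extract(new File(args[0]), new File(args[1]));
    }
}
```

With PDFBox on the classpath this can be run as, for example, java -Xmx512m PdfToText input.pdf output.txt; the extracted text can then be fed to Lucene from the file rather than from an in-memory String.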

Re: Out of memory error

2006-07-13 Thread Suba Suresh
Thanks. I am using the getText(PDDocument) method of the PDFTextStripper. I will try the other suggestion.
suba suresh.

Rob Staveley (Tom) wrote:
If you are using http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument), you are going to get

Re: Out of memory error

2006-07-13 Thread Ben Litchfield
By 300MG I assume you mean 300MB. You can also try extracting the text outside of Lucene by using a PDFBox command-line app:

    java org.pdfbox.ExtractText pdffile

You may need to increase the JRE memory like this:

    java -Xmx512m org.pdfbox.ExtractText pdffile

OR

    java -Xmx1024m
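To check that an -Xmx setting like the ones above actually reached the JVM, a program can ask the runtime for its heap ceiling. This small sketch uses only the standard Runtime.maxMemory() call; the MaxHeap class name is hypothetical. The reported figure can be somewhat below the -Xmx value, depending on the JVM and garbage collector.

```java
// Hypothetical diagnostic class: prints the maximum heap the JVM will try to use.
public class MaxHeap {

    // Runtime.maxMemory() returns bytes; it returns Long.MAX_VALUE
    // if the JVM has no inherent limit.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running java -Xmx512m MaxHeap and then java -Xmx1024m MaxHeap should show the ceiling roughly doubling between the two runs.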

RE: Out of memory error

2006-07-13 Thread Rob Staveley (Tom)
Let us know how you get on. There are a lot of people fighting very similar battles on this list.

-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error

Thanks. I am using the getText

Re: Out of memory error

2006-07-13 Thread Suba Suresh
2006 15:30 To: java-user@lucene.apache.org Subject: Re: Out of memory error Thanks. I am using the getText(PDDocument) method of the PDFTextStripper. I will try the other suggestion. suba suresh. Rob Staveley (Tom) wrote: If you are using http://www.pdfbox.org/javadoc/org/pdfbox/util