Re: Problem loading large pdf files

Brent Pathakis Wed, 30 Oct 2013 11:38:30 -0700

Disregard the last message.

I was using RandomAccess and RandomAccessFile from java.io.* (instead of
org.apache.pdfbox.io).
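
For anyone who lands here with the same two errors: they are what you would expect when a class in one package has the same simple name as an unrelated type in another. Below is a minimal sketch that reproduces the run-time failure with only the JDK. The nested RandomAccess interface is a hypothetical stand-in for org.apache.pdfbox.io.RandomAccess, not the real PDFBox type, and the class and method names are made up for illustration. The only point is that java.io.RandomAccessFile does not implement that interface, so the cast compiles but fails when executed.

```java
import java.io.File;
import java.io.IOException;

public class SameNameDifferentType {

    // Hypothetical stand-in for org.apache.pdfbox.io.RandomAccess.
    // java.io.RandomAccessFile has no relationship to this interface.
    public interface RandomAccess {}

    // Returns true when casting the object to the stand-in interface
    // throws ClassCastException, false when the cast succeeds.
    public static boolean castFails(Object candidate) {
        try {
            RandomAccess unused = (RandomAccess) candidate;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch file in the default temp directory, removed on JVM exit.
        File tmpFile = File.createTempFile("scratch", ".tmp");
        tmpFile.deleteOnExit();

        // Fully qualified on purpose: this is the JDK class, not a PDFBox one.
        try (java.io.RandomAccessFile raf = new java.io.RandomAccessFile(tmpFile, "rw")) {
            System.out.println("cast failed: " + castFails(raf));
        }
    }
}
```

With PDFBox on the classpath, the likely fix, going by the ClassCastException text in the quoted message, is to import RandomAccessFile from org.apache.pdfbox.io rather than java.io, so that the object passed to PDDocument.load really is a PDFBox RandomAccess.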


*Brent Pathakis*
801 536 0041


On Wed, Oct 30, 2013 at 12:02 PM, Brent Pathakis <[email protected]> wrote:

> I tried this:
>
> RandomAccess scratchFile=null;
>
>
> if (tmpFile.exists()){
>  tmpFile.delete();
>   scratchFile = new RandomAccessFile(tmpFile, "rw");
>  }else
> {
>  scratchFile =  new RandomAccessFile(tmpFile, "rw");
>  }
>
> But eclipse tells me:
>
> Type mismatch: cannot convert from RandomAccessFile to RandomAccess
>
> If I try this:
>
> if (tmpFile.exists()) {
>   tmpFile.delete();
>   RandomAccess scratchFile = (RandomAccess) new RandomAccessFile(tmpFile, "rw");
> } else {
>   RandomAccess scratchFile = (RandomAccess) new RandomAccessFile(tmpFile, "rw");
> }
>
>
> Then I get this error at run time:
>
> java.lang.ClassCastException: java.io.RandomAccessFile cannot be cast to org.apache.pdfbox.io.RandomAccess
>     at PDFRedact.main(PDFRedact.java:34)
>
>
> *Brent Pathakis*
> 801 536 0041
>
>
> On Wed, Oct 30, 2013 at 10:55 AM, Gilad Denneboom <
> [email protected]> wrote:
>
>> I used this code in one occasion:
>>
>>         String tmpFilePath = System.getProperty("java.io.tmpdir") + File.separator + "scratch.tmp";
>>         File tmpFile = new File(tmpFilePath);
>>         if (tmpFile.exists())
>>             tmpFile.delete();
>>         RandomAccess scratchFile = new RandomAccessFile(tmpFile, "rw");
>>
>>         PDDocument doc = PDDocument.load( filePath, scratchFile );
>>
>>
>>
>> On Wed, Oct 30, 2013 at 5:31 PM, Brent Pathakis <[email protected]>
>> wrote:
>>
>> > Thanks. Do you have an example of code using the scratch file?
>> > On Oct 30, 2013 9:30 AM, "Gilad Denneboom" <[email protected]>
>> > wrote:
>> >
>> > > Try using a scratch file in the load method of PDDocument.
>> > >
>> > >
>> > > On Wed, Oct 30, 2013 at 3:48 PM, Brent Pathakis <[email protected]>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > >   I'm trying to use PDFBox to load a large PDF document (>1 GB):
>> > > >
>> > > >   File inputPdf = new File("c:\\some.pdf");
>> > > >   PDFTextStripper stop = new PDFTextStripper();
>> > > >
>> > > >   FileInputStream fis = new FileInputStream(inputPdf);
>> > > >   pd = PDDocument.load(fis, true);
>> > > >
>> > > >   This code works fine for smaller PDFs, but on larger ones I'm getting:
>> > > >
>> > > >   org.apache.pdfbox.exceptions.WrappedIOException
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159)
>> > > > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130)
>> > > > at PDFRedact.main(PDFRedact.java:19)
>> > > > Caused by: java.lang.IndexOutOfBoundsException: Index: 15625, Size: 15625
>> > > > at java.util.ArrayList.RangeCheck(Unknown Source)
>> > > > at java.util.ArrayList.get(Unknown Source)
>> > > > at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:84)
>> > > > at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
>> > > > at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>> > > > at java.io.BufferedOutputStream.flush(Unknown Source)
>> > > > at java.io.FilterOutputStream.close(Unknown Source)
>> > > > at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:610)
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:568)
>> > > > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188)
>> > > > ... 4 more
>> > > >
>> > > >
>> > > >    Any ideas or help would be appreciated.
>> > > >
>> > > > *Brent Pathakis*
>> > > > 801 536 0041
>> > > >
>> > >
>> >
>>
>
>
