Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Mark Diggory
> report this more clearly. > > > > > > But aside from that, my limited tests seem to work quite well. > > > > > > G > > > > > > -----Original Message----- > > > From: Larry Stone [mailto:l...@mit.edu] > > > Sent: 08 Ap

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Larry Stone
a pity that you don't get a simple ClassNotFoundException to be able to > > report this more clearly. > > > > But aside from that, my limited tests seem to work quite well. > > > > G > > > > -----Original Message----- > > From: Larry Stone [mail

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Mark Diggory
to be able to > report this more clearly. > > But aside from that, my limited tests seem to work quite well. > > G > > -----Original Message----- > From: Larry Stone [mailto:l...@mit.edu] > Sent: 08 April 2009 22:21 > To: Tim Donohue > Cc: DSpace Tech; Jeffrey Trimb

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Graham Triggs
; Jeffrey Trimble Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media The PDFBox library is _always_ going to be a problem because of its architecture. It insists on reading the entire PDF document, images included, into memory. This is not necessary, PDF was explicitly designed to

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
NAL bundle. You can then run index-all and your previously unfilterable/non-searchable document will be full-text searchable! Sue From: Jeffrey Trimble [mailto:jtrim...@cc.ysu.edu] Sent: Wednesday, April 08, 2009 9:36 AM To: DSpace Tech Subject: [Dspace-tech] Java

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-09 Thornton, Susan M. (LARC-B702)[RAYTHEON TECHNICAL SERVICES COMPANY]
DSpace. Sue -----Original Message----- From: Tim Donohue [mailto:tdono...@illinois.edu] Sent: Wednesday, April 08, 2009 10:37 AM To: Jeffrey Trimble Cc: DSpace Tech Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media Jeffrey, I've seen this same issue all too many tim

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Mark Diggory
2009 5:46 AM To: Richard Rodgers Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media Got my vote for that. Until the PDFBox is perfected to really delve into this, it will be helpful. How does the PDFBox stay ahead of the game when Adobe is upgra

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Van Ly
g/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51502 -- Van Ly -----Original Message----- From: Jeffrey Trimble [mailto:jtrim...@cc.ysu.edu] Sent: Thu 4/9/2009 5:46 AM To: Richard Rodgers Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Larry Stone
The PDFBox library is _always_ going to be a problem because of its architecture. It insists on reading the entire PDF document, images included, into memory. This is not necessary, PDF was explicitly designed to let renderers process a page at a time in limited memory. Perhaps it could gain a lo
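A toy model of the architectural difference Larry describes (this is not PDFBox code). The page sizes are invented, and "memory" is just a running byte count kept by a hypothetical `ResidencyTracker`. Loading every page before processing makes the peak equal to the sum of all pages; loading, processing, and releasing one page at a time bounds the peak by the largest single page.

```python
from typing import Iterable, List


class ResidencyTracker:
    """Hypothetical stand-in for heap usage: tracks live page bytes and the peak."""

    def __init__(self) -> None:
        self.live = 0
        self.peak = 0

    def acquire(self, size: int) -> None:
        self.live += size
        self.peak = max(self.peak, self.live)

    def release(self, size: int) -> None:
        self.live -= size


def whole_document(page_sizes: Iterable[int], tracker: ResidencyTracker) -> int:
    """Load every page (images included) first, then process: peak = sum of pages."""
    loaded: List[int] = []
    for size in page_sizes:
        tracker.acquire(size)
        loaded.append(size)
    pages_done = len(loaded)
    for size in loaded:  # everything is released only after the whole document is done
        tracker.release(size)
    return pages_done


def page_at_a_time(page_sizes: Iterable[int], tracker: ResidencyTracker) -> int:
    """Load, process, and release each page before the next: peak = largest page."""
    pages_done = 0
    for size in page_sizes:
        tracker.acquire(size)
        pages_done += 1
        tracker.release(size)
    return pages_done
```

With made-up page sizes of 2, 40, 35 and 3 (MB), the whole-document peak is 80 while the page-at-a-time peak is 40, which is why a renderer built the second way can handle large scanned PDFs in a fixed heap.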

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Jeffrey Trimble
Got my vote for that. Until the PDFBox is perfected to really delve into this, it will be helpful. How does the PDFBox stay ahead of the game when Adobe is upgrading and adding features to the PDF standard? That should be catching up with all of us sooner or later. --Jeff Jeffrey Trimble

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Tim Donohue
Dorothea Salo wrote: > On Wed, Apr 8, 2009 at 10:53 AM, Richard Rodgers wrote: >> At MIT we came up with a similar approach, which takes some of the >> grunt work out of managing the skips. We extended MediaFilter to detect >> PDFBox >> (or other) exceptions, then automatically record their han

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Dorothea Salo
On Wed, Apr 8, 2009 at 10:53 AM, Richard Rodgers wrote: > At MIT we came up with a similar approach, which takes some of the > grunt work out of managing the skips. We extended MediaFilter to detect PDFBox > (or other) exceptions, then automatically record their handles to a skip list, > which is

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Richard Rodgers
At MIT we came up with a similar approach, which takes some of the grunt work out of managing the skips. We extended MediaFilter to detect PDFBox (or other) exceptions, then automatically record their handles to a skip list, which is used for any subsequent runs. We'd be glad to give you the code o
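Richard's description suggests a simple pattern: record items whose filtering fails, and consult that record on later runs. Below is a minimal Python sketch of the pattern, assuming a plain-text skip file with one handle per line. `filter_items`, `load_skip_list`, and the `run_filter` callback are hypothetical names; this is not MIT's code or DSpace's MediaFilter API.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Set


def load_skip_list(path: Path) -> Set[str]:
    """Read previously failed handles, one per line; a missing file means skip nothing."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def filter_items(
    handles: Iterable[str],
    run_filter: Callable[[str], None],
    skip_path: Path,
) -> Dict[str, str]:
    """Run run_filter on each handle, skipping known-bad ones and recording new failures."""
    skip = load_skip_list(skip_path)
    status: Dict[str, str] = {}
    for handle in handles:
        if handle in skip:
            status[handle] = "skipped"
            continue
        try:
            run_filter(handle)
            status[handle] = "ok"
        except Exception:  # e.g. a parser error raised while extracting text
            skip.add(handle)
            # Append immediately so the failure is remembered even if the run dies later.
            with skip_path.open("a") as f:
                f.write(handle + "\n")
            status[handle] = "failed"
    return status
```

One caveat when carrying the idea over to Java: heap exhaustion surfaces as `java.lang.OutOfMemoryError`, which extends `Error` rather than `Exception`, so a `catch (Exception e)` block will not see it, and the JVM may be in a poor state to continue afterwards.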

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Jeffrey Trimble
Fantastic. This is the first lucid comment I've heard on this subject. It's this program that seems to be the bane of my existence. I like the -s flag idea. I will definitely look at implementing that. Thanks in advance, Jeffrey Trimble System Librarian William F. Maag Library Youngstown St

Re: [Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Tim Donohue
Jeffrey, I've seen this same issue all too many times to count. From what I've noticed, the PDFBox software (which DSpace uses) occasionally has difficulties with larger PDFs (usually 7MB or larger) that include OCRed, scanned images. I've never encountered this problem with P
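Tim's rule of thumb could be turned into a pre-check that lists large PDFs for review before running filter-media. This is a sketch, assuming a plain folder of PDF files (for example, files staged for batch import). `large_pdfs` and `LARGE_PDF_BYTES` are hypothetical, reading "7MB" as 7 × 1024 × 1024 bytes is an assumption, and file size alone cannot tell whether a PDF actually contains scanned images.

```python
from pathlib import Path
from typing import List

# Hypothetical threshold from the "usually 7MB or larger" observation;
# treating 1 MB as 1024 * 1024 bytes is an assumption.
LARGE_PDF_BYTES = 7 * 1024 * 1024


def large_pdfs(root: Path, threshold: int = LARGE_PDF_BYTES) -> List[Path]:
    """Return PDF files under root (recursively) at or above threshold, largest first."""
    hits = [
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf" and p.stat().st_size >= threshold
    ]
    return sorted(hits, key=lambda p: p.stat().st_size, reverse=True)
```

The resulting list is only a set of candidates to inspect or test individually; whether to skip them is still a judgment call.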

[Dspace-tech] Java Heap dumps during Filter-Media

2009-04-08 Jeffrey Trimble
I've run into a funky situation. After using the distributed PDFBox and the associated jars (Bouncy Castle), filter-media works really, really well, until-- we have one PDF that has caused filter-media to produce a memory dump/Java heap dump. The errors are reports first the I