> a pity that you don't get a simple ClassNotFoundException to be able to
> report this more clearly.
>
> But aside from that, my limited tests seem to work quite well.
>
> G
>
> -----Original Message-----
> From: Larry Stone [mailto:l...@mit.edu]
> Sent: 08 April 2009 22:21
> To: Tim Donohue
> Cc: DSpace Tech; Jeffrey Trimble
Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media
The PDFBox library is _always_ going to be a problem because of its
architecture. It insists on reading the entire PDF document, images included,
into memory. This is not necessary; PDF was explicitly designed to
NAL bundle. You can then run index-all and
your previously unfilterable/non-searchable document will be full-text
searchable!
Sue
From: Jeffrey Trimble [mailto:jtrim...@cc.ysu.edu]
Sent: Wednesday, April 08, 2009 9:36 AM
To: DSpace Tech
Subject: [Dspace-tech] Java
DSpace.
Sue
-----Original Message-----
From: Tim Donohue [mailto:tdono...@illinois.edu]
Sent: Wednesday, April 08, 2009 10:37 AM
To: Jeffrey Trimble
Cc: DSpace Tech
Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media
Jeffrey,
I've seen this same issue all too many tim
g/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51502
-- Van Ly
-----Original Message-----
From: Jeffrey Trimble [mailto:jtrim...@cc.ysu.edu]
Sent: Thu 4/9/2009 5:46 AM
To: Richard Rodgers
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Java Heap dumps during Filter-Media
The PDFBox library is _always_ going to be a problem because of its
architecture. It insists on reading the entire PDF document, images
included, into memory. This is not necessary; PDF was explicitly
designed to let renderers process a page at a time in limited memory.
Perhaps it could gain a lo
Got my vote for that. Until the PDFBox is perfected to really delve into
this, it will be helpful.
How does the PDFBox stay ahead of the game when Adobe is upgrading and
adding features to the PDF standard? That should be catching up with all of
us sooner or later.
--Jeff
Jeffrey Trimble
Dorothea Salo wrote:
> On Wed, Apr 8, 2009 at 10:53 AM, Richard Rodgers wrote:
>> At MIT we came up with a similar approach, which takes some of the
>> grunt work out of managing the skips. We extended MediaFilter to detect
>> PDFBox (or other) exceptions, then automatically record their han
At MIT we came up with a similar approach, which takes some of the
grunt work out of managing the skips. We extended MediaFilter to detect PDFBox
(or other) exceptions, then automatically record their handles to a skip list,
which is used for any subsequent runs. We'd be glad to give you the code o
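
For anyone who wants to try the skip-list idea described above before seeing the MIT code, here is a minimal sketch of the general pattern only. It is not the MIT implementation and it does not use the DSpace MediaFilter API; the class name SkipListFilterRunner, the TextExtractor interface, and the one-handle-per-line skip file are all assumptions made up for illustration. It shows a runner that skips handles already on the list, catches any exception thrown while filtering an item, and appends that item's handle to the list so later runs pass over it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of DSpace or PDFBox.
public class SkipListFilterRunner {

    // Stand-in for whatever does the real text extraction (assumption).
    public interface TextExtractor {
        String extract(String handle) throws Exception;
    }

    private final Path skipFile;
    private final Set<String> skipped = new LinkedHashSet<>();

    // Load handles recorded by earlier runs, one per line.
    public SkipListFilterRunner(Path skipFile) throws IOException {
        this.skipFile = skipFile;
        if (Files.exists(skipFile)) {
            for (String line : Files.readAllLines(skipFile, StandardCharsets.UTF_8)) {
                String handle = line.trim();
                if (!handle.isEmpty()) {
                    skipped.add(handle);
                }
            }
        }
    }

    public boolean isSkipped(String handle) {
        return skipped.contains(handle);
    }

    // Filter each handle; on failure, record it and carry on with the rest.
    public List<String> run(List<String> handles, TextExtractor extractor) throws IOException {
        List<String> processed = new ArrayList<>();
        for (String handle : handles) {
            if (skipped.contains(handle)) {
                continue; // failed on an earlier run, do not retry
            }
            try {
                extractor.extract(handle);
                processed.add(handle);
            } catch (Exception e) {
                record(handle);
            }
        }
        return processed;
    }

    // Append the failing handle to the skip file so it persists across runs.
    private void record(String handle) throws IOException {
        if (skipped.add(handle)) {
            Files.write(skipFile,
                    (handle + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

Note that this sketch only catches exceptions. A real heap exhaustion surfaces as an OutOfMemoryError, which is not an Exception, so handling the heap-dump case discussed in this thread would need separate treatment.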
Fantastic. This is the first lucid comment I've heard on this subject.
It's this program that seems to be the bane of my existence. I like the
-s flag idea. I will definitely look at implementing that.
Thanks in advance,
Jeffrey Trimble
System Librarian
William F. Maag Library
Youngstown St
Jeffrey,
I've seen this same issue all too many times to count. From what I've
noticed it seems that the PDFBox software (which DSpace uses)
occasionally has difficulties with larger PDFs (usually 7MB or larger)
which include OCRed, scanned images. I've never encountered this
problem with P
I've run into a funky situation. After using the distributed PDFBox and
the associated jars (bouncy castle) the filter media works really,
really well, until--
We have one pdf that has caused the filter-media to produce a memory
dump/java heap dump. The errors are reports first the I