I believe this problem has been fixed with 0.6.1. Please give it a try.
Ben Litchfield
--
On Thu, 6 Mar 2003, Eric Anderson wrote:
When it throws the exception, the indexer fails, so I cannot continue indexing.
It appears that it's only related to some files, as I have been able to
Ben-
While trying to use PDFBox 0.6.0, I received the following error when
scanning a reasonably sized PDF repository.
Any thoughts?
caught a class java.io.EOFException
with message: Unexpected end of ZLIB input stream
Eric Anderson
LanRx Network Solutions
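To illustrate the failure mode Eric describes, here is a minimal sketch using only the Java standard library. It does not use PDFBox or Lucene: the `inflate` method merely stands in for text extraction, and the class and method names (`TolerantIndexer`, `indexAll`) are hypothetical. It shows that a truncated zlib stream raises an EOFException, and that catching the exception per input lets the run continue past a bad file instead of aborting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for an indexer; not part of PDFBox or Lucene.
public class TolerantIndexer {

    // Compresses raw bytes into a zlib stream (used here to build test input).
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream d = new DeflaterOutputStream(out)) {
            d.write(raw);
        }
        return out.toByteArray();
    }

    // Decompresses a zlib stream; stands in for extracting text from one file.
    // A stream that ends early makes InflaterInputStream throw EOFException.
    static byte[] inflate(byte[] compressed) throws IOException {
        try (InputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Processes every input and returns the positions of those that failed.
    // The try/catch sits inside the loop, so one bad input does not stop the run.
    static List<Integer> indexAll(List<byte[]> inputs) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            try {
                inflate(inputs.get(i));
            } catch (IOException e) {
                System.err.println("skipping input " + i + ": " + e);
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        byte[] good = deflate("some text some text some text".getBytes("UTF-8"));
        byte[] bad = Arrays.copyOf(good, good.length / 2); // truncated stream
        System.out.println("failed inputs: " + indexAll(Arrays.asList(good, bad, good)));
    }
}
```

In a real indexer the same pattern would wrap the per-file extraction call, with the failed files logged for later inspection.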
Quoting Ben
In this release I have changed how I parsed the document, which may have
introduced this bug. I have received another report of this and will have
it fixed for the next point release.
You said you tried with a reasonably sized PDF repository. Did you stop
indexing at this error or did you
Ben,
I downloaded pdfbox and installed it. And I can use:
java org.pdfbox.Main PDF-file output-text-file
to convert a .pdf file to a text file.
Then I tried to integrate it with Lucene. I modified the following code in
IndexHTML.java:
else if (file.getPath().endsWith(".pdf")) {
Document doc
.
Do you have any idea about the ClassCastException?
Michael
-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 6 March 2003 14:45
To: Lucene Users List
Subject: Re: [ANN] PDFBox 0.6.0
In this release I have changed how I parsed the document
I would like to announce the next release of PDFBox. PDFBox allows for
PDF documents to be indexed using Lucene through a simple interface.
Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
which will extract all text and PDF document summary properties as Lucene
fields.