> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Wednesday, August 14, 2002 9:46 AM
> Subject: problems with HTML Parser
>
>
> > Has anyone noticed that the HTML Parser that comes with
> > Lucene joins terms together when parsing a file?
>
Maurits,
You can get a PDF parser from http://www.pdfbox.org
-Ben
On Wed, 14 Aug 2002, Maurits van Wijland wrote:
> Keith,
>
> I haven't noticed the problem with the parser... but you caught my attention
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurits
Has anyone noticed that the HTML Parser that comes with
Lucene joins terms together when parsing a file?
I used to think it was my PDFParser, but after fixing that
I found out it was the HTMLParser.
I managed to find a replacement parser that doesn't join terms.
Just wondered if anyone had come a
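One plausible way an HTML parser ends up "joining terms" is that it drops tags without leaving any separator, so `<td>Lucene</td><td>parser</td>` reaches the indexer as the single token `Luceneparser`. The Java sketch below assumes exactly that cause and contrasts it with a variant that substitutes a space for each tag. `TagStripDemo`, `stripJoining` and `stripSeparating` are hypothetical names for illustration only; this is not the code of the HTML parser shipped with Lucene, and the thread does not say what caused the problem there.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating how naive tag stripping can join terms. */
class TagStripDemo {
    // Naive tag pattern: '<' up to the next '>'. Ignores comments, scripts,
    // and '>' inside attribute values; good enough for a demonstration only.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Drops tags outright, so text on either side of a tag is concatenated. */
    static String stripJoining(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    /** Replaces each tag with a space, then collapses runs of whitespace. */
    static String stripSeparating(String html) {
        String spaced = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(spaced).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Lucene</td><td>parser</td></tr></table>";
        System.out.println(stripJoining(html));    // Luceneparser
        System.out.println(stripSeparating(html)); // Lucene parser
    }
}
```

Replacing every tag with a space is a blunt fix: it also splits words that are broken by inline markup, e.g. `Lu<b>cene</b>` becomes `Lu cene`. A more careful parser would emit a separator only for block-level elements such as `<p>`, `<td>` or `<br>`.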