Re: problems with HTML Parser

2002-08-14 Thread Keith Gunn
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Wednesday, August 14, 2002 9:46 AM
> Subject: problems with HTML Parser
>
> > Has anyone noticed that the HTML Parser that comes with
> > Lucene joins terms together when parsing a file.

Re: problems with HTML Parser

2002-08-14 Thread Ben Litchfield
Maurits,

You can get a PDF parser from http://www.pdfbox.org

-Ben

On Wed, 14 Aug 2002, Maurits van Wijland wrote:
> Keith,
>
> I haven't noticed the problem with the Parser...but you trigger me
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurit

Re: problems with HTML Parser

2002-08-14 Thread Maurits van Wijland
Sent: Wednesday, August 14, 2002 9:46 AM
Subject: problems with HTML Parser

> Has anyone noticed that the HTML Parser that comes with
> Lucene joins terms together when parsing a file.
> I used to think it was my PDFParser but after fixing that
> I found o

problems with HTML Parser

2002-08-14 Thread Keith Gunn
Has anyone noticed that the HTML Parser that comes with Lucene joins terms together when parsing a file? I used to think it was my PDFParser, but after fixing that I found out it was the HTMLParser. I managed to find a replacement parser that doesn't join terms. Just wondered if anyone had come a