Re: HTML Analyzer?

2002-11-14 Thread Erik Hatcher
Have a look at the HtmlDocument class in the ant contributions directory of jakarta-lucene-sandbox in Jakarta's CVS. I wrote this and it uses JTidy to

RE: HTML Analyzer?

2002-11-14 Thread Lichty, Kent
Sent: Thursday, November 14, 2002 3:18 PM To: Lucene Users List Subject: Re: HTML Analyzer? Oh wait... I just re-read your original post and apparently I misunderstood (I had just found a hammer and, at a glance, your problem looked like a nail). Sorry about that. But it's not e

Re: HTML Analyzer?

2002-11-14 Thread Craig Walls
> do the same. I don't quite understand how those classes would help out. Would you somehow use them to create the Reader object that is passed to create the TokenStream object?
>
> -Original Message-
> From: Craig Walls [mailto:wallsc@;michaels.com]

RE: HTML Analyzer?

2002-11-14 Thread Lichty, Kent
Sent: Thursday, November 14, 2002 2:39 PM To: Lucene Users List Subject: Re: HTML Analyzer? Ironically, I had to solve this exact problem just 10 minutes ago... Check into javax.swing.text.html.HTMLEditorKit and javax.swing.text.html.HTMLDocument. Here's a URL that I fou

Re: HTML Analyzer?

2002-11-14 Thread Craig Walls
Ironically, I had to solve this exact problem just 10 minutes ago... Check into javax.swing.text.html.HTMLEditorKit and javax.swing.text.html.HTMLDocument. Here's a URL that I found helpful (the site is Japanese, but the source code is still Java): http://java-house.jp/ml/archive/j-h-b/0377
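
For readers following the thread, here is a minimal sketch of one way the Swing HTML classes can strip markup before text is handed to an indexer. It is not the code from the linked page, which is not reproduced here. It assumes the callback-based parser (javax.swing.text.html.parser.ParserDelegator with HTMLEditorKit.ParserCallback) rather than building a full HTMLDocument, and the class and method names (HtmlTextExtractor, extractText) are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Lucene or of the original post.
class HtmlTextExtractor {

    // Returns the text content of an HTML fragment with the tags removed.
    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText once per run of character data;
        // tags and attributes go to other callbacks, which are ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate runs with a space so adjacent words do not fuse.
                text.append(data).append(' ');
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset directive found in the markup
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString().trim();
    }
}
```

The returned string could then be wrapped in a java.io.StringReader and passed to whichever Lucene field or analyzer call the application already uses, so that only the visible text is tokenized.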

HTML Analyzer?

2002-11-14 Thread Lichty, Kent
We have a web application that builds pages "on the fly" by reading directly from a database. The database contains both normal content and HTML. We use Lucene as our search engine, but I need to figure out how to cause it to NOT include content that is within HTML tags. I assume that this entails

Re: HTML Analyzer & filter

2002-04-16 Thread David Black
>> To: [EMAIL PROTECTED]
>> Subject: HTML Analyzer & filter
>>
>> Not to seem too lazy but I was just beginning to write an HTML Filter and Analyzer and thought..."gee, I bet someone has done this already". Are there any Apache/GPL HT

RE: HTML Analyzer & filter

2002-04-16 Thread Halácsy Péter
> -Original Message-
> From: David Black [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, April 16, 2002 5:07 PM
> To: [EMAIL PROTECTED]
> Subject: HTML Analyzer & filter
>
> Not to seem too lazy but I was just beginning to write an HTML Filter and Ana

HTML Analyzer & filter

2002-04-16 Thread David Black
Not to seem too lazy, but I was just beginning to write an HTML Filter and Analyzer and thought... "gee, I bet someone has done this already". Are there any Apache/GPL HTML filters out there, as part of another project, or that anyone on this list would be willing to contribute? Thanks