There are also useful stopword lists at
http://www.unine.ch/Info/clef/
Best regards,
René
Ulrich Mayring wrote:
Hello,
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
What does ``good'' mean? It depends on your corpus, IMHO. The best way
to get a ``good'' stop list is an analysis based on idf (inverse
document frequency). Thus, index you
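
To make the idf idea concrete, here is a minimal sketch in plain Java. It
does not use the Lucene API; the class name, the tokenization rule and the
threshold are all assumptions made up for illustration. It counts in how
many documents each term occurs and proposes terms that occur in nearly
every document (and so have an idf close to zero) as stop word candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class IdfStopwordSketch {

    // idf = ln(N / df); a term that occurs in every document scores 0.
    static double idf(int docCount, int docFreq) {
        return Math.log((double) docCount / docFreq);
    }

    // Returns, in sorted order, the terms that occur in at least
    // minDocRatio of the documents. Tokenization is an assumption:
    // lowercase, split on anything that is not a letter.
    static List<String> candidates(List<String> docs, double minDocRatio) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>();
            for (String tok : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                // Count each term at most once per document.
                if (!tok.isEmpty() && seen.add(tok)) {
                    docFreq.merge(tok, 1, Integer::sum);
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if ((double) e.getValue() / docs.size() >= minDocRatio) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Toy corpus, invented for this example.
        List<String> docs = List.of(
                "the cat sat on the mat",
                "the dog ate the bone",
                "a bird sang in the tree");
        System.out.println(candidates(docs, 0.9));
    }
}
```

On a real corpus the threshold needs tuning, and the candidate list should
be reviewed by hand, since a term that is frequent in one collection may
still be worth searching for in another.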
There is already an analyzer available in the sandbox. Take a look
here: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
Sincerely,
Anthony Eden
Ulrich Mayring wrote:
Doug Cutting wrote:
Snowball stemmers are pre-packaged for use with Lucene at:
http://jakarta.apache.org/lucen
To: <[EMAIL PROTECTED]>
Sent: Friday, June 06, 2003 11:36 AM
Subject: Re: Where to get stopword lists?
> Doug Cutting wrote:
> >
> > Snowball stemmers are pre-packaged for use with Lucene at:
> >
> > http://jakarta.apache.org/lu
Doug Cutting wrote:
Snowball stemmers are pre-packaged for use with Lucene at:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
These look interesting. Am I right in assuming that in order to use
these stemmers, I have to write an Analyzer and in its tokenStream
method I return a
There is a much more complete list of English stop words included in
the Lucene article (the intro one) on ONJava.com.
I can't help you with German stop words.
Otis
--- Ulrich Mayring <[EMAIL PROTECTED]> wrote:
> Hello,
>
> does anyone know of good stopword lists for use with Lucene? I'm
> inte
Ulrich Mayring wrote:
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The Snowball project has good stop lists.
See:
http://snowball.tartarus.org/
http://snowball.tartarus.org/english/stop.txt
http://snowball.tartarus.org/german/stop
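
The Snowball stop lists are plain text files in which, as far as I know,
everything after a `|` on a line is a comment. A small sketch of a parser
for that layout follows, in plain Java without Lucene. The class name and
the sample text are made up for illustration, and the format description is
an assumption that should be checked against the downloaded file.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Snowball.
final class SnowballStopListSketch {

    // Parses the text of a stop list: strips "|" comments, skips blank
    // lines, and collects the remaining whitespace-separated words in
    // file order.
    static Set<String> parse(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar); // drop the comment part
            }
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Invented sample in the assumed layout, not a copy of the real file.
        String sample = " | a comment line\n"
                + "der\n"
                + "dem            | definite article\n"
                + "\n"
                + "des\n";
        System.out.println(parse(sample));
    }
}
```

The resulting words can then be turned into an array or set and handed to
whatever analyzer is in use; how exactly depends on the Lucene version.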
Hello,
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The default lists aren't very complete, for example the English list
doesn't contain words like "every", "because" or "until" and the German
list misses "dem" and "des" (definite art