Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
Hi, Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but how do Luceners determine which words should be excluded when creating Analyzers for a new language? And which technique was used for validation of stop

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread karl wettin
On 10 May 2007, at 20:39, Lukas Vlcek wrote: Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but how do Luceners determine which words should be excluded when creating Analyzers for a new language? The id

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Doron Cohen
See also en.wikipedia.org/wiki/Stop_words and www.ranks.nl/tools/stopwords.html karl wettin <[EMAIL PROTECTED]> wrote on 10/05/2007 13:57:33: > On 10 May 2007, at 20:39, Lukas Vlcek wrote: > > Can anybody point me to some references on how to create an ideal set of stop words? I know that

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Grant Ingersoll
Also, from the empirical side, have a look at Luke (after indexing w/o any stopwords, or just the standard ones) and see what the most common terms are and see if they are meaningful or not in the context of your application. -Grant On May 10, 2007, at 7:41 PM, Doron Cohen wrote: See al
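Grant's empirical check can also be approximated outside Luke. The following is a minimal, hypothetical sketch in plain Java: the class and method names (`StopwordCandidates`, `topByDocFreq`) are made up for illustration, and the tokenizer is a naive assumption (lowercase, split on non-letters) rather than a real Lucene Analyzer. It ranks terms by document frequency (the number of documents containing each term) so the top of the list can be reviewed by hand.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: rank terms by document frequency so the most common
 * ones can be reviewed by hand as stop-word candidates. The tokenizer is a
 * naive assumption (lowercase, split on non-letters), not a Lucene Analyzer.
 */
public class StopwordCandidates {

    /** Lowercases the text and splits it on any run of non-letter characters. */
    public static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\P{L}+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns the n terms found in the most documents; ties are broken alphabetically. */
    public static List<Map.Entry<String, Integer>> topByDocFreq(List<String> docs, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            // A HashSet makes each term count at most once per document.
            for (String term : new HashSet<>(tokenize(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Comparator<Map.Entry<String, Integer>> byDocFreqDescThenTerm = (a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        return docFreq.entrySet().stream()
                .sorted(byDocFreqDescThenTerm)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "The cat sat on the mat.",
                "The dog ran to the park.",
                "A cat and a dog ran home.");
        for (Map.Entry<String, Integer> e : topByDocFreq(docs, 5)) {
            // Share of documents containing the term: values near 100% are strong candidates.
            System.out.printf("%-6s df=%d (%.0f%% of docs)%n",
                    e.getKey(), e.getValue(), 100.0 * e.getValue() / docs.size());
        }
    }
}
```

Document frequency is used here rather than raw occurrence counts because a term that appears in nearly every document does little to discriminate between them. The final call is still manual, as Grant says: a frequent term may be meaningful in your application.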

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Otis Gospodnetic
To: java-user@lucene.apache.org Sent: Thursday, May 10, 2007 2:39:35 PM Subject: Stop words (how to create ideal set of stop words?) Hi, Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but

Re: Stop words (how to create ideal set of stop words?)

2007-05-10 Thread Lukas Vlcek
To: java-user@lucene.apache.org Sent: Thursday, May 10, 2007 2:39:35 PM Subject: Stop words (how to create ideal set of stop words?) Hi, Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoretical question, but how do L

Re: Stop words (how to create ideal set of stop words?)

2007-05-11 Thread Grant Ingersoll
From: Lukas Vlcek <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, May 10, 2007 2:39:35 PM Subject: Stop words (how to create ideal set of stop words?) Hi, Can anybody point me to some references on how to create an ideal set of stop words? I know that this is more of a theoret

Re: Stop words (how to create ideal set of stop words?)

2007-05-11 Thread mark harwood
a "Zipf visualisation" plug-in for Luke which may help. I can post the code somewhere if this is useful. Mark - Original Message From: Grant Ingersoll <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, 11 May, 2007 12:14:12 PM Subject: Re: Stop words (h
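Zipf's law says a term's frequency is roughly inversely proportional to its rank, f(r) ≈ C/r, so a handful of top-ranked terms account for a large share of all tokens; that head of the curve is where stop-word candidates sit. Mark's plug-in code is not shown in this thread, so the following is an independent, hypothetical sketch (`ZipfSketch`, `rankFrequency` and `zipfSlope` are made-up names, and tokenization is a naive assumption). It builds the rank/frequency table and fits a least-squares slope of ln(frequency) against ln(rank), which Zipf's law predicts to be close to -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of a rank/frequency (Zipf) analysis over raw text.
 * Not Mark Harwood's Luke plug-in; tokenization is a naive assumption.
 */
public class ZipfSketch {

    /** Total occurrences of each term, sorted by frequency (descending), ties by term. */
    public static List<Map.Entry<String, Long>> rankFrequency(List<String> docs) {
        Map<String, Long> freq = new HashMap<>();
        for (String doc : docs) {
            for (String tok : doc.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
                if (!tok.isEmpty()) {
                    freq.merge(tok, 1L, Long::sum);
                }
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(freq.entrySet());
        ranked.sort((a, b) -> {
            int c = Long.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    /**
     * Least-squares slope of ln(frequency) against ln(rank), ranks starting at 1.
     * Zipf's law predicts a value near -1 for natural-language text.
     */
    public static double zipfSlope(List<Map.Entry<String, Long>> ranked) {
        int n = ranked.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two distinct terms");
        }
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(i + 1);
            double y = Math.log(ranked.get(i).getValue());
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "the cat and the dog and the bird",
                "the cat sat on the mat");
        List<Map.Entry<String, Long>> ranked = rankFrequency(docs);
        for (int i = 0; i < ranked.size(); i++) {
            Map.Entry<String, Long> e = ranked.get(i);
            // Under Zipf's law, rank * frequency stays roughly constant.
            System.out.printf("%2d %-5s f=%d r*f=%d%n",
                    i + 1, e.getKey(), e.getValue(), (i + 1) * e.getValue());
        }
        System.out.printf("log-log slope: %.2f%n", zipfSlope(ranked));
    }
}
```

On a real corpus, plotting ln(rank) against ln(frequency) should give a roughly straight line; the few terms at the far left of that line are the ones to inspect first.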