Thank you very much for the references, Gordon! Looks like that is
exactly what I need.
Maria
On 10/18/07, Gordon <[EMAIL PROTECTED]> wrote:
> Maria,
>
> It's perfectly reasonable to build a single list, sort it, and scan it for
> especially bad cases. See, for example,
> http://members.unine.ch/jacques.savoy/clef/index.html for stopwords in
> several languages, or check some standard programming modules like:
> http://search.cpan.org/~fabpot/Lingua-StopWords-0.02/lib/Lingua/StopWords.pm
>
> On 10/18/07, Maria Mosolova <[EMAIL PROTECTED]> wrote:
> >
> > Thanks a lot to everyone who responded. Yes, I agree that eventually
> > we need to use separate stopword lists for different languages.
> > Unfortunately the data we are trying to index at the moment does not
> > contain any direct country/language information and we need to create
> > the first version of the index quickly. It does not look like
> > analyzing documents to determine their language is something that
> > could be accomplished in a very limited timeframe. Or am I wrong here
> > and there are existing analyzers one could use?
> > Maria
> >
> > On 10/18/07, Walter Underwood <[EMAIL PROTECTED]> wrote:
> > > Also "die" in German and English. --wunder
> > >
> > > On 10/18/07 4:16 AM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
> > >
> > > > One example that I'm familiar with: words "is" and "by" in English and
> > > > in Swedish. Both words are stopwords in English, but they are content
> > > > words in Swedish (ice and village, respectively). Similarly, "till" in
> > > > Swedish is a stopword (to, towards), but it's a content word in
> > > > English.
> > >
> > >
> >
>
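
Gordon's single-merged-list approach, combined with a scan for the kind of cross-language collisions mentioned in this thread ("is"/"by" in English vs. Swedish, "die" in German vs. English, "till" in Swedish vs. English), might be sketched as follows. The word lists here are tiny illustrative samples, not real stopword data; real lists would come from resources like the ones Gordon links:

```python
# Per-language stopword lists (tiny illustrative samples only).
stopwords = {
    "en": {"is", "by", "the", "to"},
    "sv": {"till", "och", "att"},
    "de": {"die", "der", "und"},
}

# Hypothetical content-word samples per language, used only to
# illustrate the "scan for especially bad cases" step; a real check
# would need proper lexicons.
content_words = {
    "sv": {"is", "by"},     # Swedish content words: "ice", "village"
    "en": {"till", "die"},  # English content/verb senses
}

# Build the single merged, sorted list Gordon describes.
merged = sorted(set().union(*stopwords.values()))

# Flag merged stopwords that are content words in some other language.
bad = sorted(w for w in merged
             if any(w in vocab for vocab in content_words.values()))
print(bad)  # ['by', 'die', 'is', 'till']
```

Words flagged this way could then be reviewed by hand and dropped from the combined list if losing them as content terms would hurt recall too much.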
