-Original Message-
> From: Tanguy Moal [mailto:tanguy.m...@gmail.com]
> Sent: Wednesday, May 16, 2012 8:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: FrenchLightStemFilterFactory : normalizing tokens longer than
> 4 characters and having repeated characters in it
>
>
ts the
operation to chars 'k', 'p', and 't'.)
Thanks,
Steve
On Wed, May 16, 2012 at 8:28 AM, Tanguy Moal wrote:
> Any idea someone ?
>
> I think this is important since this could produce weird results on
> collections with numbers mixed in text.
I agree, I think we should just add '&& Character.isLetter(ch)' to the
undoublet check?
Thanks for bringing t
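
For illustration only, here is a minimal sketch of what that extra check could look like. It is not the Lucene source: the class and method names (UndoubleSketch, undouble, normalize, delete) are hypothetical, and the "longer than 4 characters" threshold is taken from the subject line of this thread. The only change to the repeated-character test is the added '&& Character.isLetter(ch)'.

```java
// Hypothetical sketch, not the actual Lucene stemmer code.
// Collapses runs of a repeated character, but only when it is a letter,
// so that digit runs such as "10000" are left untouched.
final class UndoubleSketch {

    // Removes the char at index pos by shifting the tail left; returns the new length.
    static int delete(char[] s, int pos, int len) {
        if (pos < len - 1) {
            System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
        }
        return len - 1;
    }

    // Only tokens longer than 4 characters are touched (assumption from the thread subject).
    static int undouble(char[] s, int len) {
        if (len > 4) {
            char ch = s[0];
            for (int i = 1; i < len; i++) {
                // The proposed guard: collapse only repeated letters.
                if (s[i] == ch && Character.isLetter(ch)) {
                    len = delete(s, i--, len);
                } else {
                    ch = s[i];
                }
            }
        }
        return len;
    }

    // Convenience wrapper for trying the check on a String token.
    static String normalize(String token) {
        char[] buf = token.toCharArray();
        int len = undouble(buf, buf.length);
        return new String(buf, 0, len);
    }
}
```

With this guard, UndoubleSketch.normalize("appelle") still collapses the doubled letters, while a numeric token such as "10000" comes back unchanged.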
Any idea someone ?
I think this is important since this could produce weird results on
collections with numbers mixed in text.
From my understanding, there are a few options to address the issue:
1) Make *LightStemmer token type aware and don't try to stem on things that
are not text (alpha/alp