Thank you both for your answers.
I tried to find French words that differ only by their accents (tache /
tâche, bouche / bouché, ...) and get different stems (with the Snowball,
minimal and light stemmers), but without success. So putting the
ASCIIFolding filter before the stemmer is not a big issue for precision
(in French).
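For reference, the conflation being tested above can be sketched in Python. This is a minimal sketch under an assumption: `fold` is a hypothetical helper that approximates ASCIIFoldingFilter with Unicode NFD decomposition plus removal of combining marks. It is not the Lucene filter, which uses its own mapping table and also folds letters such as 'ø' or 'æ' that have no decomposition. For the French accents in these examples, the result should be the same.

```python
import unicodedata


def fold(word: str) -> str:
    """Approximate accent folding (hypothetical helper, not Lucene's filter).

    NFD splits 'â' into 'a' + U+0302 and 'é' into 'e' + U+0301; dropping the
    combining marks leaves the bare ASCII letters.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accent-only pairs from the message above.
pairs = [("tache", "tâche"), ("bouche", "bouché")]

for plain, accented in pairs:
    same = fold(plain) == fold(accented)
    print(f"{plain} / {accented} -> {fold(plain)} / {fold(accented)}",
          "(conflated before stemming)" if same else "")
```

Once folding comes first, both words of each pair reach the stemmer as the same token. Whether that costs precision depends on whether the stemmer would otherwise have kept them apart.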
Hi,
I have experimented with this before and found that Snowball is
sensitive to accents/diacritics.
Please see this paper for more details:
http://www.sciencedirect.com/science/article/pii/S0306457315001053
Ahmet
On Friday, February 10, 2017 11:27 AM, Dominique Bejean wrote:
Hi,
Is the SnowballPorterFilter
The easiest way to answer that is to define two different fieldTypes,
one with Snowball first and one with ASCIIFolding first, fire up the
admin/analysis page and give it some input. That'll show you _exactly_
what transformations take place at each step.
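The same side-by-side comparison can be mimicked outside Solr to see why order matters. This is a toy Python sketch with labelled assumptions: `toy_stem` is a hypothetical stemmer that only strips a few accented endings (it is not Snowball, only a stand-in for a stemmer whose rules contain accents), `fold` approximates ASCIIFoldingFilter with NFD decomposition, and `run_chain` applies filters in the listed order, the way a field type applies its filter list.

```python
import unicodedata
from typing import Callable, List


def fold(word: str) -> str:
    """Approximate ASCII folding: NFD decomposition, combining marks dropped."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy stemmer, NOT Snowball: it only knows accented endings,
# listed longest first so "ées" wins over "é".
TOY_SUFFIXES = ["ées", "ée", "és", "é"]


def toy_stem(word: str) -> str:
    for suffix in TOY_SUFFIXES:
        # keep at least a 3-letter stem
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def run_chain(word: str, filters: List[Callable[[str], str]]) -> str:
    """Apply token filters in order, like the filter list of a field type."""
    for f in filters:
        word = f(word)
    return word


for w in ["aimées", "fermé"]:
    stem_first = run_chain(w, [toy_stem, fold])
    fold_first = run_chain(w, [fold, toy_stem])
    print(f"{w}: stem->fold = {stem_first}, fold->stem = {fold_first}")
```

With this toy stemmer, folding first hides the accented suffix, so the two orders disagree ('aim' vs 'aimees'). Whether the real Snowball French stemmer behaves like this for a given word is exactly what the analysis page will show.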
Best,
Erick
On Fri, Feb 10, 2017 at 12:26
Hi,
Is the SnowballPorterFilter sensitive to accents, for French for
instance?
If I use both SnowballPorterFilter and ASCIIFoldingFilter, do I have to
configure ASCIIFoldingFilter after SnowballPorterFilter?
Regards.
Dominique
--
Dominique Béjean
Or, if you know that you'll always strip accents in your search, you may
pre-process your pt_PT.dic to remove accents from it and use that custom
dictionary in Solr instead.
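A minimal Python sketch of that pre-processing step, under assumptions: the .dic file follows the usual Hunspell layout (an entry count on the first line, then one entry per line as `word` or `word/FLAGS`, optionally followed by whitespace-separated fields), only the word part is folded, and the sample entries and flags are made up. Affix rules in the matching .aff file can also contain accented strings, and this sketch does not touch them. When reading a real file, use the encoding declared by the SET line of the .aff file. `fold` and `fold_dic_lines` are hypothetical helpers.

```python
import re
import unicodedata
from typing import Iterable, List

# word part = everything up to the first '/' or whitespace
ENTRY = re.compile(r"([^/\s]*)(.*)")


def fold(text: str) -> str:
    """Approximate accent stripping: NFD decomposition, combining marks dropped."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_dic_lines(lines: Iterable[str]) -> List[str]:
    """Return .dic lines with accents removed from the word part only."""
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i == 0:
            out.append(line)  # entry count: left unchanged
            continue
        word, rest = ENTRY.match(line).groups()
        out.append(fold(word) + rest)  # flags / morphology kept as-is
    return out


# Made-up sample entries; real pt_PT flags differ.
sample = ["3", "fórum/S", "ação/S", "casa"]
print("\n".join(fold_dic_lines(sample)))
```

The entry count is not recomputed, and duplicates that appear after folding are not merged. Presumably the accent filter would then sit before the Hunspell filter, so that accented input such as 'fóruns' is folded to match the folded dictionary.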
Another alternative could be to extend HunSpellFilter so that it can take in
the class name of a TokenFilter class to apply
Hi Bráulio,
I don't know about HunspellStemFilterFactory specifically, but concerning
accents:
There are several accent filters that will remove accents from your
tokens. If the Hunspell filter factory requires the accents, then simply
add the accent filters after Hunspell in your index and query fil
Hello all,
I'm evaluating the HunspellStemFilterFactory, and I found that it works
with a pt_PT dictionary.
For example, if I search for 'fóruns', it stems it to 'fórum' and then
finds 'fórum' references.
But if I search for 'foruns' (without the accent),
then HunspellStemFilterFactory cannot stem the
word, as it d