Re: Stemming and accents

2017-02-11 Thread Dominique Bejean
Thank you both for your answers. I tried to find French homophones (tache / tâche, bouche / bouché, ...) that produce different stems (with the Snowball, minimal and light stemmers), but without success. So putting the ASCIIFolding filter before the stemmer is not a big issue for precision, at least in French.
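To make the precision concern concrete, here is a minimal sketch of what accent folding does to the word pairs mentioned above. The `fold_accents` helper is a hypothetical stand-in built on Python's `unicodedata`; it only strips combining marks and is not the same code as Lucene's ASCIIFoldingFilter, which covers more characters.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after NFD decomposition.

    Hypothetical helper: a rough approximation of accent folding,
    not the ASCIIFoldingFilter implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Word pairs taken from the message above.
pairs = [("tache", "tâche"), ("bouche", "bouché")]

for plain, accented in pairs:
    # After folding, both members of a pair collapse to the same token,
    # so any later stage (stemmer or index) can no longer tell them apart.
    print(plain, accented, fold_accents(accented) == plain)
```

Once folding runs, the two words in each pair are indistinguishable, regardless of where the stemmer sits in the chain.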

Re: Stemming and accents

2017-02-10 Thread Ahmet Arslan
Hi, I have experimented with this before and found that Snowball is sensitive to accents/diacritics. For more details, please see: http://www.sciencedirect.com/science/article/pii/S0306457315001053 Ahmet On Friday, February 10, 2017 11:27 AM, Dominique Bejean wrote: Hi, Is the SnowballPorterFilter

Re: Stemming and accents

2017-02-10 Thread Erick Erickson
The easiest way to answer that is to define two different fieldTypes, one with Snowball first and one with ASCIIFolding first, fire up the admin/analysis page and give it some input. That'll show you _exactly_ what transformations take place at each step. Best, Erick On Fri, Feb 10, 2017 at 12:26
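The step-by-step comparison described above can also be imitated outside Solr. The sketch below is only an illustration of the idea of tracing a token through two differently ordered chains: `fold_accents` is a hypothetical approximation of accent folding, and `toy_stem` is a made-up single rule, not the Snowball French algorithm. Real results have to come from the admin/analysis page with the actual filters.

```python
import unicodedata


def fold_accents(token: str) -> str:
    """Hypothetical stand-in for accent folding (strips combining marks)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(token: str) -> str:
    """Made-up rule for illustration only; NOT the Snowball French stemmer."""
    return token[:-1] if token.endswith("é") else token


def trace(token, steps):
    """Return [(step_name, token_after_step), ...] in chain order."""
    out = []
    for name, fn in steps:
        token = fn(token)
        out.append((name, token))
    return out


# The two candidate orders, mirroring the two fieldTypes suggested above.
stem_first = [("stem", toy_stem), ("fold", fold_accents)]
fold_first = [("fold", fold_accents), ("stem", toy_stem)]

for label, chain in (("stem first", stem_first), ("fold first", fold_first)):
    print(label, trace("bouché", chain))
```

With this toy rule the two orders give different final tokens for the same input, which is the kind of difference the analysis page would make visible for the real filters.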

Stemming and accents

2017-02-10 Thread Dominique Bejean
Hi, Is the SnowballPorterFilter sensitive to accents, in French for instance? If I use both SnowballPorterFilter and ASCIIFoldingFilter, do I have to configure ASCIIFoldingFilter after SnowballPorterFilter? Regards. Dominique -- Dominique Béjean 06 08 46 12 43

Re: Stemming and accents (HunspellStemFilterFactory)

2012-02-15 Thread Jan Høydahl
Or if you know that you'll always strip accents in your search you may pre-process your pt_PT.dic to remove accents from it and use that custom dictionary instead in Solr. Another alternative could be to extend HunSpellFilter so that it can take in the class name of a TokenFilter class to apply
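The dictionary pre-processing idea could look roughly like the sketch below. It assumes the usual Hunspell `.dic` layout (a word count on the first line, then one `word/FLAGS` entry per line) and uses a hypothetical `fold_accents` helper that only strips combining marks. It does not touch the companion `.aff` file, whose affix rules may also contain accented characters, so treat it as a starting point rather than a complete conversion.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_dic(lines):
    """Fold the word part of each .dic entry and rebuild the count header.

    Assumes lines[0] is the entry count and every other line is
    'word' or 'word/FLAGS'. Entries that become identical after
    folding are kept only once.
    """
    folded, seen = [], set()
    for line in lines[1:]:
        entry = line.strip()
        if not entry:
            continue
        # Fold only the word; leave the flags after the first '/' untouched.
        word, sep, flags = entry.partition("/")
        new_entry = fold_accents(word) + sep + flags
        if new_entry not in seen:
            seen.add(new_entry)
            folded.append(new_entry)
    return [str(len(folded))] + folded


sample = ["3", "fórum/S", "forum/S", "ação"]
print(fold_dic(sample))
```

Reading and writing the real file would additionally need the encoding the dictionary was saved in, which this sketch leaves to the caller.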

Re: Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Chantal Ackermann
Hi Bráulio, I don't know about HunspellStemFilterFactory specifically, but concerning accents: there are several accent filters that will remove accents from your tokens. If the Hunspell filter factory requires the accents, then simply add the accent filters after Hunspell in your index and query fil

Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Bráulio Bhavamitra
Hello all, I'm evaluating the HunspellStemFilterFactory and found that it works with a pt_PT dictionary. For example, if I search for 'fóruns', it stems it to 'fórum' and then finds 'fórum' references. But if I search for 'foruns' (without the accent), then HunspellStemFilterFactory cannot stem the word, as it d