Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

Ahmet Arslan Mon, 04 Oct 2010 06:57:03 -0700

> Does anyone know how to deal with these 2 issues when using
> NGramFilterFactory for autocomplete?
> 
> 1) hyphens - if user types "ema" or "e-ma" I want to
> suggest "email"
> 
> 2) accents - if user types "herme"  want to suggest
> "Hermès"


Accents can be removed by using MappingCharFilterFactory before the 
tokenizer, at both index and query time.

<charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/>

I am not sure this is the most elegant solution, but you can also replace "-" 
with "" using MappingCharFilterFactory. That satisfies what you describe in 1.
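
As a rough sketch of the hyphen idea, a second charFilter could sit next to 
the accent one. The file name mapping-hyphen.txt is only my assumption here; 
any mapping file holding that one rule would do.

```xml
<!-- hypothetical mapping-hyphen.txt holds a single rule:  "-" => ""  -->
<charFilter class="solr.MappingCharFilterFactory"
            mapping="mapping-hyphen.txt"/>
```

With that in place "e-ma" and "e-mail" lose the hyphen before tokenizing, so 
they line up with "ema" and "email".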

But generally NGramFilterFactory produces a lot of tokens. I mean the query 
"er" can return "hermes". Maybe EdgeNGramFilterFactory is more suitable for 
the auto-complete task. At least it guarantees that some word starts with 
that character sequence.
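
Something along these lines might work as a starting point for the edge 
n-gram variant. The field type name, the whitespace tokenizer, the lowercase 
filter and the gram sizes are all assumptions on my side, not something taken 
from your schema, so adjust them to your tags.

```xml
<!-- sketch only: name, tokenizer choice and gram sizes are assumptions -->
<fieldType name="autocomplete_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- strip accents before tokenizing, so "Hermès" becomes "Hermes" -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lowercase so that "herme" can match "Hermes" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emits h, he, her, herm, ... for "hermes", never the inner "er" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- same normalization, but no gram filter: the typed prefix is matched as-is -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Leaving the gram filter out of the query analyzer means the user's input is 
compared against the indexed prefixes directly. The price is that matches in 
the middle of a word are lost, which is the trade-off against NGramFilterFactory.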
