> Does anyone know how to deal with these 2 issues when using
> NGramFilterFactory for autocomplete?
>
> 1) hyphens - if user types "ema" or "e-ma" I want to
> suggest "email"
>
> 2) accents - if user types "herme" want to suggest
> "Hermès"

Accents can be removed by putting a MappingCharFilterFactory before the tokenizer, at both index and query time:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

I am not sure this is the most elegant solution, but you can also use MappingCharFilterFactory to replace "-" with "". That covers what you describe in 1.

In general, though, NGramFilterFactory produces a lot of tokens, and it matches anywhere inside a word: for example, the query "er" can return "hermes". EdgeNGramFilterFactory may be more suitable for the auto-complete task, because it only produces grams anchored at the start of a token. A match then at least guarantees that some word starts with the typed character sequence.
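
To make the suggestions above concrete, here is a rough sketch of a field type that combines them. It is untested and rests on several assumptions: the field type name "text_autocomplete", the file name "mapping-hyphen.txt" (a hypothetical mapping file you would create yourself), the choice of tokenizer and the gram sizes are all mine, so adjust them to your schema.

```xml
<!-- Sketch only. "text_autocomplete", "mapping-hyphen.txt" and the gram
     sizes are assumptions, not values from the original thread. -->
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Strip accents before tokenizing: "Hermès" becomes "Hermes" -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Hypothetical file holding the single rule:  "-" => ""  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-hyphen.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Prefix grams only: "email" is indexed as e, em, ema, emai, email -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same normalization as at index time, but no n-gram filter,
         so the typed prefix is matched as a whole -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-hyphen.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup along these lines, "e-ma" should be normalized to "ema" and "herme" should match the prefix grams of "hermes". Leaving the n-gram filter out of the query analyzer is a design choice rather than a requirement: it keeps the typed prefix as a single term instead of splitting it into shorter grams that would match many more words.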