Re: NGramFilterFactory for auto-complete that matches the middle of multi-lingual tags?

Ahmet Arslan Mon, 04 Oct 2010 06:58:56 -0700

> What TokenFilters would split "electric吉他" into
> "electric" & "吉他"?


Is it possible to write a regex to capture Chinese text? (Unicode range?)

If yes, you can use PatternReplaceFilter to transform electric吉他 into 
electric_吉他.

<filter class="solr.PatternReplaceFilterFactory"
pattern="(latin)(chinese)" replacement="$1_$2"/>

After that, WordDelimiterFilterFactory can split on the underscore and produce two adjacent tokens.

But writing a custom filter may be easier.
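
To check whether such a regex is feasible, the replacement can be tried outside Solr first. The sketch below is only an illustration: the class and method names (ScriptBoundary, markBoundary) are made up, and it assumes Java 7 or later, where java.util.regex accepts Unicode script names such as \p{IsLatin} and \p{IsHan}. It only handles a Latin letter followed by a Han character, not the reverse order.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
class ScriptBoundary {

    // One Latin-script letter immediately followed by one Han character.
    // \p{IsLatin} and \p{IsHan} are Unicode script classes (Java 7+).
    private static final Pattern LATIN_THEN_HAN =
            Pattern.compile("(\\p{IsLatin})(\\p{IsHan})");

    // Inserts an underscore at every Latin-to-Han boundary.
    static String markBoundary(String input) {
        return LATIN_THEN_HAN.matcher(input).replaceAll("$1_$2");
    }

    public static void main(String[] args) {
        // "\u5409\u4ed6" is the two-character Chinese word from the question,
        // written with escapes so the source file encoding does not matter.
        System.out.println(markBoundary("electric\u5409\u4ed6"));
    }
}
```

If this behaves as expected, the same expression, (\p{IsLatin})(\p{IsHan}), could be tried as the pattern attribute in place of the placeholder above, with the replacement left as $1_$2.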
