> What TokenFilters would split "electric吉他" into
> "electric" & "吉他"?

Is it possible to write a regex to capture Chinese text? (Unicode range?)

If yes, you can use PatternReplaceFilter to transform electric吉他 into 
electric_吉他.

<filter class="solr.PatternReplaceFilterFactory"
pattern="(latin)(chinese)" replacement="$1_$2"/>

After that, WordDelimiterFilterFactory can split on the underscore and produce two adjacent tokens.

But writing a custom filter may be easier.
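
To check the regex idea outside Solr, here is a minimal Java sketch. It assumes "latin" can be approximated by [A-Za-z]+ and "chinese" by the Unicode Han script (\p{IsHan}, supported by java.util.regex since Java 7). The class and method names are made up for illustration, and the real pattern may need to cover digits, accented letters, or Chinese followed by Latin.

```java
import java.util.regex.Pattern;

public class HanBoundarySketch {
    // Assumption: a run of ASCII letters followed directly by a run of Han characters.
    private static final Pattern LATIN_THEN_HAN =
            Pattern.compile("([A-Za-z]+)(\\p{IsHan}+)");

    // Inserts an underscore at each Latin-to-Han boundary, like replacement="$1_$2".
    static String insertDelimiter(String token) {
        return LATIN_THEN_HAN.matcher(token).replaceAll("$1_$2");
    }

    public static void main(String[] args) {
        String marked = insertDelimiter("electric吉他");
        System.out.println(marked); // electric_吉他
        // Splitting on the underscore imitates what the word delimiter step would do.
        for (String part : marked.split("_")) {
            System.out.println(part);
        }
    }
}
```

If this behaves as expected, the same expression could be tried as the pattern attribute of the filter above.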


