http://lucene-consulting.com/
----- Original Message -----
From: Xuesong Luo [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, June 23, 2007 11:48:55 PM
Subject: RE: Multi-language Tokenizers / Filters recommended?
For Chinese search, you may also consider
----- Original Message -----
Sent: 2:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi-language Tokenizers / Filters recommended?
Hi Daniel,
As you know, Chinese and Japanese do not use
spaces or any other delimiters to break words.
To overcome this problem, CJKTokenizer uses a method
called bi-gramming, in which a run of ideographic (Chinese)
characters is made into tokens of two neighboring
characters. So a run of five characters produces four
overlapping tokens: characters 1-2, 2-3, 3-4, and 4-5.
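
To make that concrete, here is a minimal plain-Java sketch of the
bi-gram scheme. This is an illustration I wrote, not the actual
CJKTokenizer source; the class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Turn a run of ideographic characters into overlapping
    // two-character tokens, the way CJKTokenizer bi-grams a run.
    static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A run of five characters yields four bigram tokens:
        // [一二, 二三, 三四, 四五]
        System.out.println(bigrams("一二三四五"));
    }
}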