Indexing and searching with StandardAnalyzer

2006-05-08 Thread Bob Cheung
Using StandardAnalyzer, I was able to index a document containing the string "co_cc" (without quotes) but I couldn't search for it. Using Luke, I was able to see "co_cc" was indexed. Using Luke to search, I was not able to find any hit using StandardAnalyzer. However, if I use KeywordAnalyzer to

The best Chinese Analyzer?

2006-05-07 Thread Bob Cheung
I have a question for those who have used Lucene to index and search Chinese characters: what is the best Analyzer for the job? I know all three of these can do the job: 1. StandardAnalyzer 2. CJKAnalyzer 3. ChineseAnalyzer. What are the differences between these three analyzers? TIA. Regards, Bob

RE: Sorting in Lucene

2006-03-13 Thread Bob Cheung
Tuesday, March 14, 2006 11:04 AM To: java-user@lucene.apache.org Subject: Re: Sorting in Lucene On 3/13/06, Bob Cheung <[EMAIL PROTECTED]> wrote: > I am curious why the character "/" sorts before the space. > > For example, > > Apple/banana is good for you.

Sorting in Lucene

2006-03-13 Thread Bob Cheung
I am curious why the character "/" sorts before the space. For example, "Apple/banana is good for you." sorts before "Apple banana is good for you". Is there something I can do to make it sort correctly? Regards, Bob
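As a point of comparison for the question above, here is a minimal sketch of how the two example strings order under plain Java String.compareTo, which compares UTF-16 code unit values. It uses only the JDK, not Lucene, and the class and method names are made up for illustration. Under this ordering the space (U+0020) comes before "/" (U+002F), so the order reported in the post would have to come from something else in the indexing or sort setup, which the snippet does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of Lucene.
class CodeUnitOrderDemo {
    // Returns a copy of the input sorted by String.compareTo,
    // i.e. by UTF-16 code unit values.
    static List<String> sortByCodeUnit(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Apple/banana is good for you.",
                "Apple banana is good for you");
        // The strings first differ at index 5: ' ' (0x20) versus '/' (0x2F).
        for (String title : sortByCodeUnit(titles)) {
            System.out.println(title);
        }
    }
}
```

If the observed order differs from this, one thing worth checking is whether the sort field was tokenized by the analyzer at index time, since the sorted value is then no longer the original string.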

RE: Indexing multiple languages

2005-06-02 Thread Bob Cheung
Hi Erik, I am a newcomer to this list, so please allow me to ask a dumb question. Will the StandardAnalyzer have to be modified to accept different character encodings? We have customers in China, Taiwan and Hong Kong. Chinese data may come in three different encodings: Big5, GB and UTF8.
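One way to look at the encoding question: Lucene analyzers consume text that has already been decoded into Java characters (a Reader or a String), so the source encoding is normally dealt with when the bytes are read, before any analyzer sees them. The sketch below uses only the JDK. The class and method names are hypothetical, and "GB2312" stands in for "GB" as an assumption, since the post does not say which GB variant is meant.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Lucene.
class EncodingDemo {
    // Wraps raw bytes in a Reader that decodes them with the named charset.
    // A Reader like this is what would be handed on for indexing.
    static Reader openReader(byte[] raw, String charsetName) {
        // Charset.forName throws if the JVM does not provide this charset.
        Charset cs = Charset.forName(charsetName);
        return new InputStreamReader(new ByteArrayInputStream(raw), cs);
    }

    // Drains a Reader into a String so the decoded text can be inspected.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(reader)) {
            int c;
            while ((c = br.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two characters for "Chinese (language)"
        for (String name : new String[] {"Big5", "GB2312", "UTF-8"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": charset not available in this JVM");
                continue;
            }
            // Simulate customer data arriving as bytes in this encoding.
            byte[] raw = text.getBytes(Charset.forName(name));
            String decoded = readAll(openReader(raw, name));
            System.out.println(name + ": round trip ok = " + decoded.equals(text));
        }
    }
}
```

The decoding step still needs to know which encoding each document uses. That information has to come from the data source, because the bytes alone do not reliably identify it.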