Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Cedric Ho
From: Cedric Ho [mailto:[EMAIL PROTECTED] > Sent: Saturday, November 10, 2007 02:28 > To: java-user@lucene.apache.org > Subject: - Re: Chinese Segmentation with Phase Query > > > On Nov 10, 2007 2:08 AM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > > Hi Cedric, >

Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Uwe Goetzke
abbreviations) Regards Uwe Goetzke -Original Message- From: Cedric Ho [mailto:[EMAIL PROTECTED] Sent: Saturday, November 10, 2007 02:28 To: java-user@lucene.apache.org Subject: - Re: Chinese Segmentation with Phase Query On Nov 10, 2007 2:08 AM, Steven A Rowe <[EMAIL PROTECTED]>

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Cedric Ho
On Nov 10, 2007 2:08 AM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi Cedric, > > On 11/08/2007, Cedric Ho wrote: > > a sentence containing characters ABC, it may be segmented into AB, C or A, > > BC. > [snip] > > In these cases we would like to index both segmentations into the index: > > > > AB o

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Cedric Ho
The CJKAnalyzer is too simple for our needs. But thanks for the suggestion anyway. Cheers, Cedric On Nov 9, 2007 10:43 PM, Open Study <[EMAIL PROTECTED]> wrote: > Hi Cedric > > You may try the CJKAnalyzer within the Lucene sandbox. It doesn't give > a perfect solution for Chinese word segmentation, bu

RE: Chinese Segmentation with Phase Query

2007-11-09 Thread Steven A Rowe
Hi Cedric, On 11/08/2007, Cedric Ho wrote: > a sentence containing characters ABC, it may be segmented into AB, C or A, BC. [snip] > In these cases we would like to index both segmentations into the index: > > AB offset (0,1) position 0 A offset (0,0) position 0 > C offset (2,2) position

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Open Study
Hi Cedric You may try the CJKAnalyzer within the Lucene sandbox. It doesn't give a perfect solution for Chinese word segmentation, but will solve the problem in your case. On Nov 9, 2007 10:59 AM, Cedric Ho <[EMAIL PROTECTED]> wrote: > Hi, > > We are having an issue while indexing Chinese Documen

Chinese Segmentation with Phase Query

2007-11-08 Thread Cedric Ho
Hi, We are having an issue while indexing Chinese documents in Lucene. Some background first: Since CJK languages don't have spaces between words, we first have to determine the words from sentences. E.g., a sentence containing the characters ABC may be segmented into AB, C or A, BC. the proble