Re: sigram?

2003-12-16 Thread Erik Hatcher
ab, bc, cd}. I was therefore expecting the sigram to tokenize abcd as {a, b, c, d}. What StandardTokenizer actually does, though, is tokenize abcd as {abcd}. Note I am using ASCII characters above, but the argument is meant for CJK characters. I'll switch to in the rest of this email to mean

sigram?

2003-12-16 Thread John McNally
each: abcd is tokenized as {ab, bc, cd}. I was therefore expecting the sigram to tokenize abcd as {a, b, c, d}. What StandardTokenizer actually does, though, is tokenize abcd as {abcd}. Note I am using ASCII characters above, but the argument is meant for CJK characters. I'll switch to in the re
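To make the two expectations in this exchange concrete, here is a minimal Java sketch of overlapping bigram tokenization ({ab, bc, cd}) versus single-character "sigram" tokenization ({a, b, c, d}). It is not Lucene code: the class and method names (`NgramDemo`, `bigrams`, `sigrams`) are hypothetical, and the sketch assumes the input is one unbroken run of CJK characters (or, as in the thread, ASCII stand-ins). It iterates over Unicode code points so that characters outside the BMP are not split.

```java
import java.util.ArrayList;
import java.util.List;

public class NgramDemo {
    // Overlapping bigrams: "abcd" -> [ab, bc, cd].
    // In this sketch a run shorter than two characters yields no bigrams.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2)); // two code points starting at i
        }
        return out;
    }

    // Sigrams (one token per character): "abcd" -> [a, b, c, d].
    static List<String> sigrams(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("abcd")); // [ab, bc, cd]
        System.out.println(sigrams("abcd")); // [a, b, c, d]
        System.out.println(bigrams("中文分词")); // [中文, 文分, 分词]
        System.out.println(sigrams("中文分词")); // [中, 文, 分, 词]
    }
}
```

Bigrams keep some word-like context in each token at the cost of a larger index; sigrams keep the index small and rely on phrase-style matching of adjacent characters at query time.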

Re: sigram?

2003-12-09 Thread Otis Gospodnetic
Does that mean that sigram and ideogram are synonymous? (cf. http://en.wikipedia.org/wiki/Ideogram) Thanks, Otis --- Che Dong <[EMAIL PROTECTED]> wrote: > means tokenizing Chinese/Japanese text (which by nature has no spaces between words) character by character. > > Rega

Re: sigram?

2003-12-09 Thread Che Dong
means tokenizing Chinese/Japanese text (which by nature has no spaces between words) character by character. Regards, Che Dong - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene List" <[EMAIL PROTECTED]> Sent: Tuesday, December

sigram?

2003-12-08 Thread Erik Hatcher
Could someone define "sigram" for me? It is used as a type of token in StandardTokenizer. I know it relates to the CJK stuff, but I'm curious about the term "sigram" and what it means, specifically in the context of the StandardTokenize

DO NOT REPLY [Bug 23466] - StandardTokenzier with CJK support(sigram)

2003-10-01 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) [EMAIL PROTECTED] changed: What | Removed | Added; Status | REOPENED | RESOLVED; Reso

DO NOT REPLY [Bug 23466] - StandardTokenzier with CJK support(sigram)

2003-10-01 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) [EMAIL PROTECTED] changed: What | Removed | Added; Status | RESOLVED | REOPENED; Reso

DO NOT REPLY [Bug 23466] - StandardTokenzier with CJK support(sigram)

2003-09-30 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) [EMAIL PROTECTED] changed: What | Removed | Added; Status | NEW | RESOLVED; Reso

DO NOT REPLY [Bug 23466] - StandardTokenzier with CJK support(sigram)

2003-09-30 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) --- Additional Comments From [EMAIL PROTECTED] 2003-09-30 16:24 --- Created an attachment (id=8397) Patch file for proposed change

DO NOT REPLY [Bug 23466] - StandardTokenzier with CJK support(sigram)

2003-09-29 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) --- Additional Comments From [EMAIL PROTECTED] 2003-09-29 22:58 --- OK, maybe I'm just clueless about applying patches, so enlighten me on how to use what you provided to patch my local version. It doesn'

DO NOT REPLY [Bug 23466] New: - StandardTokenzier with CJK support(sigram)

2003-09-28 Thread bugzilla
gzilla/show_bug.cgi?id=23466 StandardTokenzier with CJK support(sigram) Summary: StandardTokenzier with CJK support(sigram) Product: Lucene Version: CVS Nightly - Specify date in submission Platform: All URL: http://www.chedong.com/ OS/V

Re: [contrib]: StandardTokenizer with sigram based CJK Support

2002-08-27 Thread Doug Cutting
+1 Che Dong wrote: >> Attached is StandardTokenizer.jj with sigram-based East Asian language support, tested under Windows and GNU/Linux. It simply treats different UnicodeBlocks with different word segmentation methods. Hope in the futur

[contrib]: StandardTokenizer with sigram based CJK Support

2002-08-26 Thread Che Dong
> Attached is StandardTokenizer.jj with sigram-based East Asian language support, tested under Windows and GNU/Linux. It simply treats different UnicodeBlocks with different word segmentation methods. I hope that in future releases we can add more language supp