Bigram Co-occurrences will be the better way for Word Discrimination. Re: Will CJKAnalyser be release with Lucene 1.4?

2004-05-30 Thread Che Dong
you, Erik. Hope we can have more communication on this issue with other East Asian language users. Che Dong

Re: sigram?

2003-12-09 Thread Che Dong
means tokenizing Chinese/Japanese words (which naturally have no spaces for word segmentation) character by character. Regards Che, Dong - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene List" <[EMAIL PROTECTED]> Sent: Tuesday, December
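
The difference between "sigram" (one token per character) and bigram segmentation can be sketched as follows. This is only an illustration in plain Java, not the code of CJKTokenizer or of any Lucene analyzer; the class and method names are hypothetical, and it assumes the input contains only CJK characters from the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or WebLucene.
class CjkNgramSketch {

    /** "Sigram" (unigram) segmentation: every character becomes its own token. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    /** Overlapping bigram segmentation: every adjacent pair of characters becomes a token. */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            // Assumption: a lone character is kept as a single-character token.
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four CJK characters, written as Unicode escapes to avoid source-encoding problems.
        String text = "\u4e2d\u6587\u68c0\u7d22";
        System.out.println(unigrams(text)); // four single-character tokens
        System.out.println(bigrams(text));  // three overlapping two-character tokens
    }
}
```

With bigrams, a two-character query term matches only where both characters are adjacent in the indexed text, which is why bigrams tend to discriminate words better than single characters.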

WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

2003-11-30 Thread Che Dong
/lucene/queryParser/SimpleQueryParser modified from early version of Lucene :) Regards Che, Dong

Re: Multiple fields in XML

2003-11-04 Thread Che Dong
I have a solution for XML indexing (even RSS): http://sourceforge.net/projects/weblucene/ Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, November 04, 2003 3:15 PM Subject: Multiple fields in XML > h

Re: Better way to Sort by Date

2003-10-18 Thread Che Dong
You can only get the score and docID from the index; otherwise you have to read the content, which reduces performance drastically. Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Thursday, October 16, 2

Re: Better way to Sort by Date

2003-10-14 Thread Che Dong
http://cvs.sourceforge.net/viewcvs.py/weblucene/weblucene/webapp/WEB-INF/src/org/apache/lucene/search/IndexOrderSearcher.java Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: F

Re: StandardTokenizer CJK Support

2003-09-27 Thread Che Dong
Attached with CJK sigram support: Che, Dong - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Sunday, September 28, 2003 6:53 AM Subject: Re: StandardTokenizer CJK Support > If Doug

Re: StandardTokenizer CJK Support

2003-09-27 Thread Che Dong
R:// unicode letters --- > | < #LETTER:// alphabets 136c137,141 <"\u0100"-"\u1fff", --- > "\u0100"-"\u1fff" > ] > > > | < #CJK:

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-03 Thread Che Dong
Please checkout WebLuceneHighlighter.java here: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/search/ Regards Che, Dong http://www.chedong.com - Original Message - From: "Bryan LaPlante" <[EMAIL PROTECTED]

Re: search item with '-' in it

2003-06-02 Thread Che Dong
ital stop words for StopFilter, we can specify which kinds of characters can be tokenized as "letters". Regards Che, Dong http://www.chedong.com/ - Original Message - From: "Lixin Meng" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTE

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-02 Thread Che Dong
- Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Sunday, June 01, 2003 7:29 PM Subject: Re: [VOTE] Proposed new committer for the Lucene sandbox > On Saturday, May 31, 2003, at 12:33

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-01 Thread Che Dong
an be added to the Lucene sandbox instead of being released at SourceForge. Regards Che, Dong http://www.chedong.com/ - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Friday, May 30, 2003 12:1

PLAN: WebLucene -- Lucene Web interface, use XML as a lightweight protocol.

2003-02-19 Thread Che Dong
icode - GB2312 SJIS - (XML) (XML) - SJIS ISO-8859-1 / \ ISO-8859-1 Che, Dong http://www.chedong.com/tech/

Re: Question: using boost for sorting

2003-01-26 Thread Che Dong
Thank you. Is it possible to create a subproject to hold users' implementations of the basic Lucene interfaces (Tokenizer, Filter) and other indexing approaches? Regards Che, Dong - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Develope

Re: Analyzers for various languages

2002-12-30 Thread Che Dong
single CJK character term). For more articles on word segmentation for Asian languages: http://www.google.com/search?q=chinese+word+segment+bigram Regards Che, Dong - Original Message - From: "Eric Isakson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, De

Re: problem with non latin characters in the query

2002-11-06 Thread Che Dong
) \ | |/ XML | indexing lucene index(unicode) | searching browser charset auto detecting / | \\ gbk big5 japanese russian(query string) Che, Dong XMLIndexer: http://nagoya.apache.org/eyebrowse/ReadMsg?listName

Re: Question: using boost for sorting

2002-10-16 Thread Che Dong
How about adding sortType to IndexSearcher first? Users can specify IndexSearcher.sortType (by score: default, by docID, by docID desc) before indexing. Che, Dong diff IndexSearcher.java ~/lucene-1.2-src/src/java/org/apache/lucene/search/IndexSearcher.java 66,81c66 < /** < * Impl

Fw: [contrib]: XMLIndexer/StringFilter

2002-09-22 Thread Che Dong
file(not tested). Regards Che, Dong Attach with README Lucene extend package Author: Che, Dong <[EMAIL PROTECTED]> $Header: /home/cvsroot/lucene_ext/README,v 1.1.1.1 2002/09/22 19:36:08 chedong Exp $ Introduction There is some source code extend to lucene p

recommend astyle as indent tool Re: coding conventions

2002-09-19 Thread Che Dong
will help other developers read code more efficiently. Che, Dong - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Friday, September 20, 2002 5:47 AM Subject: coding conv

about bigram based word segment

2002-09-12 Thread Che Dong
bigram-based word segmentation at http://search.163.com in category search and news search (web page search is powered by Google). Google's Chinese language analysis is provided by Basistech with dictionary-based word segmentation. http://www.basistech.com/products/language-analysis/cma.html Che, Dong

Re: Lucene introduction in Chinese

2002-09-12 Thread Che Dong
> te. Lucene strives to be an internationalized package, and translated documentation is a big part of internationalization. What do others think? Perhaps we should even add Che Dong as a Lucene committer so that he can maintain this, as well as other Asian language supp

Re: fixed url and How to contribute code to lucene sandbox?

2002-09-07 Thread Che Dong
er.java http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01220.html Thank you. I also have some advice and am working on a Lucene structure (Document, Field, Index) => XML binding. If we make a standard lucene.dtd the default Lucene input format, it might be used for application integrat

fixed url and How to contribute code to lucene sandbox?

2002-09-07 Thread Che Dong
http://www.chedong.com/tech/lucene.html fixed reference url with: http://jakarta.apache.org/lucene/ BTW: How to contribute code to lucene sandbox? Che, Dong - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Developers Li

Re: is it possible create another SimpleQueryParser with Google like syntax?

2002-08-26 Thread Che Dong
http://nagoya.apache.org/eyebrowse/SearchList?listId=&[EMAIL PROTECTED]&searchText=Peter+Halascy+&defaultField=sender&Search=Search Is it possible to make QueryParser.jj use the "and" relation by default? Che, Dong - Original Message - From: "Otis

is it possible create another SimpleQueryParser with Google like syntax?

2002-08-26 Thread Che Dong
> I mean: parse the query "aa bb" as "aa and bb" by default. Lucene seems to have spent much time on a complex QueryParser since moving to the Apache project. Is it possible to create another SimpleQueryParser with Google-like syntax?
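
The "and by default" behaviour asked for here can be illustrated with a small query rewriter. This is a sketch in plain Java under simplifying assumptions, not the SimpleQueryParser from WebLucene and not Lucene's QueryParser: the class name is hypothetical, terms are split on whitespace only, and quoted phrases, parentheses and field prefixes are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene or WebLucene.
class DefaultAndRewriter {

    // Tokens treated as explicit boolean operators (upper case only, by assumption).
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    /** Inserts AND between adjacent terms that have no explicit operator between them. */
    static String rewrite(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        List<String> out = new ArrayList<>();
        boolean previousWasTerm = false;
        for (String token : trimmed.split("\\s+")) {
            boolean isOperator = OPERATORS.contains(token);
            if (!isOperator && previousWasTerm) {
                out.add("AND"); // two terms in a row: make the conjunction explicit
            }
            out.add(token);
            previousWasTerm = !isOperator;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("aa bb"));    // aa AND bb
        System.out.println(rewrite("aa OR bb")); // aa OR bb
    }
}
```

Rewriting the query string before handing it to the existing parser keeps the grammar file untouched; changing the default operator inside the grammar itself is the alternative the message asks about.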

[contrib]: StandardTokenizer with sigram based CJK Support

2002-08-26 Thread Che Dong
> ort in StandardTokenizer.jj step by step and keep it fit for most i18n environments. Some common apps, like Jive, can use it as the default Analyzer. Use a localized Analyzer for advanced usage. Thank you. Che, Dong diff StandardTokenizer.jj S

Customize sorting search results via sorting source before indexing

2002-08-10 Thread Che, Dong
If the data source is sorted by some field before indexing, and docID is used instead of the search score for sorting, we get search results sorted by that field. Modify IndexSearcher's HitCollector (at about line 112): scorer.score(new HitCollector() { private float minScore = 0.0f; public fi

Lucene introduction in Chinese

2002-08-10 Thread Che, Dong
http://www.chedong.com/tech/lucene.html Adding full-text search to applications: an introduction to Lucene, a Java-based full-text indexing engine. Author: Che Dong [EMAIL PROTECTED] Last updated: 2002-08-11 02:08:46 Copyright notice: may be freely reproduced; when reproducing, please be sure to credit the original source and author. Keywords: Lucene full-text search engine Chinese word segment Abstract

[contrib]: CJKTokenizer for Asia language(Chinese Japanese Korean) Word Segment

2002-05-13 Thread Che Dong
igit will tokenize: "3dmax"=>"3" "dmax"; "U2"=>"u2" * for Punc: '_' is tokenized as a letter, '+' '#' are tokenized as digits * * @author Che, Dong [EMAIL PROTECTED] * @version $Id$ */ CJKTokenizer.java C
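
One possible reading of the letter/digit rules quoted in this comment is sketched below. The snippet is truncated, so this is an interpretation and not the CJKTokenizer source: it assumes that a letter following a digit starts a new token, that a digit following a letter stays in the same token, and that tokens are lower-cased. The class name is hypothetical, and CJK handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; an interpretation of the quoted rules, not CJKTokenizer itself.
class AlnumRuleSketch {

    // '_' is treated as a letter, per the quoted comment.
    private static boolean letterLike(char c) {
        return Character.isLetter(c) || c == '_';
    }

    // '+' and '#' are treated as digits, per the quoted comment.
    private static boolean digitLike(char c) {
        return Character.isDigit(c) || c == '+' || c == '#';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean lastWasDigit = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (letterLike(c)) {
                if (lastWasDigit) {
                    flush(current, tokens); // assumed rule: digit followed by letter splits ("3dmax")
                }
                current.append(Character.toLowerCase(c));
                lastWasDigit = false;
            } else if (digitLike(c)) {
                current.append(c);          // assumed rule: letter followed by digit stays joined ("U2")
                lastWasDigit = true;
            } else {
                flush(current, tokens);     // any other character ends the current token
                lastWasDigit = false;
            }
        }
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3dmax")); // [3, dmax]
        System.out.println(tokenize("U2"));    // [u2]
    }
}
```

Treating '+' and '#' as digits keeps terms such as "c++" and "c#" together as single tokens under these assumptions.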

build failed in GermanStemmer on platform with default encoding GBK

2002-03-05 Thread Che Dong
} [javac] ^ [javac] 11 errors Che Dong

IndexOrderSearcher: sort data before indexing and use 1/docID instead of score as sort field

2002-03-05 Thread Che Dong
tested with (float) doc and (float) 1/doc and found that 1/doc is closer to the range of score. Che Dong Besides the class name, the only difference between IndexOrderSearcher.java and IndexSearcher.java is that IndexOrderSearcher uses (float) 1/docID as the score field, while still using score to filter results with minScore in
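
Why (float) 1/docID works as a stand-in for score can be shown without Lucene. The sketch below is plain Java with hypothetical names (Hit, pseudoScore, rank); it is not the IndexOrderSearcher source. Because 1/docID falls as docID rises, ranking by descending pseudo-score returns documents in the order they were indexed, which is the pre-sorted order of the data source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not the IndexOrderSearcher source.
class IndexOrderSketch {

    /** Minimal hit record used only for this illustration; not a Lucene class. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** The pseudo-score from the message: (float) 1/docID, which decreases as docID grows. */
    static float pseudoScore(int docId) {
        return (float) 1 / docId; // docId 0 yields Float.POSITIVE_INFINITY, so it still ranks first
    }

    /** Ranks matching documents by descending pseudo-score, as a score-ordered result list would. */
    static List<Integer> rank(int[] matchingDocIds) {
        List<Hit> hits = new ArrayList<>();
        for (int id : matchingDocIds) {
            hits.add(new Hit(id, pseudoScore(id)));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Integer> order = new ArrayList<>();
        for (Hit h : hits) {
            order.add(h.docId);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(rank(new int[] {7, 2, 5})); // [2, 5, 7]
    }
}
```

One caveat of this approach is float precision: for very large docIDs, neighbouring values of 1/docID may round to the same float, so their relative order is no longer guaranteed by the pseudo-score alone.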

Hard to customize sort method in IndexSearcher via HitCollector

2002-02-28 Thread Che Dong
y,Filter,Sorter) will make Lucene convenient for more applications. Regards Che Dong

How to access cached hits in multi thread applications?

2002-02-27 Thread Che Dong
e search results hits can specify cache size and reuse in other threads. Regards Che Dong