from:"Che Dong"

Bigram Co-occurrences will be the better way for Word Discrimination. Re: Will CJKAnalyser be release with Lucene 1.4?

2004-05-30 Thread Che Dong

you, Erik. Hope we can have more communication on this issue with other East Asian language users. Che Dong > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED]

Re: sigram?

2003-12-09 Thread Che Dong

means tokenizing Chinese/Japanese text (which naturally has no spaces between words) one character at a time. Regards Che, Dong - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene List" <[EMAIL PROTECTED]> Sent: Tuesday, December
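A minimal sketch of the character-by-character ("sigram") idea described in this message, in plain Java with no Lucene dependency. The class name, the method names, and the choice of Unicode blocks treated as CJK are assumptions made for illustration; this is not the tokenizer that was posted to the list.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkUnigramSketch {

    // Assumption: only these Unicode blocks count as "CJK" for this sketch.
    static boolean isCjk(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || block == Character.UnicodeBlock.HIRAGANA
            || block == Character.UnicodeBlock.KATAKANA
            || block == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Emits every CJK character as its own token; runs of other letters
    // and digits are lower-cased and kept together as one token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace or punctuation ends a word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the platform encoding.
        System.out.println(tokenize("Lucene \u4e2d\u6587\u5206\u8bcd"));
    }
}
```

Single-character terms need no dictionary, but every query on a multi-character word then has to be matched as a phrase of adjacent characters.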

WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

2003-11-30 Thread Che Dong

/lucene/queryParser/SimpleQueryParser modified from early version of Lucene :) Regards Che, Dong

Re: Multiple fields in XML

2003-11-04 Thread Che Dong

I have a solution for XML indexing (even RSS): http://sourceforge.net/projects/weblucene/ Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, November 04, 2003 3:15 PM Subject: Multiple fields in XML > h

Re: Better way to Sort by Date

2003-10-18 Thread Che Dong

You can only get the score and docID from the index; otherwise you have to read the content, which reduces performance drastically. Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Thursday, October 16, 2

Re: Better way to Sort by Date

2003-10-14 Thread Che Dong

http://cvs.sourceforge.net/viewcvs.py/weblucene/weblucene/webapp/WEB-INF/src/org/apache/lucene/search/IndexOrderSearcher.java Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: F

Re: Better way to Sort by Date

2003-10-14 Thread Che Dong

http://cvs.sourceforge.net/viewcvs.py/weblucene/weblucene/webapp/WEB-INF/src/org/apache/lucene/search/IndexOrderSearcher.java Che, Dong - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: F

Re: StandardTokenizer CJK Support

2003-09-27 Thread Che Dong

Attached with CJK sigram support: Che, Dong - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Sunday, September 28, 2003 6:53 AM Subject: Re: StandardTokenizer CJK Support > If Doug

Re: StandardTokenizer CJK Support

2003-09-27 Thread Che Dong

R:// unicode letters --- > | < #LETTER:// alphabets 136c137,141 <"\u0100"-"\u1fff", --- > "\u0100"-"\u1fff" > ] > > > | < #CJK:

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-03 Thread Che Dong

Please checkout WebLuceneHighlighter.java here: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/search/ Regards Che, Dong http://www.chedong.com - Original Message - From: "Bryan LaPlante" <[EMAIL PROTECTED]

Re: search item with '-' in it

2003-06-02 Thread Che Dong

ital stop words for StopFilter, we can specify which kinds of characters can be tokenized as "letters". Regards Che, Dong http://www.chedong.com/ - Original Message - From: "Lixin Meng" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTE

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-02 Thread Che Dong

- Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Sunday, June 01, 2003 7:29 PM Subject: Re: [VOTE] Proposed new committer for the Lucene sandbox > On Saturday, May 31, 2003, at 12:33

Re: [VOTE] Proposed new committer for the Lucene sandbox

2003-06-01 Thread Che Dong

an be added to the Lucene sandbox instead of being released at SourceForge. Regards Che, Dong http://www.chedong.com/ - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Friday, May 30, 2003 12:1

PLAN: WebLucene -- Lucene Web interface, use XML as a lightweight protocol.

2003-02-19 Thread Che Dong

icode - GB2312 SJIS - (XML) (XML) - SJIS ISO-8859-1 / \ ISO-8859-1 Che, Dong http://www.chedong.com/tech/

Re: Question: using boost for sorting

2003-01-26 Thread Che Dong

Thank you. Is it possible to create a subproject to hold users' implementations of the basic Lucene interfaces: Tokenizer, Filter, and some other indexing approaches? Regards Che, Dong - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Develope

Re: Analyzers for various languages

2002-12-30 Thread Che Dong

single CJK character term). For more articles on word segmentation for Asian languages: http://www.google.com/search?q=chinese+word+segment+bigram Regards Che, Dong - Original Message - From: "Eric Isakson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, De

Re: problem with non latin characters in the query

2002-11-06 Thread Che Dong

) \ | |/ XML | indexing lucene index(unicode) | searching browser charset auto detecting / | \\ gbk big5 japanese russian(query string) Che, Dong XMLIndexer: http://nagoya.apache.org/eyebrowse/ReadMsg?listName

Re: Question: using boost for sorting

2002-10-16 Thread Che Dong

How about adding a sortType to IndexSearcher first? Users could specify IndexSearcher.sortType (by score: default, by docID, by docID desc) before indexing. Che, Dong diff IndexSearcher.java ~/lucene-1.2-src/src/java/org/apache/lucene/search/IndexSearcher.java 66,81c66 < /** < * Impl

Fw: [contrib]: XMLIndexer/StringFilter

2002-09-22 Thread Che Dong

file(not tested). Regards Che, Dong Attach with README Lucene extend package Author: Che, Dong <[EMAIL PROTECTED]> $Header: /home/cvsroot/lucene_ext/README,v 1.1.1.1 2002/09/22 19:36:08 chedong Exp $ Introduction There is some source code extend to lucene p

recommend astyle as indent tool Re: coding conventions

2002-09-19 Thread Che Dong

will make other developers read code more efficiently. Che, Dong - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Friday, September 20, 2002 5:47 AM Subject: coding conv

about bigram based word segment

2002-09-12 Thread Che Dong

bigram-based word segmentation at http://search.163.com in category search and news search (web page search is powered by Google). Google's Chinese language analysis is provided by Basistech with dictionary-based word segmentation. http://www.basistech.com/products/language-analysis/cma.html Che, Dong
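To illustrate the bigram-based segmentation this message refers to, here is a small plain-Java sketch that turns a run of CJK characters into overlapping two-character terms. The class and method names are hypothetical, and the input is assumed to be an unbroken CJK run already separated from Latin text and punctuation; it is not taken from any of the contributed tokenizers.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {

    // Splits a run of CJK characters into overlapping bigrams:
    // C1C2C3C4 -> C1C2, C2C3, C3C4. A single character is kept as-is.
    static List<String> bigrams(String cjkRun) {
        int[] codePoints = cjkRun.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (codePoints.length == 1) {
            tokens.add(new String(codePoints, 0, 1));
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four ideographs, written as escapes to avoid source-encoding issues.
        System.out.println(bigrams("\u4e2d\u6587\u5206\u8bcd"));
    }
}
```

Unlike dictionary-based segmentation, this needs no word list, at the cost of indexing some two-character terms that are not real words.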

Re: Lucene introduction in Chinese

2002-09-12 Thread Che Dong

te. Lucene strives > to be an internationalized package, and translated documentation is a > big part of internationalization. What do others think? > > Perhaps we should even add Che Dong as a Lucene committer so that he can > maintain this, as well as other Asian language supp

Re: fixed url and How to contribute code to lucene sandbox?

2002-09-07 Thread Che Dong

er.java http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01220.html Thank you. I also have some suggestions and am working on a Lucene structure (Document, Field, Index) => XML binding. If we make a standard lucene.dtd the default Lucene input format, it might be used for application integrat

fixed url and How to contribute code to lucene sandbox?

2002-09-07 Thread Che Dong

http://www.chedong.com/tech/lucene.html fixed reference url with: http://jakarta.apache.org/lucene/ BTW: How to contribute code to lucene sandbox? Che, Dong - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Developers Li

Re: is it possible create another SimpleQueryParser with Google like syntax?

2002-08-26 Thread Che Dong

http://nagoya.apache.org/eyebrowse/SearchList?listId=&[EMAIL PROTECTED]&searchText=Peter+Halascy+&defaultField=sender&Search=Search Is it possible to make QueryParser.jj use the "and" relation by default? Che, Dong - Original Message - From: "Otis

is it possible create another SimpleQueryParser with Google like syntax?

2002-08-26 Thread Che Dong

> I mean: parse the query "aa bb" as "aa and bb" by default. > > It seems Lucene has spent much time on a complex QueryParser > since moving to the Apache project. Is it possible to create > another SimpleQueryParser with Google-like syntax?
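The behaviour asked for here, treating whitespace-separated terms as an implicit "and", can be sketched as a plain string rewrite. This is only an illustration under simplifying assumptions: the class and method names are made up, and quoted phrases, field prefixes, and explicit operators in the input are not handled. It does not show how QueryParser.jj itself would be changed.

```java
public class DefaultAndSketch {

    // Rewrites "aa bb" to "aa AND bb" so that every term is required.
    // Assumption: the query is a bare list of terms with no quotes or operators.
    static String withDefaultAnd(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return String.join(" AND ", trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(withDefaultAnd("aa bb")); // prints: aa AND bb
    }
}
```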

[contrib]: StandardTokenizer with sigram based CJK Support

2002-08-26 Thread Che Dong

ort in StandardTokenizer.jj step by step and keep > it fit for most i18n environments. > Some common apps, like Jive, can use it as the default > Analyzer. > Use a localized Analyzer for advanced usage. > > Thank you. > > Che, Dong > > diff StandardTokenizer.jj S

Customize sorting search results via sorting source before indexing

2002-08-10 Thread Che, Dong

If the data source is sorted by some field before indexing, and docID is used instead of the search score for sorting, we get search results sorted by that field. Modify IndexSearcher's HitCollector: ...about line 112 scorer.score(new HitCollector() { private float minScore = 0.0f; public fi

Lucene introduction in Chinese

2002-08-10 Thread Che, Dong

http://www.chedong.com/tech/lucene.html Adding full-text search to your application: an introduction to Lucene, a Java-based full-text indexing engine. Author: Che Dong [EMAIL PROTECTED] Last updated: 2002-08-11 02:08:46 Copyright notice: may be freely reproduced; when reproducing, please be sure to credit the original source and author. Keywords: Lucene full-text search engine Chinese word segment Abstract

[contrib]: CJKTokenizer for Asia language(Chinese Japanese Korean) Word Segment

2002-05-13 Thread Che Dong

igit will token: "3dmax"=>"3" "dmax"; "U2"=>"u2" * for Punc: '_' will token as a letter, '+' '#' will token as a digit * * @author Che, Dong [EMAIL PROTECTED] * @version $Id$ */ CJKTokenizer.java C

build failed in GermanStemmer on platform with default encoding GBK

2002-03-05 Thread Che Dong

} [javac] ^ [javac] 11 errors Che Dong

IndexOrderSearcher: sort data before indexing and use 1/docID instead of score as sort field

2002-03-05 Thread Che Dong

tested with (float) doc and (float) 1/doc and found that 1/doc is closer to the range of the score. Che Dong Besides the class name, the only difference between IndexOrderSearcher.java and IndexSearcher.java is that IndexOrderSearcher uses (float) 1/docID as the score field, while still using the score to filter results with minScore in
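The effect of using 1/docID in place of the relevance score can be shown without Lucene: ranking by that pseudo-score in descending order returns documents in the order they were indexed. The sketch below is only an illustration of that idea; the class and method names are hypothetical and it is not the IndexOrderSearcher source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IndexOrderScoreSketch {

    // Pseudo-score from the message: (float) 1/docID.
    // Note: docID 0 yields Infinity in float arithmetic, which still ranks first.
    static float indexOrderScore(int docId) {
        return 1.0f / docId;
    }

    // Ranks matching docIDs by descending pseudo-score, i.e. ascending docID.
    static List<Integer> rank(List<Integer> matchingDocIds) {
        List<Integer> ranked = new ArrayList<>(matchingDocIds);
        ranked.sort(Comparator
            .comparingDouble((Integer id) -> indexOrderScore(id))
            .reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Matching documents arrive in arbitrary order.
        System.out.println(rank(Arrays.asList(7, 2, 5))); // prints: [2, 5, 7]
    }
}
```

If the source data is sorted (for example by date) before indexing, this ordering is the same as sorting by that field. One caveat worth checking in practice: for very large docIDs, neighbouring values of 1/docID may round to the same float.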

Hard to customize sort method in IndexSearcher via HitCollector

2002-02-28 Thread Che Dong

y,Filter,Sorter) will make Lucene convenient for more applications. Regards Che Dong

How to access cached hits in multi thread applications?

2002-02-27 Thread Che Dong

e search results hits can specify cache size and reuse in other threads. Regards Che Dong

34 matches
