you, Erik. I hope we can have more communication on this issue with other East Asian
language users.
Che Dong
means tokenizing Chinese/Japanese words (which by nature have no spaces for word
segmentation) character by character.
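
A minimal sketch of character-by-character tokenization, in plain Java rather than against the Lucene API. The class name CjkUnigramSketch and the choice of Unicode blocks treated as CJK are assumptions made for illustration, not the code discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkUnigramSketch {
    // Assumption: only these three Unicode blocks count as CJK here.
    static boolean isCjk(int cp) {
        Character.UnicodeBlock b = Character.UnicodeBlock.of(cp);
        return b == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || b == Character.UnicodeBlock.HIRAGANA
            || b == Character.UnicodeBlock.KATAKANA;
    }

    // Emits each CJK character as its own token; other letters and digits
    // are grouped into lowercased words split on everything else.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Moves any pending non-CJK word into the token list.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is a two-character Chinese word.
        System.out.println(tokenize("Lucene \u4e2d\u6587"));
    }
}
```

With this approach a query for a two-character word matches any document containing both characters, so phrase matching is usually needed to keep precision acceptable.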
Regards
Che, Dong
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene List" <[EMAIL PROTECTED]>
Sent: Tuesday, December
/lucene/queryParser/SimpleQueryParser
modified from an early version of Lucene :)
Regards
Che, Dong
I have a solution for XML indexing (even RSS):
http://sourceforge.net/projects/weblucene/
Che, Dong
- Original Message -
From: "none none" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 04, 2003 3:15 PM
Subject: Multiple fields in XML
> h
You can only get the score and docID from the index; otherwise you have to read the
content, which reduces performance drastically.
Che, Dong
- Original Message -
From: "none none" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Thursday, October 16, 2
http://cvs.sourceforge.net/viewcvs.py/weblucene/weblucene/webapp/WEB-INF/src/org/apache/lucene/search/IndexOrderSearcher.java
Che, Dong
- Original Message -
From: "none none" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: F
Attached, with CJK single-character (unigram) support:
Che, Dong
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Sunday, September 28, 2003 6:53 AM
Subject: Re: StandardTokenizer CJK Support
> If Doug
R:// unicode letters
---
> | < #LETTER:// alphabets
136c137,141
<"\u0100"-"\u1fff",
---
> "\u0100"-"\u1fff"
> ]
> >
> | < #CJK:
Please check out WebLuceneHighlighter.java here:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/search/
Regards
Che, Dong
http://www.chedong.com
- Original Message -
From: "Bryan LaPlante" <[EMAIL PROTECTED]
ital stop words for StopFilter, we can
specify which kinds of characters can be tokenized as "letters".
Regards
Che, Dong
http://www.chedong.com/
- Original Message -
From: "Lixin Meng" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTE
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Sunday, June 01, 2003 7:29 PM
Subject: Re: [VOTE] Proposed new committer for the Lucene sandbox
> On Saturday, May 31, 2003, at 12:33
an be added to the Lucene sandbox instead of being released at SourceForge.
Regards
Che, Dong
http://www.chedong.com/
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Friday, May 30, 2003 12:1
icode - GB2312
SJIS - (XML) (XML) - SJIS
ISO-8859-1 / \ ISO-8859-1
Che, Dong
http://www.chedong.com/tech/
Thank you. Is it possible to create a subproject to store users' implementations of the
basic Lucene interfaces: Tokenizer, Filter, and other indexing approaches?
Regards
Che, Dong
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Develope
single CJK character term)
For more articles on word segmentation for Asian languages:
http://www.google.com/search?q=chinese+word+segment+bigram
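
A minimal sketch of overlapping-bigram segmentation for a run of CJK characters, in plain Java. The class name CjkBigramSketch is hypothetical, and emitting a lone character as a single-character term is an assumption based on the fragment above.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {
    // 'run' is assumed to hold only consecutive CJK characters from the
    // Basic Multilingual Plane (one Java char per character).
    static List<String> bigrams(String run) {
        List<String> terms = new ArrayList<>();
        if (run.length() == 1) {
            terms.add(run); // a lone character becomes a single-character term
            return terms;
        }
        // Slide a two-character window over the run, one character at a time.
        for (int i = 0; i + 1 < run.length(); i++) {
            terms.add(run.substring(i, i + 2));
        }
        return terms;
    }

    public static void main(String[] args) {
        // A four-character run yields three overlapping bigrams.
        System.out.println(bigrams("\u4e2d\u6587\u68c0\u7d22"));
    }
}
```

Compared with single-character terms, bigrams produce a larger index but fewer false matches, without needing a dictionary.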
Regards
Che, Dong
- Original Message -
From: "Eric Isakson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, De
)
\ | |/
XML
| indexing
lucene index(unicode)
| searching
browser charset auto detecting
/ | \\
gbk big5 japanese russian(query string)
Che, Dong
XMLIndexer:
http://nagoya.apache.org/eyebrowse/ReadMsg?listName
How about adding sortType to IndexSearcher first?
Users can specify IndexSearcher.sortType (by score: default, by docID, by docID desc)
before indexing.
Che, Dong
diff IndexSearcher.java
~/lucene-1.2-src/src/java/org/apache/lucene/search/IndexSearcher.java
66,81c66
< /**
< * Impl
file(not
tested).
Regards
Che, Dong
README attached:
Lucene extension package
Author: Che, Dong <[EMAIL PROTECTED]>
$Header: /home/cvsroot/lucene_ext/README,v 1.1.1.1 2002/09/22 19:36:08 chedong Exp $
Introduction
Here is some source code extending lucene p
will help other developers read the code more
efficiently.
Che, Dong
- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Friday, September 20, 2002 5:47 AM
Subject: coding conv
bigram-based word segmentation at http://search.163.com in category
search and news search (web page search is powered by Google).
Google's Chinese language analysis is provided by Basis Technology, with dictionary-based
word segmentation.
http://www.basistech.com/products/language-analysis/cma.html
Che, Dong
te. Lucene strives
> to be an internationalized package, and translated documentation is a
> big part of internationalization. What do others think?
>
> Perhaps we should even add Che Dong as a Lucene committer so that he can
> maintain this, as well as other Asian language supp
er.java
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01220.html
Thank you
I also have some advice and am working on a Lucene structure (Document, Field, Index) => XML
binding. If we make a standard lucene.dtd the default Lucene input format, it might be
useful for application integrat
http://www.chedong.com/tech/lucene.html
Fixed the reference URL to:
http://jakarta.apache.org/lucene/
BTW:
How can I contribute code to the Lucene sandbox?
Che, Dong
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Developers Li
http://nagoya.apache.org/eyebrowse/SearchList?listId=&[EMAIL PROTECTED]&searchText=Peter+Halascy+&defaultField=sender&Search=Search
Is it possible to make QueryParser.jj use the "and" relation by default?
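
To make the request concrete, here is a minimal sketch in plain Java of the intended behaviour: whitespace-separated terms are joined with an explicit AND. The class DefaultAndSketch is hypothetical and is not part of QueryParser.jj; it ignores phrases, grouping, and terms that are already operators.

```java
public class DefaultAndSketch {
    // Rewrites "aa bb" as "aa AND bb"; extra whitespace is collapsed.
    static String withDefaultAnd(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only when the whole query is blank
            }
            if (out.length() > 0) {
                out.append(" AND ");
            }
            out.append(term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(withDefaultAnd("aa bb"));
    }
}
```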
Che, Dong
- Original Message -
From: "Otis
> I mean: parse the query "aa bb" as "aa and bb" by default.
>
> It seems Lucene has spent a lot of time on a complex QueryParser
> since moving to the Apache project. Is it possible to create
> another SimpleQueryParser with Google-like syntax?
>
ort in StandardTokenizer.jj step by step and keep
> it suitable for most i18n environments.
> Some common applications, like Jive, can use it as the default
> Analyzer.
> Use a localized Analyzer for advanced usage.
>
> Thank you.
>
> Che, Dong
>
> diff StandardTokenizer.jj S
If the data source is sorted by some field before indexing,
and docID is used instead of the search score for sorting,
we'll get search results sorted by that field.
Modify IndexSearcher's HitCollector:
...about line 112
scorer.score(new HitCollector() {
private float minScore = 0.0f;
public fi
http://www.chedong.com/tech/lucene.html
ÔÚÓ¦ÓÃÖмÓÈëÈ«ÎļìË÷¹¦ÄÜ
¡ª¡ª»ùÓÚJAVAµÄÈ«ÎÄË÷ÒýÒýÇæLucene¼ò½é
×÷Õߣº ³µ¶« [EMAIL PROTECTED]
×îºó¸üУº2002-08-11 02:08:46
°æȨÉùÃ÷£º¿ÉÒÔÈÎÒâתÔØ£¬×ªÔØʱÇëÎñ±Ø±êÃ÷Ôʼ³ö´¦ºÍ×÷ÕßÐÅÏ¢
¹Ø¼ü´Ê£ºLucene full-text search engine Chinese word
segment
ÕªÒª
igit will tokenize as: "3dmax"=>"3" "dmax"; "U2"=>"u2"
* for punctuation: '_' is tokenized as a letter; '+' and '#' are tokenized as digits
*
* @author Che, Dong [EMAIL PROTECTED]
* @version $Id$
*/
CJKTokenizer.java
C
}
[javac] ^
[javac] 11 errors
Che Dong
I tested with (float) doc and (float) 1/doc and found that 1/doc is closer to the
range of scores.
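
A minimal sketch of the idea in plain Java, not the actual searcher code: each matching document gets a pseudo-score of (float) 1/docID, so ranking by descending "score" returns documents in index order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOrderSketch {
    // Pseudo-score that shrinks as docID grows; never negative, like a
    // relevance score. Note that docID 0 yields positive infinity.
    static float pseudoScore(int docId) {
        return (float) 1 / docId;
    }

    // Ranks matching docIDs by descending pseudo-score, which is the same
    // as ascending docID (index order).
    static List<Integer> rank(List<Integer> matchedDocIds) {
        List<Integer> ranked = new ArrayList<>(matchedDocIds);
        ranked.sort((a, b) -> Float.compare(pseudoScore(b), pseudoScore(a)));
        return ranked;
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        hits.add(7);
        hits.add(2);
        hits.add(5);
        System.out.println(rank(hits));
    }
}
```

One caveat under these assumptions: float precision is limited, so very large docIDs can map to the same pseudo-score and lose their relative order.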
Che Dong
Besides the class name, the only difference between IndexOrderSearcher.java and
IndexSearcher.java is that IndexOrderSearcher uses
(float) 1/docID as the score field, while the score is used only to filter results with
minScore in
y, Filter, Sorter) will make Lucene convenient for more
applications.
Regards
Che Dong
e search result hits can specify a cache size and be reused in other
threads.
Regards
Che Dong