Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Alex vB
Hi Robert, the adapted codec is running, but it seems to be incredibly slow. It will take some time ;) Here are some performance results: Indexing scheme Index Size

Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Robert Muir
On Sat, Apr 23, 2011 at 2:06 PM, Alex vB wrote: > I am a little bit curious about the Lucene 3.0 performance results because the larger index seems to work faster?!? I already ran the test several times. Are my results realistic at all? I thought PForDelta/2 would outperform the standard i
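For readers unfamiliar with the scheme being compared, the general idea behind PForDelta is to delta-encode sorted doc IDs, bit-pack each block of gaps at a width that fits most of them, and store the few outliers ("exceptions") separately. The sketch below is a simplified, hypothetical illustration in plain Java, not Lucene's codec: the class and method names are made up, exceptions are kept in side arrays rather than being chained through the packed slots as in the original scheme, and the 90% coverage used to pick the bit width is an arbitrary assumption. Input doc IDs are assumed to be sorted ascending and non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified PFor-style codec for one block of sorted doc IDs. */
public final class PForSketch {

    /** One encoded block: packed gaps plus the exceptions that did not fit. */
    public static final class Block {
        public final int count;          // number of doc IDs in the block
        public final int bitWidth;       // bits used per packed gap
        public final long[] packed;      // gaps packed bitWidth bits at a time
        public final int[] exceptionPos; // indexes of gaps too large for bitWidth
        public final int[] exceptionVal; // the full values of those gaps

        Block(int count, int bitWidth, long[] packed, int[] pos, int[] val) {
            this.count = count;
            this.bitWidth = bitWidth;
            this.packed = packed;
            this.exceptionPos = pos;
            this.exceptionVal = val;
        }
    }

    /** Smallest width holding the gap at the given coverage percentile (at least 1 bit). */
    static int chooseBitWidth(int[] gaps, double coverage) {
        if (gaps.length == 0) return 1;
        int[] sorted = gaps.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(coverage * sorted.length) - 1;
        idx = Math.max(0, Math.min(idx, sorted.length - 1));
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(sorted[idx]));
    }

    public static Block encode(int[] sortedDocIds) {
        int n = sortedDocIds.length;
        int[] gaps = new int[n];
        int prev = 0;
        for (int i = 0; i < n; i++) {          // delta-encode: store differences
            gaps[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        int width = chooseBitWidth(gaps, 0.9); // assumed: about 90% of gaps should fit
        long max = (1L << width) - 1;
        long[] packed = new long[(int) (((long) n * width + 63) / 64)];
        List<Integer> pos = new ArrayList<>();
        List<Integer> val = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long v = gaps[i];
            if (v > max) {                     // exception: remember it, pack a 0 placeholder
                pos.add(i);
                val.add(gaps[i]);
                v = 0;
            }
            writeBits(packed, i * width, width, v);
        }
        return new Block(n, width, packed,
                pos.stream().mapToInt(Integer::intValue).toArray(),
                val.stream().mapToInt(Integer::intValue).toArray());
    }

    public static int[] decode(Block b) {
        int[] gaps = new int[b.count];
        for (int i = 0; i < b.count; i++) {    // fast path: plain bit unpacking
            gaps[i] = (int) readBits(b.packed, i * b.bitWidth, b.bitWidth);
        }
        for (int k = 0; k < b.exceptionPos.length; k++) { // patch the exceptions back in
            gaps[b.exceptionPos[k]] = b.exceptionVal[k];
        }
        int[] docs = new int[b.count];
        int prev = 0;
        for (int i = 0; i < b.count; i++) {    // prefix sum restores the doc IDs
            prev += gaps[i];
            docs[i] = prev;
        }
        return docs;
    }

    private static void writeBits(long[] buf, int bitPos, int width, long value) {
        int word = bitPos >>> 6, offset = bitPos & 63;
        buf[word] |= value << offset;
        if (offset + width > 64) buf[word + 1] |= value >>> (64 - offset); // spill into next word
    }

    private static long readBits(long[] buf, int bitPos, int width) {
        int word = bitPos >>> 6, offset = bitPos & 63;
        long v = buf[word] >>> offset;
        if (offset + width > 64) v |= buf[word + 1] << (64 - offset);       // bits from next word
        return v & ((1L << width) - 1);
    }
}
```

Decoding is a tight unpack loop plus a short patch pass, which is why PFor-style codecs are usually expected to decode quickly; whether that shows up in end-to-end query times is a separate question that depends on the queries being run.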

Re: New codecs keep Freq skip/omit Pos

2011-04-23 Thread Alex vB
> it depends upon the type of query... what queries are you using for this benchmarking and how are you benchmarking? FYI: for benchmarking standard query types with Wikipedia you might be interested in http://code.google.com/a/apache-extras.org/p/luceneutil/ I have 1 queries from an AOL d
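Whatever harness is used, JVM benchmarks are easy to skew: the first iterations run before the JIT has compiled the hot paths. A minimal sketch of a fairer loop, assuming a hypothetical `executeQuery` callback that parses and runs one query string against your searcher: run the whole query set a few times untimed, then time several rounds and compare medians rather than single runs. This is not luceneutil's code; it only illustrates the warm-up-then-measure pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical warm-up-then-measure harness; executeQuery is whatever runs one query. */
public final class QueryBench {

    /** Returns the wall-clock time of each measured round in milliseconds, sorted ascending. */
    public static double[] run(List<String> queries, Consumer<String> executeQuery,
                               int warmupRounds, int measuredRounds) {
        for (int r = 0; r < warmupRounds; r++) {   // untimed rounds give the JIT time to compile
            for (String q : queries) executeQuery.accept(q);
        }
        double[] roundMs = new double[measuredRounds];
        for (int r = 0; r < measuredRounds; r++) { // timed rounds over the full query set
            long start = System.nanoTime();
            for (String q : queries) executeQuery.accept(q);
            roundMs[r] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(roundMs);
        return roundMs;
    }

    /** Median of an ascending-sorted array; less sensitive to GC pauses than the mean. */
    public static double median(double[] sorted) {
        int n = sorted.length;
        if (n == 0) throw new IllegalArgumentException("no measurements");
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }
}
```

When comparing two indexes, use the same query list and round counts for both, ideally in separate JVM runs. The operating system's file cache also matters: an index that was read recently may be served largely from RAM, which can make it look faster than its size would suggest.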

ICU Chinese words

2011-04-23 Thread Weiwei Wang
Hi all, I'm working on a Chinese contact search project and I need to transform Chinese words into their Pinyin form, e.g. 中国 --> zhongguo. The problem I encounter is that some Chinese words have more than one transformation, like 贾 -> jia, 贾 -> gu, ... I already used the ICUTransformFilter
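A transliterator such as the one behind ICUTransformFilter produces a single output string for each input, so a character like 贾 can only come out one way. One possible approach (an assumption, not something confirmed in this thread) is to keep your own table of readings per character and expand each word into every combination, which could then be indexed as alternatives at the same position (position increment 0). The `PinyinExpander` class and the reading table are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: expands a word into every Pinyin spelling allowed by a reading table. */
public final class PinyinExpander {

    /**
     * readings maps one character (as a String, so supplementary ideographs work too)
     * to its possible Pinyin readings. Characters missing from the table are kept as-is.
     */
    public static List<String> expand(String word, Map<String, List<String>> readings) {
        List<String> results = new ArrayList<>();
        results.add("");
        int[] codePoints = word.codePoints().toArray();
        for (int cp : codePoints) {
            String ch = new String(Character.toChars(cp));
            List<String> options = readings.getOrDefault(ch, List.of(ch));
            List<String> next = new ArrayList<>(results.size() * options.size());
            for (String prefix : results) {           // combine every prefix so far ...
                for (String option : options) {       // ... with every reading of this character
                    next.add(prefix + option);
                }
            }
            results = next;
        }
        return results;
    }
}
```

With a table mapping 中 to zhong, 国 to guo and 贾 to both jia and gu, `expand("中国", table)` gives `[zhongguo]` and `expand("贾", table)` gives `[jia, gu]`. The number of combinations is the product of the reading counts per character, so long names with several polyphonic characters can grow quickly; capping the output would be a sensible safeguard.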

Re: ICU Chinese words

2011-04-23 Thread Robert Muir
2011/4/23 Weiwei Wang: > hi, all, I'm working on a Chinese contact search project, I need to transform the Chinese words to its Pinyin form, e.g. 中国 --> zhongguo. The problem I encounter is that for some chinese words which have more than one transforms, like. 贾 -> jia, 贾 -> gu, ...

Re: "Umlaute" getting lost

2011-04-23 Thread Grant Ingersoll
On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote: > I keep my search terms in a dedicated RAMDirectory (the termIndex). In there I place all the terms of my real index. When putting the terms into the termIndex I can still see [using the debugger] the Umlaute (äöü). Unfortunately when s
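The message is cut off here, so the actual cause isn't visible. Two common ways umlauts vanish between indexing and searching are an accent-folding step in the analyzer (Lucene's ASCIIFoldingFilter, for instance, maps ä to a) and text written as UTF-8 but read back with a different charset. The hypothetical `UmlautCheck` helpers below reproduce both effects with plain JDK calls, which may help tell from the debugger which one you are looking at: folding yields plain ASCII letters, while a charset mismatch yields two-character garbage such as "Ã¤".

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

/** Hypothetical diagnostics for the two usual ways umlauts get lost. */
public final class UmlautCheck {

    /** Accent folding: decompose (a-umlaut becomes 'a' + U+0308), then drop combining marks. */
    public static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Charset mismatch: bytes written as UTF-8 but decoded as ISO-8859-1. */
    public static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** True if s still contains a precomposed German umlaut (lower or upper case). */
    public static boolean containsUmlaut(String s) {
        String umlauts = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc";
        return s.chars().anyMatch(c -> umlauts.indexOf(c) >= 0);
    }
}
```

Running a stored term through both helpers and comparing with what the debugger shows at search time can narrow down whether the analyzer or the I/O path is responsible.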