Compared with the classical VSM, Lucene simply drops the denominator (|Q|*|D|) of
the cosine similarity formula,
but it adds norm(t,d) and coord(q,d), which account for document length and for the
fraction of query terms found in the document,
so in practice it is a modified implementation of VSM.
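To make the difference concrete, here is a toy Java sketch that scores the same documents two ways: classical cosine similarity, and a simplified Lucene-style score with no |Q|*|D| division, a per-document length norm and a coord factor. The class and method names are hypothetical, not Lucene's API. The weight formulas are assumptions modeled on the classic DefaultSimilarity defaults (tf = sqrt(freq), idf = 1 + ln(numDocs/(docFreq+1)), lengthNorm = 1/sqrt(numTerms), coord = matched/queryTerms); boosts and queryNorm are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy comparison of cosine VSM with a simplified Lucene-style score (hypothetical names). */
public class VsmVsLucene {

    // Counts how often each term occurs in a tokenized text.
    static Map<String, Integer> termFreqs(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Assumed idf(t) = 1 + ln(numDocs / (docFreq + 1)), as in the classic defaults.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs)
            for (String t : termFreqs(doc).keySet()) df.merge(t, 1, Integer::sum);
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet())
            idf.put(e.getKey(), 1 + Math.log((double) docs.size() / (e.getValue() + 1)));
        return idf;
    }

    // Classical VSM: dot product of tf-idf vectors divided by |Q|*|D|.
    static double cosine(List<String> query, List<String> doc, Map<String, Double> idf) {
        Map<String, Integer> q = termFreqs(query), d = termFreqs(doc);
        double dot = 0, qLen = 0, dLen = 0;
        for (Map.Entry<String, Integer> e : q.entrySet()) {
            double i = idf.getOrDefault(e.getKey(), 0.0);
            double w = e.getValue() * i;
            qLen += w * w;
            dot += w * d.getOrDefault(e.getKey(), 0) * i;
        }
        for (Map.Entry<String, Integer> e : d.entrySet()) {
            double w = e.getValue() * idf.getOrDefault(e.getKey(), 0.0);
            dLen += w * w;
        }
        return (qLen == 0 || dLen == 0) ? 0 : dot / (Math.sqrt(qLen) * Math.sqrt(dLen));
    }

    // Lucene-style: no |Q|*|D| division; length norm per document plus coord(q,d).
    // Each distinct query term is counted once.
    static double luceneStyle(List<String> query, List<String> doc, Map<String, Double> idf) {
        Set<String> queryTerms = termFreqs(query).keySet();
        if (queryTerms.isEmpty() || doc.isEmpty()) return 0;
        Map<String, Integer> d = termFreqs(doc);
        double lengthNorm = 1.0 / Math.sqrt(doc.size()); // 1/sqrt(numTerms)
        double sum = 0;
        int matched = 0;
        for (String t : queryTerms) {
            int freq = d.getOrDefault(t, 0);
            if (freq == 0) continue;
            matched++;
            double i = idf.getOrDefault(t, 0.0);
            sum += Math.sqrt(freq) * i * i * lengthNorm; // tf * idf^2 * norm(t,d)
        }
        double coord = (double) matched / queryTerms.size(); // coord(q,d)
        return coord * sum;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("lucene", "vector", "space", "model"),
            List.of("lucene", "index", "search", "lucene", "score"),
            List.of("email", "archive"));
        Map<String, Double> idf = idf(docs);
        List<String> query = List.of("lucene", "score");
        for (List<String> doc : docs)
            System.out.printf("%s cosine=%.3f lucene=%.3f%n",
                doc, cosine(query, doc, idf), luceneStyle(query, doc, idf));
    }
}
```

Because nothing divides by the vector lengths, the Lucene-style value is not bounded by 1 the way the cosine is; only the ranking it induces is meaningful.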
Do you just want to verify which implementation of VSM in
Hi Karl,
Where can I find a description of the algorithm below? Thanks.
Very simple algorithmic solutions usually involve ranking the top sentences
by looking at the distribution of terms in sentences, paragraphs and the
whole document. I implemented something like this a couple of years back
that worked fairly
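A minimal Java sketch of the "very simple" approach described above: score each sentence by the average whole-document frequency of its terms and keep the top k in their original order. This is an illustration under stated assumptions, not the implementation mentioned in the message; the class and method names are hypothetical, the sentence splitter is naive, paragraph-level distribution is ignored, and there is no stop-word list, so on real text common words such as "the" would need filtering.

```java
import java.util.*;
import java.util.stream.*;

/** Frequency-based sentence ranking sketch (hypothetical, not the poster's code). */
public class SentenceRanker {

    // Lower-cases and splits on runs of non-letters; empty tokens are dropped.
    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns the k highest-scoring sentences, kept in document order. */
    static List<String> topSentences(String document, int k) {
        // Naive sentence split after '.', '!' or '?' followed by whitespace.
        String[] sentences = document.trim().split("(?<=[.!?])\\s+");
        // Term frequencies over the whole document.
        Map<String, Integer> docFreq = new HashMap<>();
        for (String t : tokens(document)) docFreq.merge(t, 1, Integer::sum);

        double[] scores = new double[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            List<String> ts = tokens(sentences[i]);
            // Average (not summed) frequency, so long sentences are not favoured
            // just for being long.
            scores[i] = ts.isEmpty() ? 0
                    : ts.stream().mapToInt(docFreq::get).sum() / (double) ts.size();
        }
        return IntStream.range(0, sentences.length).boxed()
                .sorted((a, b) -> Double.compare(scores[b], scores[a])) // best first
                .limit(k)
                .sorted() // restore original sentence order
                .map(i -> sentences[i])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String doc = "Lucene is a search library. Lucene builds an inverted index. "
                + "The weather was nice. Search uses the inverted index.";
        System.out.println(topSentences(doc, 2));
    }
}
```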
Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read   Blk_wrtn
sda      7.12   50.67       186.41      38936841   143240688
See attached for hardware info and the CPU call tree (taken from YourKit).
I would appreciate your recommendations.
Jamie
h t wrote:
Hi Michael,
I guess the hotspot in Lucene is
org.apache.lucene.search.IndexSearcher.search()
Hi Jamie,
What is the original text size of the million emails?
I estimate each email is around 100 KB; is that right?
When you search, what kind of keywords do you enter: single words or short
phrases?
Did you use the keywords in two calls?
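A quick back-of-the-envelope check of the size estimate, taking the guessed 100 KB per email at face value and using decimal units:

$$
10^{6}\ \text{emails} \times 100\ \text{KB} = 10^{8}\ \text{KB} = 10^{11}\ \text{bytes} \approx 100\ \text{GB}
$$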
2008/2/27, fangz [EMAIL PROTECTED]:
Hi,
I am using a simple Java program to test search speed. The index file is
about 1.93 GB in size. I created an IndexSearcher and built a query using
the query parser: parser.parse("entity:fail"). The initial
I guess you can implement createBitSet() more efficiently with a
Filter rather than a BooleanQuery.
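The idea behind using a Filter here is to build the matching-document BitSet directly from the postings of each value, with no scoring, instead of running a BooleanQuery. Below is a minimal Java sketch under stated assumptions: the in-memory `postings` map is a hypothetical stand-in for an index, not a Lucene API. In the Lucene 2.x API a Filter's bits(IndexReader) method returned a java.util.BitSet, and a real version would walk reader.termDocs(new Term(field, value)) for each value inside it; that part is not shown.

```java
import java.util.*;

/** Builds a document BitSet straight from postings lists (hypothetical index). */
public class TermSetFilterSketch {

    // field value -> ids of the documents containing it (hypothetical in-memory index)
    private final Map<String, int[]> postings;

    TermSetFilterSketch(Map<String, int[]> postings) {
        this.postings = postings;
    }

    /** Sets bit d for every document d that contains any of the given values. */
    BitSet createBitSet(Collection<String> values, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String v : values) {
            int[] docs = postings.get(v);
            if (docs == null) continue; // value does not occur in the index
            for (int d : docs) bits.set(d); // OR the postings in; no scores computed
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("red", new int[] {0, 3});
        postings.put("blue", new int[] {1, 3, 4});
        TermSetFilterSketch filter = new TermSetFilterSketch(postings);
        System.out.println(filter.createBitSet(List.of("red", "blue", "green"), 5));
    }
}
```

The cost is proportional to the total length of the postings visited, and a document matching several values is simply set once.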
2008/2/25, Gabriel Landais [EMAIL PROTECTED]:
Gabriel Landais wrote:
How do I create a Filter for a field from a Collection<String>?
First, split the Collection into a Collection<Collection<String>> with
http://www.shifttab.cn:8001/wiki
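The splitting step can be sketched in plain Java. The message is cut off before it says why or how the Collection is split; one plausible reason (an assumption) is to keep each piece below BooleanQuery's clause limit, which defaults to 1024. The `Chunker` class, its `split` method and the chunk size are hypothetical.

```java
import java.util.*;

/** Splits a collection into consecutive chunks of at most chunkSize elements. */
public class Chunker {

    static <T> List<List<T>> split(Collection<T> items, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>(chunkSize);
        for (T item : items) {
            current.add(item);
            if (current.size() == chunkSize) { // chunk full: start a new one
                chunks.add(current);
                current = new ArrayList<>(chunkSize);
            }
        }
        if (!current.isEmpty()) chunks.add(current); // trailing partial chunk
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```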
2007/10/31, Marco [EMAIL PROTECTED]:
It seems that the problem occurs when I add the token created by
EdgeNGramTokenizer to the index.
If the token contains a space (for example, "apple com"), I have to add it to
the index with Field.Index.TOKENIZED; otherwise the
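Why a space ends up inside a token: assuming, as the message implies, that the tokenizer treats the whole field value as one string, its front-edge n-grams are simply prefixes of that string, so once the gram length passes the first word the space is included. The Java sketch below only mimics that behavior; `EdgeNGrams` and `frontGrams` are hypothetical names, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Mimics front-edge n-grams over a whole input string (sketch, not Lucene code). */
public class EdgeNGrams {

    static List<String> frontGrams(String input, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram)
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        List<String> grams = new ArrayList<>();
        int max = Math.min(maxGram, input.length());
        for (int len = minGram; len <= max; len++) {
            grams.add(input.substring(0, len)); // prefix of length len
        }
        return grams;
    }

    public static void main(String[] args) {
        // With a wide enough gram range, the prefixes run past the space.
        System.out.println(frontGrams("apple com", 4, 7));
    }
}
```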