What is the retrieval model for Lucene?

2006-04-10 Thread hu andy
I have seen in some documents that three kinds of retrieval model are often used: Boolean, vector space, and probabilistic. I would like to know which one Lucene uses. Thank you in advance.

Re: Data structure of a Lucene Index

2006-04-10 Thread Prasenjit Mukherjee
I think Doug's paper (specifically the Seek and Transfer section) is the closest I could get. A slightly more detailed explanation can be found in Yates' book on Information Retrieval. I agree with Dimitry, a detailed explanation (or even pointers to some existing article would be beneficial t

Retrieve terms with the highest frequency

2006-04-10 Thread pepone pepone
Hi all, I am interested in knowing whether it is possible to make a query that retrieves the most frequent terms in a field of the index or, better, all terms in that field sorted by frequency. Thanks in advance. -- play tetris http://pepone.on-rez.com/tetris run gentoo http://gentoo-notes.blogspot.com/
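The request above can be pictured with a small sketch in plain Java. It does not use Lucene's API; it only illustrates what "all terms in a field sorted by frequency" would mean if frequency is taken to be document frequency (the number of documents whose field contains the term). The class name TermFrequencySketch, the method termsByDocFreq, and the whitespace tokenization are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene.
class TermFrequencySketch {

    /**
     * Returns the distinct terms of a field ordered by document frequency,
     * highest first; ties are broken alphabetically.
     * Each element of fieldValues is the field text of one document.
     */
    static List<Map.Entry<String, Integer>> termsByDocFreq(List<String> fieldValues) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String value : fieldValues) {
            // Count each term at most once per document.
            Set<String> seen = new HashSet<>();
            for (String token : value.toLowerCase().split("\\s+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(docFreq.entrySet());
        sorted.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("lucene index", "lucene search", "index lucene lucene");
        for (Map.Entry<String, Integer> e : termsByDocFreq(titles)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In a real index the per-term counts would already be stored, so the work would reduce to enumerating the terms of one field and sorting them; the sketch recomputes them from raw text only to stay self-contained.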

Calculating term and document frequency for multiple word terms

2006-04-10 Thread Vishal Bathija
Hi, I was wondering how I can get the document frequency and term frequency of a phrase in a corpus. I am currently using IndexReader rd = IndexReader.open("C:\\Documents and Settings\\Owner\\My Documents\\Thesis\\luceneTest\\index"); Term t1 = new Term("contents","\"increases aesthetic\""); Ter

Re: I just don't get wildcards at all.

2006-04-10 Thread Chris Hostetter
: Let's claim that all my clauses contain wildcards. What I *think* that means : is that I can't very well use a filter "the normal way" since searchers : require a query. And I don't want a query with a wildcard term. The beauty of ConstantScoreQuery is that it can wrap any filter ... so you can

MultiReader and MultiSearcher

2006-04-10 Thread oramas martín
Is there any performance (or other) difference between using an IndexSearcher initialized with a MultiReader instead of using a MultiSearcher? Thanks, Jose L. Oramas

Small field indexing and ranking

2006-04-10 Thread Maxym Mykhalchuk
Hi All, I've tried to search for the topic, but to no avail so far... Sorry if it's been raised before. Here's the issue: All my "documents" will have a few (2-3: title, short description) short fields. You see, it's rare that the same word is repeated several times in a title, so will Lu

Re: MultiReader and MultiSearcher

2006-04-10 Thread Yonik Seeley
On 4/10/06, oramas martín <[EMAIL PROTECTED]> wrote: > Is there any performance (or other) difference between using an > IndexSearcher initialized with a MultiReader instead of using a > MultiSearcher? Yes, the IndexSearcher(MultiReader) solution will be faster. -Yonik http://incubator.apache.org

Re: Distributed Lucene.. - clustering as a requirement

2006-04-10 Thread Doug Cutting
Dmitry Goldenberg wrote: For an enterprise-level application, Lucene appears too file-system and too byte-sequence-centric a technology. Just my opinion. The Directory API is just too low-level. There are good reasons why Lucene is not built on top of an RDBMS. An inverted index is not effi

Re: I just don't get wildcards at all.

2006-04-10 Thread Erick Erickson
Chris: Again, many thanks. Of course only *after* you mentioned that Hits is not entirely efficient when looking at many docs did I remember TopDocs and TopFieldDocs had been mentioned. Senior moments and all that. I completely missed the fact that ConstantScoreQuery doesn't take a query. I'll

Re: Calculating term and document frequency for multiple word terms

2006-04-10 Thread Erik Hatcher
Have a look at using SpanNearQuery for phrases, and walking the spans (via getSpans, I believe). Erik On Apr 10, 2006, at 12:12 PM, Vishal Bathija wrote: Hi, I was wondering how I can get the document frequency and term frequency of a phrase in a corpus. I am currently using Inde
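To make the two quantities in the question concrete, here is a minimal sketch in plain Java. It is not SpanNearQuery and does not use Lucene at all: it treats a phrase as consecutive tokens in order, takes the phrase's term frequency to be its number of occurrences in one document, and its document frequency to be the number of documents containing it at least once. The class PhraseFrequencySketch, its method names, and the whitespace tokenization are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of Lucene.
class PhraseFrequencySketch {

    /** Counts occurrences of the phrase (consecutive tokens, in order) in one document. */
    static int phraseFreq(String document, String phrase) {
        String[] doc = tokenize(document);
        String[] ph = tokenize(phrase);
        if (ph.length == 0) {
            return 0;
        }
        int count = 0;
        // Slide a window of the phrase length over the document tokens.
        // Overlapping matches are counted separately.
        for (int i = 0; i + ph.length <= doc.length; i++) {
            boolean match = true;
            for (int j = 0; j < ph.length; j++) {
                if (!doc[i + j].equals(ph[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    /** Counts the documents in which the phrase occurs at least once. */
    static int docFreq(List<String> corpus, String phrase) {
        int count = 0;
        for (String document : corpus) {
            if (phraseFreq(document, phrase) > 0) {
                count++;
            }
        }
        return count;
    }

    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "design increases aesthetic appeal",
                "color increases aesthetic value and increases aesthetic range",
                "aesthetic increases");
        System.out.println(docFreq(corpus, "increases aesthetic"));
        System.out.println(phraseFreq(corpus.get(1), "increases aesthetic"));
    }
}
```

Walking span matches, as suggested in the reply, would presumably yield the same kind of per-document positions that the inner loop finds here; how the real analyzer tokenizes the text would change the counts.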

Re: Exception in WildCardQuery

2006-04-10 Thread Erick Erickson
It took me a couple of days, but this has been added to JIRA. Erick

Tuning Indexing performance question ..

2006-04-10 Thread Mufaddal Khumri
Hi, I am using a multithreaded app to index a bunch of data. The app spawns X number of threads. Each thread writes to a RAMDirectory. When a thread finishes its work, the contents from the RAMDirectory are written into the FSDirectory. All threads are passed an instance of the FSWriter when th
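The buffering pattern described in this message can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the poster's code and not Lucene's API: an in-memory list per task stands in for the RAMDirectory, a shared list guarded by a lock stands in for the single on-disk writer, and "indexing" a document is reduced to lowercasing it. The class BufferedIndexingSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not part of Lucene.
class BufferedIndexingSketch {

    // Stand-in for the shared on-disk index; only touched while holding the lock.
    private final List<String> sharedIndex = new ArrayList<>();

    // One short critical section per batch instead of one per document.
    private synchronized void flush(List<String> buffer) {
        sharedIndex.addAll(buffer);
    }

    /** Indexes every batch on a worker thread and returns a copy of the merged result. */
    List<String> indexAll(List<List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> batch : batches) {
            pool.submit(() -> {
                // Private buffer: no locking is needed while it is being filled.
                List<String> buffer = new ArrayList<>();
                for (String doc : batch) {
                    buffer.add(doc.toLowerCase());
                }
                flush(buffer);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        synchronized (this) {
            return new ArrayList<>(sharedIndex);
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("Doc A", "Doc B"),
                Arrays.asList("Doc C"));
        System.out.println(new BufferedIndexingSketch().indexAll(batches, 2).size());
    }
}
```

The design choice illustrated is that contention on the shared writer is limited to one short merge per batch; the order in which batches land in the shared index depends on thread scheduling.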

Re: Fetch Documents Without Retrieveing All Fields

2006-04-10 Thread Bill Janssen
In case anyone else was wondering: I got curious about how one would replace FieldCache, and discovered that you can create an instance of a class which implements FieldCache, and then simply assign it to org.apache.lucene.search.FieldCache.DEFAULT. > 2) your use case sounds like it could best be

hit.doc, hit.score and FSDir performance

2006-04-10 Thread Sameer Shisodia
Hi All. I am using Lucene as the backbone of a 'Smart Search'. I have a layer over search that extensively analyzes results at runtime to bucket them. I do trim the result set, but only after this processing, since there are non-document weights that are combined with the result scores, and the hits

full text search using Lucene

2006-04-10 Thread Tony Qian
All, I'm working on a project which requires full text search on multiple tables in a MySQL database. Although MySQL supports full text search, it only supports it on a single table. I'm wondering if Lucene can help me to do full text search against a MySQL database. (I noticed that D

Re: full text search using Lucene

2006-04-10 Thread Chris Lu
Hi, Tony, DBSight, like SearchBlox, Nutch, Solr, is using Lucene to search. It just makes it super easy and flexible to create a search on any database. Lucene's implementation is far superior to, and more flexible than, MySQL's full text search. Try it and you will know what I am talking about. Actual

Re: Distributed Lucene.. - clustering as a requirement

2006-04-10 Thread Prasenjit Mukherjee
Agreed, an inverted index cannot be efficiently maintained in a B-tree (hence RDBMS). But I think we can (or should) have the option of B-tree based storage for unindexed fields, whereas for indexed fields we can use Lucene's existing architecture. prasen [EMAIL PROTECTED] wrote: Dmi