Re: Distributed Lucene.. - clustering as a requirement

2006-04-11 Thread Prasenjit Mukherjee
Agreed, an inverted index cannot be efficiently maintained in a B-tree (and hence in an RDBMS). But I think we can (or should) have the option of B-tree based storage for unindexed fields, whereas for indexed fields we can use Lucene's existing architecture. prasen [EMAIL PROTECTED] wrote:

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
Maxym Mykhalchuk [EMAIL PROTECTED] wrote on 10/04/2006 09:46:16 PM: Here's the issue: All my documents will be having a few (2-3: title, short description) short fields. You see, it's rare that the same word is repeated several times in a title, so will Lucene be able to give me a decent

Re: Small field indexing and ranking

2006-04-11 Thread Maxym Mykhalchuk
Hi Nadav, Thanks for suggestions. As for improving multi-word queries, Doug Cutting recently posted a link to his presentation, http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf, just scroll down to Nutch N-Grams there, and you'll see the answer. Basically, Buffy the

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
Maxym Mykhalchuk [EMAIL PROTECTED] wrote on 11/04/2006 11:52:07 AM: As for improving multi-word queries, Doug Cutting recently posted a link to his presentation, http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf, just scroll down to Nutch N-Grams there, and you'll see

Re: Small field indexing and ranking

2006-04-11 Thread Daniel Naber
On Tuesday 11 April 2006 10:33, Nadav Har'El wrote: This sort of proximity-influenced scoring is missing from Lucene's QueryParser, and I've been wondering recently how best to add it, and whether it is possible to easily do it with existing Lucene machinery, like the SpanQuery

Clusterization of searching

2006-04-11 Thread anton
What is the way to cluster searching?

Re: MultiReader and MultiSearcher

2006-04-11 Thread Peter Keegan
Yonik, Could you explain why an IndexSearcher constructed from multiple readers is faster than a MultiSearcher constructed from the same readers? Thanks, Peter On 4/10/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/10/06, oramas martín [EMAIL PROTECTED] wrote: Is there any performance (or

RE: Distributed Lucene.. - clustering as a requirement

2006-04-11 Thread Dmitry Goldenberg
I guess Compass is probably the way to go - http://www.opensymphony.com/compass/ From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED] Sent: Tue 4/11/2006 2:45 AM To: java-user@lucene.apache.org Subject: Re: Distributed Lucene.. - clustering as a requirement

Re: What is the retrieval modle for lucene?

2006-04-11 Thread Chris Lamprecht
It uses a combination of boolean, to get the set of matching documents, and vector space (by default) to rank them. Or one might say it uses the vector space model, and only returns nonzero scoring documents. On 4/10/06, hu andy [EMAIL PROTECTED] wrote: I have seen in some documents that there
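The two-step description above can be sketched with a toy model. This is a minimal illustration, not Lucene's actual scoring code: the corpus, the `rank` function, and the simplified TF-IDF weighting are all assumptions made for the example. A boolean OR picks the candidate set, and a vector-space style weight orders it, so documents that would score zero are never returned.

```python
import math
from collections import Counter

# Toy corpus; documents and ids are made up for illustration.
DOCS = {
    1: "lucene search engine library",
    2: "relational database engine",
    3: "lucene lucene index",
}


def tokenize(text):
    return text.lower().split()


def rank(query, docs=DOCS):
    """Boolean OR selects candidates; a simplified TF-IDF sum ranks them."""
    terms = tokenize(query)
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in tokenized.items():
        # Boolean step: skip documents that contain none of the query terms.
        if not any(t in counts for t in terms):
            continue
        score = 0.0
        for t in terms:
            if t not in counts:
                continue
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in tokenized.values() if t in c)
            idf = 1.0 + math.log(n_docs / df)
            # Ranking step: damped term frequency times inverse document frequency.
            score += math.sqrt(counts[t]) * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc_id, score in rank("lucene index"):
    print(doc_id, round(score, 3))
```

Document 2 never appears in the result for "lucene index" because it fails the boolean step, which is the same as saying its vector-space score would be zero.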

Re: getting frequency of a phrase within documents

2006-04-11 Thread Chris Hostetter
If you use a custom Similarity class, the tf(float) function is used for phrases to determine how the score depends on the number of times the phrase appears in the document. If you make it an identity function, and modify the other functions in the Similarity to be (mostly)
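A toy sketch of the idea, under stated assumptions: the message is truncated, so treating every other scoring factor as a constant 1.0 is a guess made for this example, and `phrase_freq`, `toy_score`, and the sample text are hypothetical. The only Lucene fact relied on is that DefaultSimilarity's tf(float) is a square root. With an identity tf and the other factors held at 1.0, the score of an exact phrase equals its occurrence count.

```python
import math


def default_tf(freq):
    # Square-root damping, as DefaultSimilarity.tf(float) does.
    return math.sqrt(freq)


def identity_tf(freq):
    # Identity function: the raw frequency passes straight through.
    return float(freq)


def phrase_freq(tokens, phrase):
    """Count exact (slop 0) occurrences of phrase in a token list."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def toy_score(tokens, phrase, tf, other_factors=1.0):
    # Assumption: idf, norms, boosts and coord are folded into one constant.
    return tf(phrase_freq(tokens, phrase)) * other_factors


tokens = "new york is not new york city".split()
phrase = ["new", "york"]
print(toy_score(tokens, phrase, identity_tf))
print(toy_score(tokens, phrase, default_tf))
```

With the identity tf the score can be read directly as the phrase count; with the square-root tf the count is damped and cannot be recovered without squaring.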

Re: search.Similarity

2006-04-11 Thread Erik Hatcher
On Apr 11, 2006, at 1:46 PM, miki sun wrote: Is there any theory behind the similarity measure of Lucene? http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html No, Doug just made it up with some random mathematical formulas, just for fun :) Erik

Re: MultiReader and MultiSearcher

2006-04-11 Thread Peter Keegan
Does this mean that MultiReader doesn't merge the search results and sort the results as if there was only one index? If not, does it simply concatenate the results? Peter On 4/11/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote: Could you explain

Re: MultiReader and MultiSearcher

2006-04-11 Thread Yonik Seeley
On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote: Does this mean that MultiReader doesn't merge the search results and sort the results as if there was only one index? Correct, it doesn't. It supports the lower level primitives like TermEnum and TermDocs that searches use to run. A term query
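The point about lower-level primitives can be illustrated with a toy model. This is a sketch only: `ToyReader` and `ToyMultiReader` are hypothetical stand-ins written for this example, not Lucene classes. Each sub-index exposes a postings enumeration, and the combined reader shifts local document ids by a per-reader offset so that a single search sees one contiguous document-id space and has no result lists to merge.

```python
class ToyReader:
    """Hypothetical stand-in for one index: term -> sorted local doc ids."""

    def __init__(self, postings, max_doc):
        self.postings = postings
        self.max_doc = max_doc

    def term_docs(self, term):
        return iter(self.postings.get(term, []))


class ToyMultiReader:
    """Presents several readers as one index by offsetting doc ids."""

    def __init__(self, readers):
        self.readers = readers
        self.starts = []
        offset = 0
        for reader in readers:
            # Each sub-reader's ids begin where the previous reader's end.
            self.starts.append(offset)
            offset += reader.max_doc
        self.max_doc = offset

    def term_docs(self, term):
        # Walk the sub-readers in order, mapping local ids to global ids.
        for start, reader in zip(self.starts, self.readers):
            for local_id in reader.term_docs(term):
                yield start + local_id


a = ToyReader({"lucene": [0, 2]}, max_doc=3)
b = ToyReader({"lucene": [1], "nutch": [0]}, max_doc=2)
multi = ToyMultiReader([a, b])
print(list(multi.term_docs("lucene")))  # [0, 2, 4]
```

A searcher running over `multi` scores the postings stream once, whereas a searcher-per-index design would run one search per sub-index and then merge the ranked hit lists.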

Re: MultiReader and MultiSearcher

2006-04-11 Thread Yonik Seeley
On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote: Oops. I meant to say: Does this mean that an IndexSearcher constructed from a MultiReader doesn't merge the search results and sort the results as if there was only one index? That's how I answered it. A single search is done... the merging of

Re: MultiReader and MultiSearcher

2006-04-11 Thread Doug Cutting
Peter Keegan wrote: Oops. I meant to say: Does this mean that an IndexSearcher constructed from a MultiReader doesn't merge the search results and sort the results as if there was only one index? It doesn't have to, since a MultiReader *is* a single index. A quick test indicates that it does

Lucene Seaches VS. Relational database Queries

2006-04-11 Thread Ananth T. Sarathy
Hi, We have made documents out of the rows in our database and one of the team is suggesting that we abandon some of our database queries and instead use Lucene. I think there are some fundamental problems with this, especially when it comes to association tables (where there is a one-to-many

Re: Lucene Seaches VS. Relational database Queries

2006-04-11 Thread Chris Hostetter
1) An inverted full text index is not a replacement for a relational database. 2) Many people think they need a relational database, when all they really need is a well-designed full text index. To get to some of your specific questions... : them in one field). One of the problems I see would

Re: analyser

2006-04-11 Thread Daniel Noll
Raghavendra Prabhu wrote: While indexing, I use a different Analyzer. While searching, I use a simple standard Analyzer. Will this prevent me from getting the same best fragments when I do a search for two terms, say term1 and term2? It depends on the differences, but in general you will always
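Why the differences matter can be shown with a toy index. This is a sketch under assumptions: the two analyzers, `build_index`, and `find_all` are hypothetical helpers written for this example and do not correspond to any Lucene Analyzer. The index stores whatever tokens the index-time analyzer emits, so a query analyzed differently may produce tokens that were never indexed.

```python
def lowercase_analyzer(text):
    # Splits on whitespace and lowercases every token.
    return [tok.lower() for tok in text.split()]


def whitespace_analyzer(text):
    # Splits on whitespace only; case is preserved.
    return text.split()


def build_index(docs, analyzer):
    """Map each analyzed token to the set of doc ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyzer(text):
            index.setdefault(tok, set()).add(doc_id)
    return index


def find_all(index, query, analyzer):
    """Return ids of documents containing every analyzed query token."""
    tokens = analyzer(query)
    if not tokens:
        return set()
    result = None
    for tok in tokens:
        hits = index.get(tok, set())
        result = set(hits) if result is None else result & hits
    return result


docs = {1: "Term1 and Term2", 2: "term1 only"}
index = build_index(docs, lowercase_analyzer)

# Same analyzer on both sides: the query tokens match the indexed tokens.
print(find_all(index, "Term1 Term2", lowercase_analyzer))
# Different analyzer at search time: "Term1" was indexed as "term1", so nothing matches.
print(find_all(index, "Term1 Term2", whitespace_analyzer))
```

In this toy case the mismatch loses the hit entirely; with milder differences the same mechanism would change which terms match and therefore which fragments are selected.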