Agreed, an inverted index cannot be efficiently maintained in a
B-tree (hence RDBMS). But I think we can (or should) have the option of
B-tree based storage for unindexed fields, whereas for indexed fields
we can use Lucene's existing architecture.
prasen
[EMAIL PROTECTED] wrote:
Maxym Mykhalchuk [EMAIL PROTECTED] wrote on 10/04/2006 09:46:16 PM:
Here's the issue: All my documents will be having a few (2-3:
title, short description) short fields. You see, it's rare that the
same word is repeated several times in a title, so will Lucene be
able to give me a decent
Hi Nadav,
Thanks for suggestions.
As for improving multi-word queries, Doug Cutting recently posted a link to
his presentation,
http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf,
just scroll down to Nutch N-Grams there, and you'll see the answer.
Basically, Buffy the
Maxym Mykhalchuk [EMAIL PROTECTED] wrote on 11/04/2006 11:52:07 AM:
As for improving multi-word queries, Doug Cutting recently posted a link
to
his presentation,
http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf,
just scroll down to Nutch N-Grams there, and you'll see
On Dienstag 11 April 2006 10:33, Nadav Har'El wrote:
This sort of proximity-influenced scoring is missing from
Lucene's QueryParser, and I've been wondering recently
on how it is best to add it, and whether it is possible to
easily do it with existing Lucene machinery, like the
SpanQuery
What is the best way to do clustering for searching?
Yonik,
Could you explain why an IndexSearcher constructed from multiple readers is
faster than a MultiSearcher constructed from the same readers?
Thanks,
Peter
On 4/10/06, Yonik Seeley [EMAIL PROTECTED] wrote:
On 4/10/06, oramas martín [EMAIL PROTECTED] wrote:
Is there any performance (or
I guess Compass is probably the way to go - http://www.opensymphony.com/compass/
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]
Sent: Tue 4/11/2006 2:45 AM
To: java-user@lucene.apache.org
Subject: Re: Distributed Lucene.. - clustering as a requirement
It uses a combination of boolean, to get the set of matching
documents, and vector space (by default) to rank them. Or one might
say it uses the vector space model, and only returns nonzero scoring
documents.
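
The two descriptions above can be sketched in a few lines. This is a toy model only, not Lucene's code or its actual scoring formula: the function names, the sample documents, and the simplified tf-idf weighting are all assumptions made for illustration. It shows that the boolean match set and the set of nonzero-scoring documents coincide.

```python
import math
from collections import Counter

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "lucene search engine",
    2: "relational database engine",
    3: "lucene lucene index",
}

def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = freq
    return index

def boolean_match(index, query):
    """Boolean step: every document containing at least one query term."""
    matched = set()
    for term in query.lower().split():
        matched |= set(index.get(term, {}))
    return matched

def score_all(index, num_docs, query):
    """Ranking step: a simplified tf-idf sum (an assumption, not Lucene's formula)."""
    scores = {}
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = 1.0 + math.log(num_docs / len(postings))
        for doc_id, freq in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + math.sqrt(freq) * idf
    return scores

def search(index, num_docs, query):
    """Match with the boolean step, then order the matches by score."""
    matched = boolean_match(index, query)
    scores = score_all(index, num_docs, query)
    return sorted(matched, key=lambda d: (-scores[d], d))
```

Calling `search(build_index(DOCS), len(DOCS), "lucene engine")` returns the matching document ids ordered by the toy score; a document that contains none of the query terms never receives a score, so it is never returned.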
On 4/10/06, hu andy [EMAIL PROTECTED] wrote:
I have seen in some documents that there
If you use a custom Similarity class, the tf(float) function is used for
phrases to determine how the score should be determined based on the
number of times the phrase appears in the documents.
if you make it an identity function, and modify the other functions in the
Similarity to be (mostly)
On Apr 11, 2006, at 1:46 PM, miki sun wrote:
Is there any theory behind the similarity measure of Lucene?
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/
Similarity.html
No, Doug just made it up with some random mathematical formulas, just
for fun :)
Erik
Does this mean that MultiReader doesn't merge the search results and sort
the results as if there was only one index? If not, does it simply
concatenate the results?
Peter
On 4/11/06, Yonik Seeley [EMAIL PROTECTED] wrote:
On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote:
Could you explain
On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote:
Does this mean that MultiReader doesn't merge the search results and sort
the results as if there was only one index?
Correct, it doesn't. It supports the lower level primitives like
TermEnum and TermDocs that searches use to run. A term query
On 4/11/06, Peter Keegan [EMAIL PROTECTED] wrote:
Oops. I meant to say: Does this mean that an IndexSearcher constructed from
a MultiReader doesn't merge the search results and sort the results as if
there was only one index?
That's how I answered it.
A single search is done... the merging of
Peter Keegan wrote:
Oops. I meant to say: Does this mean that an IndexSearcher constructed from
a MultiReader doesn't merge the search results and sort the results as if
there was only one index?
It doesn't have to, since a MultiReader *is* a single index.
A quick test indicates that it does
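
The point that a MultiReader *is* a single index can be illustrated with a toy model. This is a sketch under assumptions, not Lucene's implementation: `ToySegment`, `ToyMultiReader`, and `term_docs` are hypothetical names. Each sub-index keeps its own local document numbers, and the combined reader shifts them by a per-segment base so that a searcher sees one continuous document-number space and has no result lists to merge.

```python
class ToySegment:
    """Hypothetical sub-index: postings map a term to sorted local doc ids."""

    def __init__(self, max_doc, postings):
        self.max_doc = max_doc
        self.postings = postings


class ToyMultiReader:
    """Presents several segments as one index by offsetting doc ids."""

    def __init__(self, segments):
        self.segments = segments
        self.starts = []  # first global doc id of each segment
        base = 0
        for seg in segments:
            self.starts.append(base)
            base += seg.max_doc
        self.max_doc = base

    def term_docs(self, term):
        """Yield global doc ids for a term, in increasing order."""
        for base, seg in zip(self.starts, self.segments):
            for local_id in seg.postings.get(term, []):
                yield base + local_id


# Example data (made up): two segments with 3 and 2 documents.
reader = ToyMultiReader([
    ToySegment(3, {"lucene": [0, 2]}),
    ToySegment(2, {"lucene": [1], "index": [0]}),
])
lucene_docs = list(reader.term_docs("lucene"))
```

A single search loop that consumes `term_docs` scores and collects hits exactly as it would over one physical index, which is why no separate merge step is needed in this model.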
Hi,
We have made documents out of the rows in our database and one of the team
is suggesting that we abandon some of our database queries and instead use
lucene. I think there are some fundamental problems with this especially
when it comes to association tables (where there is a 1 one to many
1) An inverted full text index is not a replacement for a relational
database.
2) Many people think they need a relational database, when all they really
need is a well-designed full text index.
To get to some of your specific questions...
: them in one field). One of the problems I see would
Raghavendra Prabhu wrote:
While indexing, I use a different Analyzer.
While searching, I use a simple standard Analyzer.
Will this prevent me from getting the same best fragments when I do a search
for two terms, say term1 and term2?
It depends on the differences, but in general you will always
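
One way the differences can matter is sketched below. Both analyzers here are hypothetical stand-ins, not Lucene classes: the index-time one lowercases and strips a trailing "s", the search-time one only lowercases. A query token can only match if it is identical to a token the index-time analyzer produced.

```python
def index_analyzer(text):
    """Hypothetical index-time analyzer: lowercase, then strip a trailing 's'."""
    return [
        tok[:-1] if tok.endswith("s") and len(tok) > 1 else tok
        for tok in text.lower().split()
    ]


def search_analyzer(text):
    """Hypothetical search-time analyzer: lowercase only."""
    return text.lower().split()


def matching_terms(doc_text, query_text):
    """Return the query tokens that exactly equal some indexed token."""
    indexed = set(index_analyzer(doc_text))
    return [tok for tok in search_analyzer(query_text) if tok in indexed]
```

In this toy setup a query for "fast engines" against a document containing "Fast Engines" only matches on "fast", because the index holds "engine" while the query produces "engines".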