This may not be directly relevant to Lucene, but I wanted to learn:
How does a web search engine do something like this?
Do they also "score every matching document on every query", OR
do they pick a subset first based on some static/offline ranking criteria
and then do what Lucene does, OR
do they sea
Any pointers/suggestions on my approach?
On 10/22/11, prasenjit mukherjee wrote:
> My use case is the following :
> Given an n-dimensional vector ( only +ve quadrants/points ) find its
> closest neighbours. I would like to try out with lucene's default
> ranking. Here is how a typical document
Can you tell me how I can feed the Lucene index using the term
frequencies directly?
I am actually getting the documents along with their term frequencies
and don't want to write any additional code to expand them.
On 10/23/11, ppp c wrote:
> Of course, it can be reused.
> But from my point of v
Daniel, since no one knowledgeable has answered I'll take a stab - there
are a number of ant targets you can run, most of which incorporate some
indexing step(s). Basically you can run:
ant -Dtask.alg=
it looks as if the ant build.xml is set up to run
conf/micro-standard.alg by default, but
I'll reply to the thread with your comment from our IM chat in case it helps
anyone else thinking about this.
On the question of which is preferred, a boolean query with term queries or a
term filter plus a term query, and whether clause order in the boolean query matters:
we take care of this internally
no matter whi
hey josh,
On Sun, Oct 23, 2011 at 5:39 PM, Josh Devins wrote:
> Hi folks,
>
> I'm hoping someone can shed some light on how filters and boolean queries
> work under the hood. As I understand it, the following two queries are
> functionally equivalent:
>
> boolean must, term query: foo, boolean mu
"Why would it matter...top 5 matches" Because Lucene has to calculate
the score of all matching documents in order to ensure that it returns those 5 documents.
What if the very last document scored was the most relevant?
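To make the point above concrete, here is a minimal sketch of top-k selection with a bounded min-heap. It is not Lucene's code; the function name `top_k` and the scores are hypothetical, chosen so that the best hit is the last document visited. Every scored document has to be offered to the heap before the top 5 are known.

```python
import heapq

def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    heap = []  # min-heap of (score, doc_id); the root is the weakest hit kept so far
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the weakest kept hit, so it replaces it.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]

# Hypothetical scores: the most relevant document (id 6) is scored last.
hits = [(0, 0.2), (1, 0.9), (2, 0.4), (3, 0.1), (4, 0.5), (5, 0.3), (6, 1.7)]
best = top_k(hits, 5)
```

Stopping after the first five documents would have missed document 6 entirely, which is why the number of matches drives the cost and the number of results requested does not.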
Best
Erick
On Sun, Oct 23, 2011 at 3:06 PM, sol myr wrote:
> Hi,
>
> We've noticed s
Of course, it can be reused.
But from my point of view, it's meaningless, since the analysis process has
to be performed anyway to collect things such as prox, offsets, synonyms, payloads and so on.
On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee
wrote:
> I already have the term-frequency-count for all the t
Hi folks,
I'm hoping someone can shed some light on how filters and boolean queries
work under the hood. As I understand it, the following two queries are
functionally equivalent:
boolean must, term query: foo, boolean must, term query: bar
term query: foo, term filter: bar
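As a toy illustration of the two forms above (not Lucene's implementation), the sketch below uses hypothetical in-memory postings and a made-up score that simply sums term frequencies. It assumes, as in Lucene, that a filter only restricts which documents may match and does not contribute to the score.

```python
# Hypothetical postings: term -> {doc_id: term frequency}
postings = {
    "foo": {1: 3, 2: 1, 5: 2},
    "bar": {2: 4, 5: 1, 7: 2},
}

def boolean_must(t1, t2):
    """Both terms are required and both contribute to the toy score."""
    docs = postings[t1].keys() & postings[t2].keys()
    return {d: postings[t1][d] + postings[t2][d] for d in docs}

def term_with_filter(query_term, filter_term):
    """The filter only restricts the doc set; only query_term is scored."""
    allowed = set(postings[filter_term])
    return {d: tf for d, tf in postings[query_term].items() if d in allowed}

a = boolean_must("foo", "bar")      # documents 2 and 5, scored on both terms
b = term_with_filter("foo", "bar")  # documents 2 and 5, scored on "foo" only
```

Under these assumptions both forms match the same documents, but the scores, and therefore possibly the ranking, can differ.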
What I'd like to unde
I already have the term-frequency count for all the terms in a
document. Is there a way I can re-use that info while indexing? I
would like to use Solr for this.
Hi,
We've noticed a Lucene performance phenomenon, and would appreciate an
explanation from anyone familiar with Lucene internals
(I know Lucene as a user, but haven't looked under its hood).
We have a Lucene index of about 30 million records.
We ran 2 queries: "AND" and "OR" ("+john +doe" v
Hi Grant,
In Carrot2 (and Carrot Search's commercial products) we're not using
Lucene as an indexing/search service directly, but we are re-using a
lot of internal infrastructure (like analyzers, ported snowball
stemmers and other segmentation stuff). We also plan on using the new
language identi