Re: performance question - number of documents

2011-10-23 Thread Antony Sequeira
This may not be directly relevant to Lucene, but I wanted to learn: how does a web search engine do something like this? Do they also "score every matching document on every query", OR do they pick a subset first based on some static/offline ranking criteria and then do what Lucene does, OR do they sea

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-23 Thread prasenjit mukherjee
Any pointers/suggestions on my approach? On 10/22/11, prasenjit mukherjee wrote: > My use case is the following: > Given an n-dimensional vector (only +ve quadrants/points) find its > closest neighbours. I would like to try out with Lucene's default > ranking. Here is how a typical document

Re: reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
Can you tell me how I can feed the Lucene index by using the term frequency directly? Actually I am getting the documents along with their term frequency and don't want to write any additional code to expand them. On 10/23/11, ppp c wrote: > Of course, it can be reused. > But from my point of v

Re: Using Lucene to index Wikipedia

2011-10-23 Thread Michael Sokolov
Daniel, since no one knowledgeable has answered I'll take a stab - there are a number of ant targets you can run, most of which incorporate some indexing step(s). Basically you can run: ant -Dtask.alg= it looks as if the ant build.xml is set up to run conf/micro-standard.alg by default, but

Re: Filter and query precedence, boolean query

2011-10-23 Thread Josh Devins
I'll reply to the thread with your comment from our IM chat in case it helps anyone else thinking about this. In response to what is preferred, boolean query w/ term queries or a term filter + term query, and whether order in the boolean query somehow matters: we take care of this internally no matter whi

Re: Filter and query precedence, boolean query

2011-10-23 Thread Simon Willnauer
hey josh, On Sun, Oct 23, 2011 at 5:39 PM, Josh Devins wrote: > Hi folks, > > I'm hoping someone can shed some light on how filters and boolean queries > work under the hood. As I understand it, the following two queries are > functionally equivalent: > > boolean must, term query: foo, boolean mu

Re: performance question - number of documents

2011-10-23 Thread Erick Erickson
"Why would it matter...top 5 matches" Because Lucene has to calculate the score of all documents in order to insure that it returns those 5 documents. What if the very last document scored was the most relevant? Best Erick On Sun, Oct 23, 2011 at 3:06 PM, sol myr wrote: > Hi, > > We've noticed s

Re: reusing the term-frequency count while indexing

2011-10-23 Thread ppp c
Of course, it can be reused. But from my point of view, it's meaningless, since the analysis process still has to be performed to collect things such as prox, offset, syno, payload and so on. On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee wrote: > I already have the term-frequency-count for all the t

Filter and query precedence, boolean query

2011-10-23 Thread Josh Devins
Hi folks, I'm hoping someone can shed some light on how filters and boolean queries work under the hood. As I understand it, the following two queries are functionally equivalent:
boolean must, term query: foo, boolean must, term query: bar
term query: foo, term filter: bar
What I'd like to unde
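To make the comparison in this question concrete, here is a toy sketch, not Lucene code. It assumes that a filter only restricts which documents may match and contributes nothing to the score, while every must clause both restricts and scores. The `postings` data and both function names are hypothetical.

```python
# Hypothetical postings: term -> {doc_id: term weight}. Not Lucene data structures.
postings = {
    "foo": {1: 0.5, 2: 1.0, 3: 0.2},
    "bar": {2: 0.3, 3: 0.9, 4: 0.7},
}

def boolean_must(terms):
    """Every term is required AND every term adds to the score."""
    docs = set.intersection(*(set(postings[t]) for t in terms))
    return {d: sum(postings[t][d] for t in terms) for d in docs}

def query_with_filter(query_term, filter_term):
    """The filter term only restricts the match set (assumption: it does not score)."""
    allowed = set(postings[filter_term])
    return {d: w for d, w in postings[query_term].items() if d in allowed}

a = boolean_must(["foo", "bar"])
b = query_with_filter("foo", "bar")
print(sorted(a) == sorted(b))  # same matching documents under this model
print(a, b)                    # scores can differ, so ranking may differ too
```

Under these assumptions the two forms return the same documents but not necessarily the same scores, which is one sense in which "functionally equivalent" may need qualifying.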

reusing the term-frequency count while indexing

2011-10-23 Thread prasenjit mukherjee
I already have the term-frequency count for all the terms in a document. Is there a way I can reuse that info while indexing? I would like to use Solr for this.

performance question - number of documents

2011-10-23 Thread sol myr
Hi, We've noticed a Lucene performance phenomenon, and would appreciate an explanation from anyone familiar with Lucene internals (I know Lucene as a user, but haven't looked under its hood). We have a Lucene index of about 30 million records. We ran 2 queries: "AND" and "OR" ("+john +doe" v
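Erick's reply in this thread attributes the cost to scoring every matching document, even when only the top 5 are returned. A minimal sketch of that idea follows. It is not Lucene's implementation: the posting lists, the `search` helper, and the toy scoring function are all hypothetical. It shows that a bounded heap keeps only k hits, yet each matching document is still scored once, and that an OR query (union of posting lists) generally has more documents to score than an AND query (intersection).

```python
import heapq

def search(matching_docs, score, k=5):
    """Return (top-k hits as (score, doc_id), number of documents scored)."""
    heap = []    # min-heap; heap[0] is the weakest hit currently kept
    scored = 0
    for doc_id in matching_docs:
        s = score(doc_id)          # every matching document gets scored
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True), scored

# Hypothetical posting lists: ids of documents containing each term.
john = set(range(0, 1000))
doe = set(range(900, 2000))

def score(doc_id):
    # Toy score chosen so that the last document visited is the best match.
    return float(doc_id)

and_hits, and_scored = search(sorted(john & doe), score)  # "+john +doe"
or_hits, or_scored = search(sorted(john | doe), score)    # "john doe"
print(and_scored, or_scored)
```

Because the best hit here is the last document visited, stopping early would return the wrong top 5, which is the point of Erick's "what if the very last document scored was the most relevant?"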

Re: Bet you didn't know Lucene can...

2011-10-23 Thread Dawid Weiss
Hi Grant, In Carrot2 (and Carrot Search's commercial products) we're not using Lucene as an indexing/search service directly, but we are re-using a lot of internal infrastructure (like analyzers, ported snowball stemmers and other segmentation stuff). We also plan on using the new language identi