PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
I know this has been discussed at great length, but I still have not found a satisfactory solution and I am hoping someone on the list has some ideas... We have a large index (4M+ Documents) with a handful of Fields. We need to perform PrefixQueries on multiple fields. The problem is that when

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Sorry for the double post, but I think I can clarify the problem a little more. We want to execute: query: A | B | C | D filter: null However, C and D cause TooManyClauses, so instead we execute: query: A | B filter: C | D My understanding is that Lucene will apply the Filter (C

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal ssenecal.w...@gmail.com wrote: Up to Lucene 2.4, this has been working out for us. However, in Lucene 2.9 this breaks since rewrite() now returns a ConstantScoreQuery. You can get back to the 2.4 behavior by calling

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Thanks for the explanation Mike. It looks like I have no choice but to move any queries which throw TooManyClauses to be Filters. Sadly, this means a max query time of 6s under load unless I can find a way to rewrite the query to span a Query and a Filter. Thanks again On Thu, Oct 15, 2009

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Karl Wettin
14 okt 2009 kl. 15.15 skrev Grant Ingersoll: On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote: I am trying to compute the counts of terms of the documents returned by running a query using a TermVectorMapper. I was wondering if anyone knew if there was a faster way to do this rather

Re: Using TermVectorMapper to compute term frequency across documents

2009-10-15 Thread Thomas D'Silva
Grant, I have an index with documents that have a text field containing document text, and a tag field containing tags associated with the document. I am trying to calculate the probability that a document contains a particular word and is tagged with a particular tag. This is related to a
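The quantity described here, the probability that a document both contains a given word and carries a given tag, can be sketched as a simple count ratio. The following is a minimal illustration over in-memory documents, not the TermVectorMapper approach from the thread; the class name WordTagProbability, the Doc holder, and the sample texts are all hypothetical and use only the Java standard library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordTagProbability {
    // Hypothetical in-memory document: distinct lower-cased words plus tags.
    static final class Doc {
        final Set<String> words;
        final Set<String> tags;

        Doc(String text, String... tags) {
            this.words = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // Fraction of documents that contain the word AND carry the tag.
    static double jointProbability(List<Doc> docs, String word, String tag) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        long both = docs.stream()
                .filter(d -> d.words.contains(word) && d.tags.contains(tag))
                .count();
        return (double) both / docs.size();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("the match ended late", "sport"),
                new Doc("match fixing inquiry", "news"),
                new Doc("transfer window opens", "sport"),
                new Doc("cup match preview", "sport"));
        // Two of the four documents contain "match" and are tagged "sport".
        System.out.println(jointProbability(docs, "match", "sport")); // prints 0.5
    }
}
```

On a real index the same ratio would be computed from the documents returned by the query rather than from an in-memory list.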

How to set boost for a certain term in a query

2009-10-15 Thread Chuan
For example, I want the term 'sport' to have more impact on the final rank. Thanks in advance. Chuan

Re: How to set boost for a certain term in a query

2009-10-15 Thread Ian Lea
http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Boosting%20a%20Term -- Ian. On Thu, Oct 15, 2009 at 3:33 PM, Chuan shichuanwu...@gmail.com wrote: For example, I want the term 'sport' to have more impact on the final rank. Thanks in advance. Chuan

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
You should be able to do exactly what you were doing on 2.4, right? (By setting the rewrite method). Mike On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal ssenecal.w...@gmail.com wrote: Thanks for the explanation Mike.  It looks like I have no choice but to move any queries which throw

Re: Invitation: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features (Sep 24 02:00 PM EDT)

2009-10-15 Thread Eran Sevi
Is there a recording of the webinar for anyone who missed it? On Sat, Sep 19, 2009 at 12:03 AM, aravind.yar...@equifax.com wrote: *Description* Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features

How to sort and get document scores afterwards

2009-10-15 Thread Christian Reuschling
Hi, our application enables sorting the result lists according to field values, currently all represented as Strings (we plan to also migrate to the new numeric type capabilities of Lucene 2.9 at a later time) For this, the documents will be sorted e.g. according to the author, which works fine

RE: How to sort and get document scores afterwards

2009-10-15 Thread Uwe Schindler
The default API searcher.search works like this now. If you want to control the retrieval of scores, create a TopFieldCollector directly: http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/TopFieldCollector.html The static create method has many possibilities to control the

Re: How to sort and get document scores afterwards

2009-10-15 Thread Michael McCandless
Yeah this was a change in 2.9... but you can get the scores back, if you do this: TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields, true /* trackDocScores */,

NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
I'm using Lucene 2.9 and sometimes get an NPE in NearSpansUnordered: java.lang.NullPointerException at org.apache.lucene.search.spans.NearSpansUnordered.start(NearSpansUnordered.java:219) at

Re: NPE in NearSpansUnordered

2009-10-15 Thread Yonik Seeley
Are you using any custom query types? Anything to help us reproduce (like the actual query this happened on) would be greatly appreciated. -Yonik http://www.lucidimagination.com On Thu, Oct 15, 2009 at 1:17 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm using Lucene 2.9 and sometimes get

Re: NPE in NearSpansUnordered

2009-10-15 Thread Peter Keegan
The query is: +payloadNear([spanNear([contents:insurance, contents:agent], 1, false), spanNear([contents:winston, contents:salem], 1, false)], 10, false) It's using the default payload function scorer (average value) It doesn't happen on all queries of this type, only a handful. This is

Re: Invitation: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features (Sep 24 02:00 PM EDT)

2009-10-15 Thread Simon Willnauer
You can download the slides and watch the webinar here: http://www.lucidimagination.com/How-We-Can-Help/webinar-Lucene-29 simon On Thu, Oct 15, 2009 at 6:32 PM, Eran Sevi erans...@gmail.com wrote: Is there a recording of the Webinars for anyone who's missed it? On Sat, Sep 19, 2009 at 12:03

OpenRelevance

2009-10-15 Thread Omar Alonso
Hi folks, I would like to know if people are interested in the OpenRelevance project (http://wiki.apache.org/lucene-java/OpenRelevance). I've done quite a few experiments on Amazon Mechanical Turk using TREC and INEX data sets, so one approach would be to use crowdsourcing for such a task.

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
At first I thought so, yes, but then I realised that the query I wanted to execute was A | B | C | D and in reality I was executing (A | B) & (C | D). I guess my unit tests were missing some cases and don't currently catch this. On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless
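The mismatch described in this message can be shown with plain bit sets: a filter only narrows what the query already matched, so running A | B as the query with C | D as the filter yields an intersection, not the intended four-way union. This is a minimal sketch using java.util.BitSet as a stand-in for per-clause match sets; the class name, helper methods, and document ids are hypothetical and no Lucene classes are involved.

```java
import java.util.BitSet;

public class FilterVersusUnion {
    // Build a match set from a list of document ids.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Union of match sets: what "A | B | C | D" is meant to return.
    static BitSet union(BitSet... sets) {
        BitSet out = new BitSet();
        for (BitSet s : sets) {
            out.or(s);
        }
        return out;
    }

    // Query restricted by a filter: only documents in both sets survive.
    static BitSet filtered(BitSet queryMatches, BitSet filterMatches) {
        BitSet out = (BitSet) queryMatches.clone();
        out.and(filterMatches);
        return out;
    }

    public static void main(String[] args) {
        BitSet a = docs(0, 1);
        BitSet b = docs(2);
        BitSet c = docs(1, 3);
        BitSet d = docs(4);

        // Intended result: every document matched by any clause.
        System.out.println(union(a, b, c, d)); // prints {0, 1, 2, 3, 4}

        // Query (A | B) with filter (C | D): only the overlap remains.
        System.out.println(filtered(union(a, b), union(c, d))); // prints {1}
    }
}
```

A unit test that only uses documents matching both halves would pass in either case, which is consistent with the gap in test coverage mentioned above.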

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Michael McCandless
Well, you could wrap the C | D filter as a Query (using ConstantScoreQuery), and then add that as a SHOULD clause on your toplevel BooleanQuery? Mike On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal ssenecal.w...@gmail.com wrote: At first I thought so, yes, but then I realised that the query I
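The idea in this reply can be modelled without Lucene: treat the C | D filter as a clause that gives every accepted document the same fixed score, then combine it with the scored A | B clause as optional alternatives. The sketch below is only a standard-library model of that behaviour, not Lucene's ConstantScoreQuery or BooleanQuery; the method names, the constant 1.0f, and the choice to sum scores across matching clauses are assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class ConstantScoreShouldSketch {
    // Hypothetical stand-in for wrapping a filter as a query: every document
    // the filter accepts matches with the same fixed score.
    static Map<Integer, Float> constantScore(BitSet filterMatches, float score) {
        Map<Integer, Float> out = new TreeMap<>();
        for (int doc = filterMatches.nextSetBit(0); doc >= 0;
                doc = filterMatches.nextSetBit(doc + 1)) {
            out.put(doc, score);
        }
        return out;
    }

    // Hypothetical stand-in for optional (SHOULD) clauses: a document matches
    // if any clause matches; its score is summed over the matching clauses.
    @SafeVarargs
    static Map<Integer, Float> should(Map<Integer, Float>... clauses) {
        Map<Integer, Float> out = new TreeMap<>();
        for (Map<Integer, Float> clause : clauses) {
            clause.forEach((doc, s) -> out.merge(doc, s, Float::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // Scored matches for the A | B part (made-up scores).
        Map<Integer, Float> scoredAB = new TreeMap<>();
        scoredAB.put(0, 0.5f);
        scoredAB.put(1, 0.25f);

        // Documents accepted by the C | D filter.
        BitSet filterCD = new BitSet();
        filterCD.set(1);
        filterCD.set(3);

        Map<Integer, Float> result = should(scoredAB, constantScore(filterCD, 1.0f));
        System.out.println(result.keySet()); // prints [0, 1, 3]
    }
}
```

Document 3 matches only through the filter clause and still appears in the result, which is the union behaviour the original A | B | C | D query was after. The filter side contributes no per-term relevance, so ranking among filter-only matches is flat.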

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

2009-10-15 Thread Shaun Senecal
Ah! I thought that the ConstantScoreQuery would also be rewritten into a BooleanQuery, resulting in the same exception. If that's the case, then this should work. I'll give that a try when I get into the office this morning. On Fri, Oct 16, 2009 at 6:46 AM, Michael McCandless