Hello Em:

1) Here's a printout of an example DisMax query (as you can see, mostly MUST terms, except for some SHOULD terms used for boosting scores for stopwords):

    ((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
     | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en)
     | (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
     | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en)
     | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
     | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en)
     | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
     | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en))

2) The collector is inserted in the SolrIndexSearcher (replacing the TimeLimitingCollector). We trigger it through the SOLR interface by passing the timeAllowed parameter. We know this is a hack, but AFAIK there's no out-of-the-box way to specify custom collectors yet (https://issues.apache.org/jira/browse/SOLR-1680). In any case, the collector part works perfectly as of now, so clearly this is not the problem.

3) Re: your sentence:

    *I* would expect that with a shrinking set of matching documents to the overall-query, the function query only checks those documents that are guaranteed to be within the result set.

Yes, I agree with this, but this snippet of code in FunctionQuery.java seems to say otherwise:

    // instead of matching all docs, we could also embed a query.
    // the score could either ignore the subscore, or boost it.
    // Containment:  floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
    // Boost:        foo:myTerm^floatline("myFloatField",1.0,0.0f)
    @Override
    public int nextDoc() throws IOException {
      for(;;) {
        ++doc;
        if (doc>=maxDoc) {
          return doc=NO_MORE_DOCS;
        }
        if (acceptDocs != null && !acceptDocs.get(doc)) continue;
        return doc;
      }
    }

It seems that the author also thought of maybe embedding a query in order to restrict matches, but this doesn't seem to be in place as of now (or maybe I'm not understanding how the whole thing works :) ).

Thanks
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas

On Thu, Feb 16, 2012 at 8:09 PM, Em <mailformailingli...@yahoo.de> wrote:

> Hello Carlos,
>
> > We have some more tests on that matter: now we're moving from issuing this large query through the SOLR interface to creating our own QueryParser. The initial tests we've done in our QParser (that internally creates multiple queries and inserts them inside a DisjunctionMaxQuery) are very good, we're getting very good response times and high quality answers. But when we've tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a QueryValueSource that wraps the DisMaxQuery), then the times move from 10-20 msec to 200-300msec.
> I reviewed the sourcecode and yes, the FunctionQuery iterates over the whole index, however... let's see!
>
> In relation to the DisMaxQuery you create within your parser: What kind of clause is the FunctionQuery and what kind of clause are your other queries (MUST, SHOULD, MUST_NOT...)?
>
> *I* would expect that with a shrinking set of matching documents to the overall-query, the function query only checks those documents that are guaranteed to be within the result set.
>
> > Note that we're using early termination of queries (via a custom collector), and therefore (as shown by the numbers I included above) even if the query is very complex, we're getting very fast answers. The only situation where the response time explodes is when we include a FunctionQuery.
> Could you give us some details about how/where did you plugin the Collector, please?
>
> Kind regards,
> Em
>
> On 16.02.2012 19:41, Carlos Gonzalez-Cadenas wrote:
> > Hello Em:
> >
> > Thanks for your answer.
> >
> > Yes, we initially also thought that the excessive increase in response time was caused by the several queries being executed, and we did another test. We executed one of the subqueries that I've shown to you directly in the "q" parameter and then we tested this same subquery (only this one, without the others) with the function query "query($q1)" in the "q" parameter.
> >
> > Theoretically the times for these two queries should be more or less the same, but the second one is several times slower than the first one. After this observation we learned more about function queries and we learned from the code and from some comments in the forums [1] that the FunctionQueries are expected to match all documents.
> >
> > We have some more tests on that matter: now we're moving from issuing this large query through the SOLR interface to creating our own QueryParser.
> > The initial tests we've done in our QParser (that internally creates multiple queries and inserts them inside a DisjunctionMaxQuery) are very good, we're getting very good response times and high quality answers. But when we've tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a QueryValueSource that wraps the DisMaxQuery), then the times move from 10-20 msec to 200-300msec.
> >
> > Note that we're using early termination of queries (via a custom collector), and therefore (as shown by the numbers I included above) even if the query is very complex, we're getting very fast answers. The only situation where the response time explodes is when we include a FunctionQuery.
> >
> > Re: your question of what we're trying to achieve ... We're implementing a powerful query autocomplete system, and we use several fields to a) improve performance on wildcard queries and b) have a very precise control over the score.
> >
> > Thanks a lot for your help,
> > Carlos
> >
> > [1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0
> >
> > Carlos Gonzalez-Cadenas
> > CEO, ExperienceOn - New generation search
> > http://www.experienceon.com
> >
> > Mobile: +34 652 911 201
> > Skype: carlosgonzalezcadenas
> > LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >
> > On Thu, Feb 16, 2012 at 7:09 PM, Em <mailformailingli...@yahoo.de> wrote:
> >
> >> Hello Carlos,
> >>
> >> well, you must take into account that you are executing up to 8 queries per request instead of one query per request.
> >>
> >> I am not totally sure about the details of the implementation of the max-function-query, but I guess it first iterates over the results of the first max-query, afterwards over the results of the second max-query and so on. This is a much higher complexity than in the case of a normal query.
> >>
> >> I would suggest you to optimize your request.
> >> I don't think that this particular function query is matching *all* docs. Instead I think it just matches those docs specified by your inner-query (although I might be wrong about that).
> >>
> >> What are you trying to achieve by your request?
> >>
> >> Regards,
> >> Em
> >>
> >> On 16.02.2012 16:24, Carlos Gonzalez-Cadenas wrote:
> >>> Hello Em:
> >>>
> >>> The URL is quite large (w/ shards, ...), maybe it's best if I paste the relevant parts.
> >>>
> >>> Our "q" parameter is:
> >>>
> >>>     "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)))))\"",
> >>>
> >>> The subqueries q8, q7, q4 and q3 are regular queries, for example:
> >>>
> >>>     "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR (stopword_phrase:las AND stopword_phrase:de)"
> >>>
> >>> We've executed the subqueries q3-q8 independently and they're very fast, but when we introduce the function queries as described below, it all goes 10X slower.
> >>>
> >>> Let me know if you need anything else.
> >>>
> >>> Thanks
> >>> Carlos
> >>>
> >>> Carlos Gonzalez-Cadenas
> >>> CEO, ExperienceOn - New generation search
> >>> http://www.experienceon.com
> >>>
> >>> Mobile: +34 652 911 201
> >>> Skype: carlosgonzalezcadenas
> >>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> >>>
> >>> On Thu, Feb 16, 2012 at 4:02 PM, Em <mailformailingli...@yahoo.de> wrote:
> >>>
> >>>> Hello carlos,
> >>>>
> >>>> could you show us how your Solr-call looks like?
> >>>>
> >>>> Regards,
> >>>> Em
> >>>>
> >>>> On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
> >>>>> Hello all:
> >>>>>
> >>>>> We'd like to score the matching documents using a combination of SOLR's IR score with another application-specific score that we store within the documents themselves (i.e.
> >>>>> a float field containing the app-specific score). In particular, we'd like to calculate the final score doing some operations with both numbers (i.e product, sqrt, ...)
> >>>>>
> >>>>> According to what we know, there are two ways to do this in SOLR:
> >>>>>
> >>>>> A) Sort by function [1]: We've tested an expression like "sort=product(score, query_score)" in the SOLR query, where score is the common SOLR IR score and query_score is our own precalculated score, but it seems that SOLR can only do this with stored/indexed fields (and obviously "score" is not stored/indexed).
> >>>>>
> >>>>> B) Function queries: We've used _val_ and function queries like max, sqrt and query, and we've obtained the desired results from a functional point of view. However, our index is quite large (400M documents) and the performance degrades heavily, given that function queries are AFAIK matching all the documents.
> >>>>>
> >>>>> I have two questions:
> >>>>>
> >>>>> 1) Apart from the two options I mentioned, is there any other (simple) way to achieve this that we're not aware of?
> >>>>>
> >>>>> 2) If we have to choose the function queries path, would it be very difficult to modify the actual implementation so that it doesn't match all the documents, that is, to pass a query so that it only operates over the documents matching the query?. Looking at the FunctionQuery.java source code, there's a comment that says "// instead of matching all docs, we could also embed a query. the score could either ignore the subscore, or boost it", which is giving us some hope that maybe it's possible and even desirable to go in this direction. If you can give us some directions about how to go about this, we may be able to do the actual implementation.
> >>>>>
> >>>>> BTW, we're using Lucene/SOLR trunk.
> >>>>>
> >>>>> Thanks a lot for your help.
> >>>>> Carlos
> >>>>>
> >>>>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
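
As a rough illustration of the nextDoc() behaviour discussed in this thread, here is a minimal, Lucene-free sketch in plain Java. It is not the actual FunctionQuery implementation and uses no Lucene or Solr APIs: the class name, the DocIterator interface, and the helper methods are all hypothetical, and acceptDocs filtering is left out for brevity. It only contrasts an iterator that walks every doc id up to maxDoc (as the quoted loop does) with one that is driven by the doc ids an inner query matched, and counts how many documents each one visits while computing product(irScore, query_score).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are real Lucene/Solr classes or methods.
class RestrictedFunctionScoringSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal stand-in for a doc-id iterator. */
    interface DocIterator {
        int nextDoc();
    }

    /** Match-all iteration: visits every doc id below maxDoc, like the quoted loop. */
    static DocIterator matchAll(final int maxDoc) {
        return new DocIterator() {
            private int doc = -1;

            @Override
            public int nextDoc() {
                if (doc == NO_MORE_DOCS) {
                    return doc;
                }
                ++doc;
                if (doc >= maxDoc) {
                    doc = NO_MORE_DOCS;
                }
                return doc;
            }
        };
    }

    /** Restricted iteration: visits only the sorted doc ids matched by an inner query. */
    static DocIterator overMatches(final int[] sortedMatches) {
        return new DocIterator() {
            private int pos = -1;

            @Override
            public int nextDoc() {
                if (pos < sortedMatches.length) {
                    pos++;
                }
                return pos < sortedMatches.length ? sortedMatches[pos] : NO_MORE_DOCS;
            }
        };
    }

    /** Computes irScore * queryScore for each visited doc and returns the number of docs visited. */
    static int scoreAll(DocIterator it, float[] irScore, float[] queryScore, List<Float> out) {
        int visited = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(irScore[doc] * queryScore[doc]);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;                 // assumed toy index size
        float[] irScore = new float[maxDoc];
        float[] queryScore = new float[maxDoc];
        int[] matches = {3, 42, 977};      // assumed doc ids matched by the inner query
        for (int d : matches) {
            irScore[d] = 2.0f;
            queryScore[d] = 0.5f;
        }

        int all = scoreAll(matchAll(maxDoc), irScore, queryScore, new ArrayList<>());
        int restricted = scoreAll(overMatches(matches), irScore, queryScore, new ArrayList<>());
        System.out.println("match-all visited:  " + all);
        System.out.println("restricted visited: " + restricted);
    }
}
```

With the toy numbers above, the match-all iterator visits 1000 documents and the restricted one visits 3. On a 400M-document index the same gap would be consistent with the slowdown reported in this thread, although this sketch says nothing about how Lucene itself would need to be changed.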