Hello Em:

Thanks for your answer.

Yes, we initially also thought that the excessive increase in response time
was caused by the several queries being executed, and we did another test.
We executed one of the subqueries that I've shown to you directly in the
"q" parameter and then we tested this same subquery (only this one, without
the others) with the function query "query($q1)" in the "q" parameter.
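For illustration, the two request shapes compared in this test could be built as follows. This is only a sketch: the subquery string is a hypothetical stand-in for one of the real subqueries, the helper names are invented for the example, and the `_val_:"..."` wrapping is assumed from the "q" parameter shown in the earlier message.

```python
from urllib.parse import urlencode

# Hypothetical subquery, standing in for one of the real subqueries.
SUBQUERY = "stopword_phrase:santa AND stopword_phrase:hoteles"


def direct_params(subquery):
    """Request parameters that run the subquery directly in "q"."""
    return {"q": subquery}


def wrapped_params(subquery):
    """Request parameters that run the same subquery via query($q1)."""
    return {"q": '_val_:"query($q1)"', "q1": subquery}


if __name__ == "__main__":
    # Print both query strings so they can be appended to a Solr select URL.
    print(urlencode(direct_params(SUBQUERY)))
    print(urlencode(wrapped_params(SUBQUERY)))
```

Both requests carry exactly the same subquery; the only difference is whether it is evaluated as the main query or dereferenced inside the function query.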

In theory, the times for these two queries should be more or less the
same, but the second one is several times slower than the first one. After
this observation we read up on function queries, and we learned from
the code and from some comments in the forums [1] that FunctionQueries
are expected to match all documents.
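A toy model of why matching all documents matters, in plain Python rather than Lucene code. The function names and the "default value 0.0 for non-matching documents" rule are assumptions made for the sketch, not the actual FunctionQuery implementation.

```python
def run_plain_query(matching):
    """Score only the documents that the query matches."""
    scores = {doc_id: 1.0 for doc_id in matching}
    return len(scores), scores


def run_match_all_function(num_docs, matching):
    """Score every document; non-matching ones get an assumed default of 0.0."""
    scores = {}
    for doc_id in range(num_docs):
        scores[doc_id] = 1.0 if doc_id in matching else 0.0
    return len(scores), scores


if __name__ == "__main__":
    matching = {3, 42, 977}
    plain_visited, _ = run_plain_query(matching)
    func_visited, _ = run_match_all_function(1000, matching)
    print(plain_visited, func_visited)
```

In this model the plain query touches 3 documents while the match-all variant touches all 1000, so the cost of the second grows with the index size instead of with the number of matches.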

We have run some more tests on this: we're now moving from issuing this
large query through the SOLR interface to creating our own QueryParser. The
initial tests we've done with our QParser (which internally creates multiple
queries and inserts them inside a DisjunctionMaxQuery) are very good: we're
getting very good response times and high-quality answers. But when we
tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
QueryValueSource that wraps the DisMaxQuery), the times moved from
10-20 msec to 200-300 msec.

Note that we're using early termination of queries (via a custom
collector), and therefore (as shown by the numbers I included above) even
if the query is very complex, we're getting very fast answers. The only
situation where the response time explodes is when we include a
FunctionQuery.
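The interaction between early termination and a match-all query can be sketched with a toy collector in plain Python. Everything here is hypothetical: the class and function names are invented, and the rule that zero-score documents do not count as hits is an assumption made for the model, not a statement about our actual collector or about Lucene.

```python
class EarlyTermination(Exception):
    """Raised by the collector to stop the scan."""


class TopNCollector:
    def __init__(self, max_hits):
        self.max_hits = max_hits
        self.hits = []

    def collect(self, doc_id, score):
        if score <= 0.0:
            return  # assumption: zero-score documents are not counted as hits
        self.hits.append((doc_id, score))
        if len(self.hits) >= self.max_hits:
            raise EarlyTermination


def plain_stream(matching):
    """Yield only the matching documents, in doc-id order."""
    for doc_id in sorted(matching):
        yield doc_id, 1.0


def match_all_stream(num_docs, matching):
    """Yield every document; non-matching ones carry a score of 0.0."""
    for doc_id in range(num_docs):
        yield doc_id, (1.0 if doc_id in matching else 0.0)


def scan(doc_stream, collector):
    """Feed (doc_id, score) pairs to the collector; return documents visited."""
    visited = 0
    try:
        for doc_id, score in doc_stream:
            visited += 1
            collector.collect(doc_id, score)
    except EarlyTermination:
        pass
    return visited


if __name__ == "__main__":
    matching = set(range(0, 1000, 100))
    print(scan(plain_stream(matching), TopNCollector(3)))
    print(scan(match_all_stream(1000, matching), TopNCollector(3)))
```

Under these assumptions the collector stops after 3 visited documents on the plain stream but has to walk through 201 documents on the match-all stream before it has 3 useful hits, which is one possible way early termination could lose most of its benefit.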

Re: your question of what we're trying to achieve ... We're implementing a
powerful query autocomplete system, and we use several fields to a) improve
performance on wildcard queries and b) have very precise control over the
score.

Thanks a lot for your help,
Carlos

[1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas


On Thu, Feb 16, 2012 at 7:09 PM, Em <mailformailingli...@yahoo.de> wrote:

> Hello Carlos,
>
> well, you must take into account that you are executing up to 8 queries
> per request instead of one query per request.
>
> I am not totally sure about the details of the implementation of the
> max-function-query, but I guess it first iterates over the results of
> the first max-query, afterwards over the results of the second max-query
> and so on. This is a much higher complexity than in the case of a normal
> query.
>
> I would suggest that you optimize your request. I don't think that this
> particular function query is matching *all* docs. Instead I think it
> just matches those docs specified by your inner-query (although I might
> be wrong about that).
>
> What are you trying to achieve by your request?
>
> Regards,
> Em
>
> On 16.02.2012 16:24, Carlos Gonzalez-Cadenas wrote:
> > Hello Em:
> >
> > The URL is quite large (w/ shards, ...), maybe it's best if I paste the
> > relevant parts.
> >
> > Our "q" parameter is:
> >
> >
> > "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)))))\"",
> >
> > The subqueries q8, q7, q4 and q3 are regular queries, for example:
> >
> > "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
> > wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
> > (stopword_phrase:las AND stopword_phrase:de)"
> >
> > We've executed the subqueries q3-q8 independently and they're very fast,
> > but when we introduce the function queries as described below, it all goes
> > 10X slower.
> >
> > Let me know if you need anything else.
> >
> > Thanks
> > Carlos
> >
> >
> > On Thu, Feb 16, 2012 at 4:02 PM, Em <mailformailingli...@yahoo.de> wrote:
> >
> >> Hello Carlos,
> >>
> >> could you show us what your Solr call looks like?
> >>
> >> Regards,
> >> Em
> >>
> >> On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
> >>> Hello all:
> >>>
> >>> We'd like to score the matching documents using a combination of SOLR's IR
> >>> score with another application-specific score that we store within the
> >>> documents themselves (i.e. a float field containing the app-specific
> >>> score). In particular, we'd like to calculate the final score doing some
> >>> operations with both numbers (e.g. product, sqrt, ...).
> >>>
> >>> According to what we know, there are two ways to do this in SOLR:
> >>>
> >>> A) Sort by function [1]: We've tested an expression like
> >>> "sort=product(score, query_score)" in the SOLR query, where score is the
> >>> common SOLR IR score and query_score is our own precalculated score, but it
> >>> seems that SOLR can only do this with stored/indexed fields (and obviously
> >>> "score" is not stored/indexed).
> >>>
> >>> B) Function queries: We've used _val_ and function queries like max, sqrt
> >>> and query, and we've obtained the desired results from a functional point
> >>> of view. However, our index is quite large (400M documents) and the
> >>> performance degrades heavily, given that function queries are AFAIK
> >>> matching all the documents.
> >>>
> >>> I have two questions:
> >>>
> >>> 1) Apart from the two options I mentioned, is there any other (simple) way
> >>> to achieve this that we're not aware of?
> >>>
> >>> 2) If we have to choose the function queries path, would it be very
> >>> difficult to modify the actual implementation so that it doesn't match all
> >>> the documents, that is, to pass a query so that it only operates over the
> >>> documents matching the query? Looking at the FunctionQuery.java source
> >>> code, there's a comment that says "// instead of matching all docs, we
> >>> could also embed a query. the score could either ignore the subscore, or
> >>> boost it", which is giving us some hope that maybe it's possible and even
> >>> desirable to go in this direction. If you can give us some directions about
> >>> how to go about this, we may be able to do the actual implementation.
> >>>
> >>> BTW, we're using Lucene/SOLR trunk.
> >>>
> >>> Thanks a lot for your help.
> >>> Carlos
> >>>
> >>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
> >>>
> >>
> >
>
