We're working on the same problem with the scale(query(...)) combination, so I'd like to share a bit more information that may be useful.

*On the scale function:*
Even though the scale function has to calculate the scores for all documents, it is actually doing this work twice for each ValueSource (once to calculate the min and max values, and then again when actually scoring the documents), which is inefficient.

To solve the problem, we're in the process of putting a cache inside the scale function to remember the value for each document when it is initially computed (to find the min and max), so that the second pass can just reuse the previously computed values. Our theory is that most of the extra time due to the scale function is really just the result of doing duplicate work.

No promises this won't be overly costly in terms of memory utilization, but we'll see what we get in terms of speed improvements and will share the code if it works out well. Alternate implementation suggestions (or criticism of a cache like this) are also welcome.

*On the NoOp product function: scale(product(1, query(...))):*
We do the same thing, which ultimately is just an unnecessary loop through all documents to do an extra multiplication step. I just debugged the code and uncovered the problem. There is a Map (called context) that is passed through to each value source to store intermediate state, and both the query and scale functions are passing the ValueSource for the query function in as the KEY to this Map (as opposed to using some composite key that makes sense in the current context).
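
For illustration, here is a minimal, self-contained sketch of that collision. Weight, ScaleInfo, and QuerySource below are hypothetical stand-ins rather than the real Lucene classes, the write order is simplified, and keying the ScaleInfo by the scale function itself is only one possible fix, not necessarily what a patch would do:

```java
import java.util.HashMap;
import java.util.Map;

public class ContextKeyCollision {
    // Hypothetical stand-ins; the real Lucene classes are far richer.
    static final class Weight {}
    static final class ScaleInfo {}
    static final class QuerySource {}

    /** Both writers use the query source as the key, so the second put wins. */
    static Object sharedKey() {
        Map<Object, Object> context = new HashMap<>();
        QuerySource source = new QuerySource();
        context.put(source, new ScaleInfo()); // what the scale function stores
        context.put(source, new Weight());    // what the query source stores
        return context.get(source);           // a Weight, not a ScaleInfo
    }

    /** The scale function keys its entry by itself, so nothing is overwritten. */
    static Object distinctKeys() {
        Map<Object, Object> context = new HashMap<>();
        QuerySource source = new QuerySource();
        Object scaleFunction = new Object();  // stands in for the scale function instance
        context.put(scaleFunction, new ScaleInfo());
        context.put(source, new Weight());
        return context.get(scaleFunction);    // still the ScaleInfo
    }

    public static void main(String[] args) {
        try {
            ScaleInfo info = (ScaleInfo) sharedKey();
            System.out.println("unexpected: " + info);
        } catch (ClassCastException e) {
            System.out.println("shared key: ClassCastException");
        }
        ScaleInfo info = (ScaleInfo) distinctKeys();
        System.out.println("distinct keys: cast succeeded");
    }
}
```

With the shared key, the value read back is whatever was written last, so the cast to ScaleInfo fails; with distinct keys each writer gets its own entry.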

Essentially, these lines are overwriting each other:

Inside ScaleFloatFunction:
    context.put(this.source, scaleInfo); // this.source refers to the QueryValueSource, and scaleInfo refers to a ScaleInfo object

Inside QueryValueSource:
    context.put(this, w); // this refers to the same QueryValueSource from above, and w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the context Map, it unexpectedly pulls the Weight object out instead, and thus the invalid cast exception occurs. The NoOp multiplication works because it puts a "different" ValueSource between the query and the ScaleFloatFunction, such that this.source (in ScaleFloatFunction) != this (in QueryValueSource).

This should be an easy fix. I'll create a JIRA ticket to use better key names in these functions and push up a patch. This will eliminate the need for the extra NoOp function.

-Trey

On Mon, Dec 2, 2013 at 12:41 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> I'm pursuing this possible PostFilter solution. I can see how to collect
> all the hits and recompute the scores in a PostFilter, after all the hits
> have been collected (for scaling). Now, I can't see how to get the custom
> doc/score values back into the main query's HitQueue. Any advice?
>
> Thanks,
> Peter
>
>
> On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > Instead of using a function query, could I use the edismax query (plus
> > some low cost filters not shown in the example) and implement the
> > scale/sum/product computation in a PostFilter? Is the query's maxScore
> > available there?
> >
> > Thanks,
> > Peter
> >
> >
> > On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> >
> >> Although the 'scale' is a big part of it, here's a closer breakdown.
> >> Here are 4 queries with increasing functions, and their response times
> >> (caching turned off in solrconfig):
> >>
> >> 100 msec:
> >> select?q={!edismax v='news' qf='title^2 body'}
> >>
> >> 135 msec:
> >> select?qq={!edismax v='news' qf='title^2 body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}
> >>
> >> 200 msec:
> >> select?qq={!edismax v='news' qf='title^2 body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query v=$qq}
> >>
> >> 320 msec:
> >> select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}
> >>
> >> Btw, that no-op product is necessary, else you get this exception:
> >>
> >> org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo
> >>
> >> thanks,
> >> peter
> >>
> >>
> >> On Wed, Nov 27, 2013 at 1:30 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >>
> >>> : So, this query does just what I want, but it's typically 3 times slower
> >>> : than the edismax query without the functions:
> >>>
> >>> that's because the scale() function is inherently slow (it has to
> >>> compute the min & max value for every document in order to know how to
> >>> scale them)
> >>>
> >>> what you are seeing is the price you have to pay to get that query with a
> >>> "normalized" 0-1 value.
> >>>
> >>> (you might be able to save a little bit of time by eliminating that
> >>> no-Op multiply by 1: "product(query($qq),1)" ... but i doubt you'll even
> >>> notice much of a change given that scale function.)
> >>>
> >>> : Is there any way to speed this up? Would writing a custom function query
> >>> : that compiled all the function queries together be any faster?
> >>>
> >>> If you can find a faster implementation for scale() then by all means let
> >>> us know, and we can fold it back into Solr.
> >>>
> >>>
> >>> -Hoss