From looking at https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ScaleFloatFunction.java#L70 I conclude that min and max are obtained from all docs in the index. But if you pass query() as an argument to scale(), only the matching docs are taken into account when evaluating min and max. So, as far as I understand, you are looking for a query that matches the intersection of $q AND $fq but yields the price field value as its score. It seems I've got the problem definition. I'll come up with a proposal a little later.
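To make the difference concrete, here is a minimal client-side sketch in Python. It is not Solr code: the function name `scale_to_range` and its parameters are made up for illustration, and it assumes scale() is plain min-max scaling, which is consistent with the numbers in the responses quoted below. It scales the six prices from the fq=popularity:(1 OR 7) example twice, once with the min and max of the result set and once with the index-wide 0.0 and 2199.0.

```python
from typing import List, Optional, Sequence


def scale_to_range(
    values: Sequence[float],
    target_min: float = 0.0,
    target_max: float = 1.0,
    source_min: Optional[float] = None,
    source_max: Optional[float] = None,
) -> List[float]:
    """Min-max scale values into [target_min, target_max].

    When source_min/source_max are omitted they are taken from the values
    themselves, i.e. from the result set only (hypothetical helper, not a
    Solr API).
    """
    lo = min(values) if source_min is None else source_min
    hi = max(values) if source_max is None else source_max
    if hi == lo:
        # Assumption: with a zero-width source range, map everything to target_min.
        return [target_min for _ in values]
    span = target_max - target_min
    return [(v - lo) / (hi - lo) * span + target_min for v in values]


# Prices of the six documents matched by popularity:(1 OR 7) in the thread.
prices = [74.99, 19.95, 11.5, 329.95, 479.95, 649.99]

# Min and max taken from the result set only (11.5 and 649.99).
result_set_scaled = scale_to_range(prices)

# Min and max taken from the whole index (0.0 and 2199.0 per the stats query).
index_wide_scaled = scale_to_range(prices, source_min=0.0, source_max=2199.0)
```

With these inputs the first price, 74.99, comes out at roughly 0.0994 when scaled over the result set and roughly 0.0341 when scaled over the whole index, which lines up with the two Solr responses in the quoted messages.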

On Wed, Jun 1, 2022 at 11:33 AM Vincenzo D'Amore <[email protected]> wrote:

> Hi Mikhail,
>
> sorry for not being clear, I'll try again.
> For my understanding the solr scale function, once applied to a field,
> needs min and max for that field.
> Those min and max values by default are calculated by all the existing
> documents, I don't know exactly how this is implemented internally in Solr.
> I assume that, in the worst case scenario, all the documents have to be
> traversed reading all the values for the given field and then somehow
> saving the min/max.
> In the Solr scale function documentation is also written:
>
> The current implementation cannot distinguish when documents have been
> deleted or documents that have no value. It uses 0.0 values for these
> cases.
> This means that often the min value can be 0 if you have only positive
> values.
>
> But what happens if I need to scale the values of a field only within the
> documents that are the result of a query? Only a few hundreds or thousands
> of documents?
> First of all min and max has to be calculated only on the result set of
> your query.
> That is what I was trying to say when I wrote "apply the scale function
> only to the result set (and not to the entire collection)".
>
> For example, if you apply the scale function to the field price in Solr
> techproducts example, "min" and "max" are between 0.0 and 2199.0
>
> http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price
>
> So even if a filter query is added - fq=popularity:(1 OR 7) - the values
> are scaled between 0.0 and 2199.0.
>
> http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":30,
>     "params":{
>       "q":"*:*",
>       "fl":"price,scale(price, 0, 1)",
>       "fq":"popularity:(1 OR 7)",
>       "rows":"100"}},
>   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
>       {
>         "price":74.99,
>         "scale(price, 0, 1)":0.034101862},
>       {
>         "price":19.95,
>         "scale(price, 0, 1)":0.009072306},
>       {
>         "price":11.5,
>         "scale(price, 0, 1)":0.0052296496},
>       {
>         "price":329.95,
>         "scale(price, 0, 1)":0.15004548},
>       {
>         "price":479.95,
>         "scale(price, 0, 1)":0.2182583},
>       {
>         "price":649.99,
>         "scale(price, 0, 1)":0.29558435}]
>   }}
>
> As you can see in the results of this query, prices are between 11.5 and
> 649.99.
> What if I want to scale the prices between 11.5 and 649.99?
> Or, in other words, what is the easiest way to scale all the values of a
> field with the min and max of the current query results?
>
> Right now I'm investigating what's the best way to scale the values of one
> or more fields within Solr, but only within the documents that are in the
> current result set.
>
> Hope this helps to make things clearer.
>
> Best regards,
> Vincenzo
>
> On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev <[email protected]> wrote:
>
> > Vincenzo,
> > Can you elaborate what it means ' apply the scale function only to the
> > result set (and not to
> > the entire collection).' ?
> >
> > On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore <[email protected]>
> > wrote:
> >
> > > Hi Mikhail,
> > >
> > > I'm trying to apply the scale function only to the result set (and not to
> > > the entire collection).
> > > And I discovered that adding "query($q)" to the scale function does the
> > > trick.
> > > In other words, adding "query($q)" forces solr to restrict the scale
> > > function only to the result set.
> > >
> > > But if I add an fq to the query parameters the scale function applies only
> > > to the q param.
> > > For example:
> > >
> > > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":8,
> > >     "params":{
> > >       "q":"*:*",
> > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > >       "fq":"popularity:(1 OR 7)",
> > >       "rows":"100"}},
> > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > >       {
> > >         "price":74.99,
> > >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> > >       {
> > >         "price":19.95,
> > >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> > >       {
> > >         "price":11.5,
> > >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> > >       {
> > >         "price":329.95,
> > >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> > >       {
> > >         "price":479.95,
> > >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> > >       {
> > >         "price":649.99,
> > >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> > >   }}
> > >
> > > I can avoid this problem by adding a new parameter query($fq) to the scale
> > > function, but this solution is cumbersome and not maintainable.
> > > For example:
> > >
> > > http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(sum(price,query($q)),query($fq)),%200,%201),manu_id_s
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":1,
> > >     "params":{
> > >       "q":"manu_id_s:(corsair belkin canon viewsonic)",
> > >       "fl":"price,scale(sum(sum(price,query($q)),query($fq)), 0, 1),manu_id_s",
> > >       "fq":"price:[0 TO 200]",
> > >       "rows":"100"}},
> > >   "response":{"numFound":5,"start":0,"numFoundExact":true,"docs":[
> > >       {
> > >         "manu_id_s":"belkin",
> > >         "price":19.95,
> > >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.048746154},
> > >       {
> > >         "manu_id_s":"belkin",
> > >         "price":11.5,
> > >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.0},
> > >       {
> > >         "manu_id_s":"canon",
> > >         "price":179.99,
> > >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.97198087},
> > >       {
> > >         "manu_id_s":"corsair",
> > >         "price":185.0,
> > >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":1.0},
> > >       {
> > >         "manu_id_s":"corsair",
> > >         "price":74.99,
> > >         "scale(sum(sum(price,query($q)),query($fq)), 0, 1)":0.3653772}]
> > >   }}
> > >
> > > On Tue, May 31, 2022 at 2:48 PM Mikhail Khludnev <[email protected]> wrote:
> > >
> > > > Hello Vincenzo,
> > > >
> > > > I'm not getting your point:
> > > >
> > > > > if I add an fq parameter the scale function still continues to work
> > > > > only on the q param .
> > > >
> > > > well, but the function actually refers to q param:
> > > > scale(sum(price,query($q)), 0, 1).
> > > >
> > > > What's your expectation values of query($q) with "q":"popularity:(1 OR 7)"?
> > > > I suggest to check it with fl=score
> > > >
> > > > On Tue, May 31, 2022 at 2:05 PM Vincenzo D'Amore <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > playing with the solr scale function I found a few corner cases where I
> > > > > need to scale only the results set.
> > > > >
> > > > > I found a workaround that works but it does not seem to be viable, because
> > > > > if I add an fq parameter the scale function still continues to work only on
> > > > > the q param .
> > > > >
> > > > > For example with q=popularity:(1 OR 7):
> > > > >
> > > > > http://localhost:8983/solr/techproducts/select?q=popularity:(1 OR 7)&rows=100&fl=price,scale(sum(price,query($q)), 0, 1)
> > > > >
> > > > > {
> > > > >   "responseHeader":{
> > > > >     "status":0,
> > > > >     "QTime":1,
> > > > >     "params":{
> > > > >       "q":"popularity:(1 OR 7)",
> > > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > > >       "rows":"100"}},
> > > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > > >       {
> > > > >         "price":74.99,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.099437736},
> > > > >       {
> > > > >         "price":19.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.013234352},
> > > > >       {
> > > > >         "price":11.5,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.0},
> > > > >       {
> > > > >         "price":329.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.49875492},
> > > > >       {
> > > > >         "price":479.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.7336842},
> > > > >       {
> > > > >         "price":649.99,
> > > > >         "scale(sum(price,query($q)), 0, 1)":1.0}]
> > > > >   }}
> > > > >
> > > > > but moving the filter in fq:
> > > > >
> > > > > http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(sum(price,query($q)),%200,%201)
> > > > >
> > > > > {
> > > > >   "responseHeader":{
> > > > >     "status":0,
> > > > >     "QTime":8,
> > > > >     "params":{
> > > > >       "q":"*:*",
> > > > >       "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > > > >       "fq":"popularity:(1 OR 7)",
> > > > >       "rows":"100"}},
> > > > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > > > >       {
> > > > >         "price":74.99,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.034101862},
> > > > >       {
> > > > >         "price":19.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.009072306},
> > > > >       {
> > > > >         "price":11.5,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> > > > >       {
> > > > >         "price":329.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.15004548},
> > > > >       {
> > > > >         "price":479.95,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.2182583},
> > > > >       {
> > > > >         "price":649.99,
> > > > >         "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> > > > >   }}
> > > > >
> > > > > On the other hand, I was thinking of implementing a custom scale function
> > > > > that by default works only on the current result set and not on the entire
> > > > > collection.
> > > > >
> > > > > Any suggestions on how to solve this problem?
> > > > >
> > > > > Best regards,
> > > > > Vincenzo
> > > > >
> > > > > --
> > > > > Vincenzo D'Amore
> > > >
> > > > --
> > > > Sincerely yours
> > > > Mikhail Khludnev
> > >
> > > --
> > > Vincenzo D'Amore
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>
> --
> Vincenzo D'Amore

--
Sincerely yours
Mikhail Khludnev
