This is going to be a complex answer, because Solr actually now has multiple ways of doing regression analysis as part of the Streaming Expressions statistical programming library. The basic documentation is here:
https://lucene.apache.org/solr/guide/7_2/statistical-programming.html

Here is a sample expression that performs a simple linear regression in Solr 7.2:

    let(a=random(collection1, q="any query", rows="15000", fl="fieldA, fieldB"),
        b=col(a, fieldA),
        c=col(a, fieldB),
        d=regress(b, c))

The expression above takes a random sample of 15000 results from collection1. Each record in the result set includes fieldA and fieldB. The result set is stored in variable "a".

Then the "col" function creates arrays of numbers from the results stored in variable "a". The values in fieldA are stored in variable "b", and the values in fieldB are stored in variable "c".

Then the "regress" function performs a simple linear regression on the arrays stored in variables "b" and "c". The output of the regress function is a map containing the regression result. This result includes RSquared and other attributes of the regression model, such as R (correlation), slope, y-intercept, etc.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Feb 23, 2018 at 3:10 PM, John Smith <localde...@gmail.com> wrote:

> Hi Joel, thanks for the answer. I'm not really a stats guy, but the end
> result of all this is supposed to be obtaining R^2. Is there no way of
> obtaining this value, then (short of iterating over all the results in the
> hitlist and calculating it myself)?
>
> On Fri, Feb 23, 2018 at 12:26 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Typically SSE is the sum of the squared errors of the prediction in a
> > regression analysis. The stats component doesn't perform regression,
> > although it might be a nice feature.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Feb 23, 2018 at 12:17 PM, John Smith <localde...@gmail.com> wrote:
> >
> > > I'm using solr, and enabling stats as per this page:
> > > https://lucene.apache.org/solr/guide/6_6/the-stats-component.html
> > >
> > > I want to get more stat values though.
> > > Specifically I'm looking for r-squared (coefficient of determination).
> > > This value is not present in solr, however some of the pieces used to
> > > calculate r^2 are in the stats element, for example:
> > >
> > > <double name="min">0.0</double>
> > > <double name="max">10.0</double>
> > > <long name="count">15</long>
> > > <long name="missing">17</long>
> > > <double name="sum">85.0</double>
> > > <double name="sumOfSquares">603.0</double>
> > > <double name="mean">5.666666666666667</double>
> > > <double name="stddev">2.943920288775949</double>
> > >
> > > So I have the sumOfSquares available (SST), and using this calculation, I
> > > can get R^2:
> > >
> > > R^2 = 1 - SSE/SST
> > >
> > > All I need then is SSE. Is there anyway I can get SSE from those other
> > > stats in solr?
> > >
> > > Thanks in advance!
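For anyone who wants to see what the R^2 calculation in this thread involves outside Solr, here is a minimal Python sketch of a simple (one-predictor) least-squares regression. It is not Solr's implementation of regress, and the function names and sample data are hypothetical. It illustrates two points. First, the stats component's sumOfSquares is the raw sum of squared values, so the total sum of squares about the mean is sumOfSquares - sum^2/count. For the stats quoted above that is about 121.33, which agrees with stddev^2 * (count - 1). Second, SSE depends on the paired x and y values, so it cannot be recovered from the stats of a single field.

```python
"""Sketch of the quantities behind R^2 for a simple linear regression.

Assumption: a plain one-predictor least-squares fit; this is an
illustration, not Solr's implementation of the regress function.
"""
from typing import Sequence, Tuple


def sst_from_stats(count: int, total: float, sum_of_squares: float) -> float:
    """Total sum of squares about the mean, from stats-component-style fields.

    sumOfSquares is the raw sum of y^2, so the squared deviations from
    the mean are sum(y^2) - (sum(y))^2 / n.
    """
    return sum_of_squares - total * total / count


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x; return (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE: squared errors of the predictions (needs the paired x values).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: squared deviations of y from its own mean.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - sse / sst


if __name__ == "__main__":
    # count, sum and sumOfSquares from the stats output quoted in the thread.
    sst = sst_from_stats(count=15, total=85.0, sum_of_squares=603.0)
    print(f"SST: {sst:.2f}")
    # Hypothetical paired sample standing in for two numeric fields.
    slope, intercept, r2 = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

Run as a script, it prints `SST: 121.33` for the quoted stats, and `slope=0.60 intercept=2.20 R^2=0.60` for the made-up sample.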