Hi,

I benchmarked the slow stats queries (6-point estimate) using the same hardware on an index of size 104M. We use a modified Solr/Lucene 3.1 that returns only the sum and count in the stats component results. Solr/Lucene runs on Jetty.
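For anyone who wants to repeat this kind of measurement, here is a minimal sketch of how it could be scripted. It is not the script used for the numbers in this mail. It assumes the JSON response writer (`wt=json`) is enabled, and the function names `stats_url`, `measure` and `linear_fit` are made up for illustration. The query parameters are the ones from the stats URL quoted further down, plus an optional `fq` to vary the size of the matching set.

```python
import json
import urllib.parse
import urllib.request


def stats_url(base_url, stats_field, filter_query=None):
    """Build a stats-only query URL (rows=0, so no documents are returned)."""
    params = [
        ("q", "*:*"),
        ("start", "0"),
        ("rows", "0"),
        ("stats", "true"),
        ("stats.field", stats_field),
        ("wt", "json"),  # assumption: JSON response writer is available
    ]
    if filter_query:
        params.append(("fq", filter_query))
    return base_url + "?" + urllib.parse.urlencode(params)


def measure(url):
    """Run one query and return (numFound, QTime in ms) from the response."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return body["response"]["numFound"], body["responseHeader"]["QTime"]


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return intercept, slope, r_squared
```

To use it, call `measure(stats_url(base, "weight", fq))` for several filter queries of different selectivity (`base` and the `fq` values depend on the local setup), collect the `(numFound, QTime)` pairs, and pass the two columns to `linear_fit`. The slope is then the approximate cost in milliseconds per matching document.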
The relationship between query time and the number of matching documents is linear when using the stats component (R^2 = 0.99). I guess this is expected, as the application needs to scan and sum up the stats field for all matching documents?

Are there any plans for caching stats results for a given stats field along with the documents that match a filter query? Any other ideas that could help to improve this (hardware/software configuration)? Even for a subset of 10M entries, the stats query takes on the order of 10 seconds.

Thanks in advance.
Johannes

2011/4/18 Johannes Goll <johannes.g...@gmail.com>

> any ideas why in this case the stats summaries are so slow ? Thank you
> very much in advance for any ideas/suggestions. Johannes
>
>
> 2011/4/5 Johannes Goll <johannes.g...@gmail.com>
>
>> Hi,
>>
>> thank you for making the new apache-solr-3.1 available.
>>
>> I have installed the version from
>>
>> http://apache.tradebit.com/pub//lucene/solr/3.1.0/
>>
>> and am running into very slow stats component queries (~ 1 minute)
>> for fetching the computed sum of the stats field
>>
>> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
>>
>> <int name="QTime">52825</int>
>>
>> #documents: 78,359,699
>> total RAM: 256G
>> vm arguments: -server -Xmx40G
>>
>> the stats.field specification is as follows:
>> <field name="weight" type="pfloat" indexed="true"
>> stored="false" required="true" multiValued="false"
>> default="1"/>
>>
>> filter queries that narrow down the #docs help to reduce it -
>> QTime seems to be proportional to the number of docs being returned
>> by a filter query.
>>
>> Is there any way to improve the performance of such stats queries ?
>> Caching only helped to improve the filter query performance but if
>> larger subsets are being returned, QTime increases unacceptably.
>>
>> Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
>> I have created a custom 3.1 version that does only return the sum. But
>> this only slightly improved the performance.
>> Of course I could somehow cache
>> the larger sum queries on the client side but I want to do this only as a
>> last resort.
>>
>> Thank you very much in advance for any ideas/suggestions.
>>
>> Johannes
>>
>
> --
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878