If you constrain the random sample to a fixed number of documents instead of a percentage, reservoir sampling can be used without even calculating the total match count. This can be done on the client side. You could also stop sampling after a maximum, e.g. 10 million.
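
A minimal client-side sketch of reservoir sampling (the classic "Algorithm R"), assuming the client can iterate over the matching documents or their IDs one at a time, e.g. by paging through the results. The names reservoir_sample, k and max_seen are made up for illustration and are not part of Solr; max_seen is one possible reading of "stop after a max".

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T],
    k: int,
    max_seen: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return up to k items drawn uniformly at random from stream.

    The total number of items does not need to be known up front.
    If max_seen is given, only the first max_seen items are considered
    (a hypothetical cap, e.g. 10 million matches).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if max_seen is not None and i >= max_seen:
            break  # stop scanning once the cap is reached
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for document IDs streamed back from a query.
    doc_ids = (f"doc-{n}" for n in range(1_000_000))
    sample = reservoir_sample(doc_ids, k=5, max_seen=100_000)
    print(sample)
```

Memory use stays at k items however many documents match. The catch is that the client still has to stream every match (up to the cap), so this fits a fixed-size review sample better than facets or stats computed inside Solr.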

> On Sep 28, 2016, at 10:15 AM, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
>
> Purely of algorithmic point of view - look into reservoir sampling for
> unbiased sampling.
>
> On Sep 28, 2016 11:00 AM, "Yongtao Liu" <y...@commvault.com> wrote:
>
> Alexandre,
>
> Thanks for reply.
> The use case is customer want to review document based on search result.
> But they do not want to review all, since it is costly.
> So, they want to pick partial (from 1% to 100%) document to review.
> For statistics, user also ask this function.
> It is kind of common requirement
> Do you know any plan to implement this feature in future?
>
> Post filter should work. Like collapsing query parser.
>
> Thanks,
> Yongtao
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, September 27, 2016 9:25 PM
> To: solr-user
> Subject: Re: how to sampling search result
>
> I am not sure I understand what the business case is. However, you might be
> able to do something with a custom post-filter.
>
> Regards,
> Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>> On 27 September 2016 at 22:29, Yongtao Liu <y...@commvault.com> wrote:
>> Mikhail,
>>
>> Thanks for your reply.
>>
>> Random field is based on index time.
>> We want to do sampling based on search result.
>>
>> Like if the random field has value 1 - 100.
>> And the query touched documents may all in range 90 - 100.
>> So random field will not help.
>>
>> Is it possible we can sampling based on search result?
>>
>> Thanks,
>> Yongtao
>> -----Original Message-----
>> From: Mikhail Khludnev [mailto:m...@apache.org]
>> Sent: Tuesday, September 27, 2016 11:16 AM
>> To: solr-user
>> Subject: Re: how to sampling search result
>>
>> Perhaps, you can apply a filter on random field.
>>
>>> On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Is it possible I can sampling based on "search result"?
>>> Like run query first, and search result return 1 million documents.
>>> With random sampling, 50% (500K) documents return for facet, and stats.
>>>
>>> The sampling need based on "search result".
>>>
>>> Thanks,
>>> Yongtao
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/how-to-sampling-search-result-tp4298269.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev