If you constrain the random sample to a fixed number of documents instead of a
percentage, reservoir sampling can be used without even calculating the total
match count. This can be done on the client side. You could also stop scanning
after a maximum number of hits, e.g. 10 million.
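Below is a minimal Python sketch of the reservoir sampling idea above (Algorithm R). It assumes the client can iterate over the matching document IDs one at a time, for example by paging through the result set. The function name reservoir_sample and the max_scanned cap are hypothetical and only for illustration. Fetching the hits from Solr is not shown.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(hits: Iterable[T], k: int,
                     max_scanned: Optional[int] = None,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of up to k items from a stream of unknown length.

    Hypothetical helper: `hits` is assumed to be an iterable of document IDs
    streamed from a search result; `max_scanned` optionally caps how many
    hits are read before sampling stops.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    if max_scanned is not None:
        # Stop reading after max_scanned hits (e.g. 10 million).
        hits = islice(hits, max_scanned)
    reservoir: List[T] = []
    for i, hit in enumerate(hits):
        if i < k:
            # Fill the reservoir with the first k hits.
            reservoir.append(hit)
        else:
            # Keep hit i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = hit
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), k=1000, rng=random.Random(42))
    print(len(sample))
```

When n >= k hits are scanned, each of them ends up in the sample with probability k/n, so the total match count never has to be known up front. Note that with max_scanned set, the sample is uniform only over the first max_scanned hits in result order, not over the whole result set.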


> On Sep 28, 2016, at 10:15 AM, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
> 
> From a purely algorithmic point of view, look into reservoir sampling for
> unbiased sampling.
> 
> On Sep 28, 2016 11:00 AM, "Yongtao Liu" <y...@commvault.com> wrote:
> 
> Alexandre,
> 
> Thanks for the reply.
> The use case is that customers want to review documents based on a search
> result. But they do not want to review all of them, since that is costly.
> So they want to pick a portion (from 1% to 100%) of the documents to review.
> Users also ask for this function for statistics.
> It is a fairly common requirement.
> Do you know of any plans to implement this feature in the future?
> 
> A post filter should work, like the collapsing query parser.
> 
> Thanks,
> Yongtao
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, September 27, 2016 9:25 PM
> To: solr-user
> Subject: Re: how to sampling search result
> 
> I am not sure I understand what the business case is. However, you might be
> able to do something with a custom post-filter.
> 
> Regards,
>   Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
> 
> 
>> On 27 September 2016 at 22:29, Yongtao Liu <y...@commvault.com> wrote:
>> Mikhail,
>> 
>> Thanks for your reply.
>> 
>> The random field's value is assigned at index time.
>> We want to sample based on the search result.
>> 
>> For example, say the random field has values 1 - 100,
>> and the documents matched by a query may all be in the range 90 - 100.
>> Then the random field will not help.
>> 
>> Is it possible to sample based on the search result?
>> 
>> Thanks,
>> Yongtao
>> -----Original Message-----
>> From: Mikhail Khludnev [mailto:m...@apache.org]
>> Sent: Tuesday, September 27, 2016 11:16 AM
>> To: solr-user
>> Subject: Re: how to sampling search result
>> 
>> Perhaps you can apply a filter on a random field.
>> 
>>> On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Is it possible to sample based on the "search result"?
>>> For example, run the query first, and the search result returns 1 million
>>> documents. With random sampling, 50% (500K) of the documents would be
>>> returned for faceting and stats.
>>> 
>>> The sampling needs to be based on the "search result".
>>> 
>>> Thanks,
>>> Yongtao
>>> 
>>> 
>>> 
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/how-to-sampling-search-result-tp4298269.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
