Re: Facets based on sampling

2017-10-24 Thread Toke Eskildsen
John Davis wrote: > 100M unique values might be across all docs, and unless the faceting > implementation is really naive I cannot see how that can come into play > when the query matches a fraction of those. Solr simple string faceting uses an int-array to hold counts
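
To illustrate why the number of unique values across the whole index can matter even for a small result set, here is a minimal Python sketch, not Solr's actual code. It contrasts a counter array sized by the field's total number of unique terms with a hash map sized by the terms actually seen. The function names, the `doc_to_ordinal` mapping, and the single-valued field are assumptions made for illustration only.

```python
from collections import Counter


def count_with_int_array(matching_docs, doc_to_ordinal, num_unique_terms):
    """Count facet values using one counter slot per unique term in the index."""
    # Allocation cost grows with the field's total cardinality,
    # no matter how few documents matched the query.
    counts = [0] * num_unique_terms
    for doc in matching_docs:
        counts[doc_to_ordinal[doc]] += 1
    # Extracting the result also scans every slot.
    return {ordinal: c for ordinal, c in enumerate(counts) if c}


def count_with_hash(matching_docs, doc_to_ordinal):
    """Count facet values using a hash map that only holds terms actually seen."""
    # Work and memory grow with the number of matching documents instead.
    return dict(Counter(doc_to_ordinal[doc] for doc in matching_docs))


if __name__ == "__main__":
    # Hypothetical data: three matching docs in a field with 1,000,000 unique terms.
    doc_to_ordinal = {0: 5, 1: 5, 2: 999_999}
    matching = [0, 1, 2]
    print(count_with_int_array(matching, doc_to_ordinal, 1_000_000))
    print(count_with_hash(matching, doc_to_ordinal))
```

Both functions return the same counts. The difference is that the array version touches one slot per unique term in the field, while the hash version only touches entries for the documents that matched.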

Re: Facets based on sampling

2017-10-24 Thread John Davis
On Tue, Oct 24, 2017 at 8:37 AM, Erick Erickson wrote: > bq: It is a bit surprising why facet computation > is so slow even when the query matches hundreds of docs. > > The number of terms in the field over all docs also comes into play. > Say you're faceting over a

Re: Facets based on sampling

2017-10-24 Thread Erick Erickson
bq: It is a bit surprising why facet computation is so slow even when the query matches hundreds of docs. The number of terms in the field over all docs also comes into play. Say you're faceting over a field that has 100,000,000 unique values across all docs, that's a lot of bookkeeping. Best,

Re: Facets based on sampling

2017-10-24 Thread Emir Arnautović
Hi John, Did you mean “docValues don’t work for analysed fields”? They do work for multivalued string (or other supported types) fields. What you need to do is to convert your analysed field to a multivalued string field - that requires changes in the indexing flow. HTH, Emir -- Monitoring - Log

Re: Facets based on sampling

2017-10-23 Thread John Davis
Docvalues don't work for multivalued fields. I just started a separate thread with more debug info. It is a bit surprising why facet computation is so slow even when the query matches hundreds of docs. On Mon, Oct 23, 2017 at 6:53 AM, alessandro.benedetti wrote: > Hi John,

Re: Facets based on sampling

2017-10-23 Thread alessandro.benedetti
Hi John, first of all, I may be stating the obvious, but have you tried docValues? Apart from that, a friend of mine (Diego Ceccarelli) was discussing a probabilistic implementation similar to HyperLogLog[1] to approximate facet counting. I didn't have time to look at it in detail / implement

Re: Facets based on sampling

2017-10-20 Thread John Davis
Hi Yonik, Any update on sampling-based facets? The current faceting is really slow for fields with high cardinality even with method=uif. Or are there alternative work-arounds to only look at N docs when computing facets? On Fri, Nov 4, 2016 at 4:43 PM, Yonik Seeley <ysee...@gmail.com>

Re: Facets based on sampling

2016-11-05 Thread Mikhail Khludnev
Hello, John! You can try to do that manually by applying a filter on a random field. On Fri, Nov 4, 2016 at 10:02 PM, John Davis wrote: > Hi, > I am trying to improve the performance of queries with facets. I understand > that for queries with high facet cardinality and
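
One way to read the suggestion of filtering on a random field is sketched below in plain Python, with no Solr calls. It assumes each document is given a uniform random value at index time in a hypothetical field called `rand`, that faceting only looks at documents whose value falls under a chosen sample rate, and that the counts are then scaled back up. The field name, the function names, and the scaling step are illustrative assumptions, not something stated in the thread.

```python
import random


def index_with_random_field(docs, seed=0):
    """Attach a uniform random value in [0, 1) to each document (hypothetical 'rand' field)."""
    rng = random.Random(seed)
    return [dict(doc, rand=rng.random()) for doc in docs]


def sampled_facet_counts(docs, field, sample_rate):
    """Estimate facet counts from the documents whose 'rand' value is below sample_rate."""
    counts = {}
    for doc in docs:
        # Equivalent in spirit to a range filter on the random field.
        if doc["rand"] < sample_rate:
            counts[doc[field]] = counts.get(doc[field], 0) + 1
    # Scale the sampled counts up to estimate counts over the full result set.
    scale = 1.0 / sample_rate
    return {value: round(c * scale) for value, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical result set: 9,000 "red" docs and 1,000 "blue" docs.
    docs = [{"color": "red"}] * 9000 + [{"color": "blue"}] * 1000
    indexed = index_with_random_field(docs)
    print(sampled_facet_counts(indexed, "color", 0.1))
```

In Solr terms this would roughly correspond to adding a range filter query on such a field, for example `fq=rand:[0 TO 0.1]`, assuming a numeric field of that name had been populated at index time. The estimates are approximate, and rare facet values can be missed entirely at low sample rates.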

Re: Facets based on sampling

2016-11-05 Thread Toke Eskildsen
From: John Davis wrote: > Does there exist an option to compute facets by just looking at the top-n > results instead of all of them or a sample of results based on some query > parameters? Doing it for the top-n results does not play well with the current query flow

Re: Facets based on sampling

2016-11-04 Thread Yonik Seeley
Sampling has been on my TODO list for the JSON Facet API. How much it would help depends on where the bottlenecks are, but that in conjunction with a hashing approach to collection (assuming field cardinality is high) should definitely help. -Yonik On Fri, Nov 4, 2016 at 3:02 PM, John Davis

Re: Facets based on sampling

2016-11-04 Thread Jeff Wartes
https://issues.apache.org/jira/browse/SOLR-5894 had some pretty interesting looking work on heuristic counts for facets, among other things. Unfortunately, it didn’t get picked up, but if you don’t mind using Solr 4.10, there’s a jar. On 11/4/16, 12:02 PM, "John Davis"

Re: Facets based on sampling

2016-11-04 Thread Alexandre Rafalovitch
I believe that's what the JSON Facet API does by default. Have you tried that? Regards, Alex. Solr Example reading group is starting November 2016, join us at http://j.mp/SolrERG Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 5 November 2016

Facets based on sampling

2016-11-04 Thread John Davis
Hi, I am trying to improve the performance of queries with facets. I understand that for queries with high facet cardinality and a large number of results the current facet computation algorithms can be slow as they are trying to loop across all docs and facet values. Does there exist an option to