Re: Facets based on sampling

Toke Eskildsen Sat, 05 Nov 2016 09:11:28 -0700

John Davis <johndavis925...@gmail.com> wrote:
> Does there exist an option to compute facets by just looking at the top-n
> results instead of all of them or a sample of results based on some query
> parameters?


Faceting on only the top-n results does not fit well with the current query 
flow in Solr (I might be wrong here, as I am not too familiar with that part of 
the code). The two orderings also pull in different directions: documents are 
(often) sorted by score, while facets are (often) sorted by count. So the result 
would be something like the facet values that are present in the highest-scoring 
documents and that are also in many documents? It might work in some 
situations, but be confusing in others.
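
To make the top-n idea concrete, here is a minimal client-side sketch. It is 
not a Solr feature or API: facet_top_n, ranked_docs and the "cat" field are 
hypothetical names, and the documents are assumed to be already fetched and 
sorted by score.

```python
from collections import Counter

def facet_top_n(ranked_docs, field, n, limit=10):
    """Count facet values over the first n documents of a ranked result list.

    ranked_docs is assumed to be a list of dicts, already sorted by score.
    """
    counts = Counter()
    for doc in ranked_docs[:n]:
        values = doc.get(field, [])
        if isinstance(values, str):
            # Treat a single-valued field like a one-element list.
            values = [values]
        counts.update(values)
    # Facets are ordered by count, independent of document score.
    return counts.most_common(limit)

if __name__ == "__main__":
    docs = [
        {"cat": ["x", "y"]},
        {"cat": ["x"]},
        {"cat": ["z"]},
        {"cat": ["z"]},
        {"cat": ["z"]},
    ]
    print(facet_top_n(docs, "cat", n=2))
    print(facet_top_n(docs, "cat", n=5))
```

With n=2 the value "z" never shows up even though it is the most common value 
overall, which is the kind of result that could confuse users.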

Sampling-based faceting seems like a more straightforward concept to me.

> I couldn't find one and if it does not exist, has this come up
> before? This would definitely not be a precise facet count but using
> reasonable sampling algorithms we should be able to extrapolate well.

I implemented something like that 2 years ago for Solr 4.10. There is a 
write-up at
https://sbdevel.wordpress.com/2015/06/19/dubious-guesses-counted-correctly/

Interestingly enough, it is possible to get precise counts with sampling. The 
"only" downside is then the possibility that the terms guessed to be in the 
top-X are not the correct ones.
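
One way to read that: the sample is only used to guess which terms belong in 
the top-X, and those candidates are then counted exactly. The sketch below is a 
toy illustration of that idea under my own assumptions, not the Solr 4.10 
implementation from the write-up. sampled_facets, doc_terms and step are 
hypothetical names, and the in-memory recount stands in for whatever per-term 
counting an index would do.

```python
from collections import Counter

def sampled_facets(doc_terms, top_x, step):
    """Guess the top-X terms from a sample, then count those terms exactly.

    doc_terms: list of term lists, one per matching document (assumed input).
    step: take every step-th document as the sample (simple systematic sampling).
    """
    # Phase 1: cheap guess, counting terms in the sample only.
    guessed = Counter()
    for terms in doc_terms[::step]:
        guessed.update(terms)
    candidates = {term for term, _ in guessed.most_common(top_x)}

    # Phase 2: exact counts, but only for the candidate terms.
    exact = Counter()
    for terms in doc_terms:
        exact.update(t for t in terms if t in candidates)
    return exact.most_common(top_x)

if __name__ == "__main__":
    docs = [["a", "b"]] * 50 + [["a"]] * 30 + [["c"]] * 20
    print(sampled_facets(docs, top_x=2, step=10))
```

The returned counts are exact for every term that is returned. What can go 
wrong is phase 1: a term that really belongs in the top-X may be missed by the 
sample and then never gets counted.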

(and yes, I do plan on porting to Solr 6.x. Hopefully spring 2017, but no 
promises)

- Toke Eskildsen
