Re: multivalue faceting term optimization

2020-03-09 Thread Jörn Franke
hll stands for HyperLogLog (https://en.wikipedia.org/wiki/HyperLogLog). You will not get the exact distinct count, but a distinct count very close to the real number. It is very fast and memory efficient for a large number of distinct values. > On 10.03.2020 at 00:25, Nicolas Paris wrote: > >  > Erick Erick
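
To make the accuracy and memory trade-off concrete, here is a minimal, illustrative HyperLogLog sketch in Python. It is not Solr's implementation. The class name, the register precision `p=12`, and the use of SHA-1 as the hash are assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: approximate distinct count in fixed memory."""

    def __init__(self, p=12):
        # p index bits give m = 2**p registers (assumed default: 4096).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Take a 64-bit hash of the value.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits give the rank: position of the leftmost 1-bit.
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2**register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction (linear counting) when registers are empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"term-{i}")
    hll.add(f"term-{i}")  # duplicates do not change the estimate
print(hll.count())  # close to 100000, not exact
```

With 4096 registers the typical relative error is roughly 1.04 / sqrt(4096), about 1.6%, and memory use stays constant regardless of how many distinct values are added.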

Re: multivalue faceting term optimization

2020-03-09 Thread Nicolas Paris
Erick Erickson writes: > Have you looked at the HyperLogLog stuff? Here’s at least a mention of > it: https://lucene.apache.org/solr/guide/8_4/the-stats-component.html I am used to hll in the context of counting distinct values -- cardinality. I have to admit that section https://lucene.apache.o

Re: multivalue faceting term optimization

2020-03-09 Thread Nicolas Paris
Toke Eskildsen writes: > JSON faceting allows you to skip the fine counting with the parameter > refine: I also tried the facet.refine parameter, but didn't notice any improvement. >> I am wondering how I could filter the documents to get approximate >> facets ? > > Clunky idea: Introduce a
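
As an illustration of the `refine` parameter discussed here, the following Python sketch builds a request body for Solr's JSON Facet API with refinement switched off. The helper name `build_terms_facet`, the facet labels, and the field name `tags_ss` are hypothetical. Only the `terms` facet type, its `refine` option, and the `hll(...)` aggregation come from the JSON Facet API.

```python
import json


def build_terms_facet(field, limit=10, refine=False, with_hll=False):
    """Build a JSON Facet API request body for a terms facet.

    `field` is the multivalued string field to facet on (hypothetical name
    supplied by the caller). With refine=False, Solr skips the second
    per-shard pass that makes the returned counts exact.
    """
    body = {
        "query": "*:*",
        "limit": 0,  # no documents needed, only facet counts
        "facet": {
            "top_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "refine": refine,
            }
        },
    }
    if with_hll:
        # Approximate number of distinct terms via HyperLogLog.
        body["facet"]["distinct_terms"] = f"hll({field})"
    return body


# Example: the body would be POSTed to /solr/<collection>/query
print(json.dumps(build_terms_facet("tags_ss", with_hll=True), indent=2))
```

Without refinement, counts for terms near the cut-off can be underestimated when the data is spread across several shards, which is the trade-off being discussed.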

Re: multivalue faceting term optimization

2020-03-09 Thread Erick Erickson
Have you looked at the HyperLogLog stuff? Here’s at least a mention of it: https://lucene.apache.org/solr/guide/8_4/the-stats-component.html Best, Erick > On Mar 9, 2020, at 02:39, Nicolas Paris wrote: > > Hello, > > > Environment: > - SolrCloud 8.4.1 > - 4 shards with xmx = 120 GB and ssd

Re: multivalue faceting term optimization

2020-03-09 Thread Toke Eskildsen
On Mon, 2020-03-09 at 10:39 +0100, Nicolas Paris wrote: > I want to provide terms facet on a string multivalue field. > ... > How to improve brute performances ? It might help to have everything in a single shard, to avoid the secondary fine count. But your index is rather large for single-shard s

multivalue faceting term optimization

2020-03-09 Thread Nicolas Paris
Hello, Environment: - SolrCloud 8.4.1 - 4 shards with xmx = 120 GB and SSD disks - 50M documents / 40 GB physical per shard - mainly large text fields and also one multivalue/docvalue/indexed string list of 15 values per document Goal: I want to provide a terms facet on a string multivalue field.