Have you looked at the HyperLogLog stuff? Here’s at least a mention of it: 
https://lucene.apache.org/solr/guide/8_4/the-stats-component.html
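For intuition on what that page offers: the stats component's cardinality option estimates the number of distinct values in a field using HyperLogLog. Note that HLL estimates how many distinct values there are, not how often each one occurs. On its own, it does not give the per-term counts a word cloud needs. Below is a minimal Python sketch of the algorithm, not Solr's implementation. The SHA-1-based 64-bit hash and the precision p=12 are illustrative assumptions.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    # 64-bit hash taken from SHA-1 (illustrative choice, not Solr's hash)
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p                      # the first p hash bits select a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        w = 64 - self.p                 # bits left after the register index
        idx = h >> w
        rest = h & ((1 << w) - 1)
        # rank = 1-based position of the leftmost 1-bit in the remaining w bits
        rank = w - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # small-range correction: fall back to linear counting
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for word in ["solr", "facet", "solr", "cloud"]:
    hll.add(word)
print(round(hll.estimate()))  # duplicates are counted once
```

Memory stays fixed at m registers no matter how many values are added. That is why this approach scales, but it only answers "how many distinct words", not "which words are most frequent".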



Best,
Erick

> On Mar 9, 2020, at 02:39, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> 
> Hello,
> 
> 
> Environment:
> - SolrCloud 8.4.1
> - 4 shards with -Xmx set to 120 GB and SSD disks
> - 50M documents / 40 GB on disk per shard
> - mainly large text fields, plus one multivalued, docValues-enabled, indexed
> string field holding 15 values per document
> 
> 
> Goal:
> I want to provide a terms facet on a multivalued string field. This lets
> the client build a dynamic word cloud that depends on the filter queries.
> The words in that field are extracted with TF-IDF from the large raw texts
> in neighbouring fields.
> 
> 
> Behavior:
> Computation takes under 2 seconds when the fq is selective enough
> (fewer than 2M matching documents). However, the request times out when
> the fq matches more than 2M documents.
> 
> 
> Question:
> How can I improve raw performance?
> I tried:
> - facet.limit / facet.threads / facet.method
> - limiting the multivalued field size (from 50 elements per document to 15)
> Is there any parameter I am missing?
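For reference, a classic facet request combining the parameters listed above might be built as follows. The field name "keywords", the fq value and the chosen limit/method/threads values are hypothetical. rows=0 is used because only the facet counts are needed, not the documents themselves.

```python
from urllib.parse import urlencode


def build_facet_params(fq, field, limit=100, method="fc", threads=4):
    """Build a classic Solr facet query string (hypothetical field and filter)."""
    params = [
        ("q", "*:*"),                   # match all, restrict via fq
        ("fq", fq),                     # the user's filter query
        ("rows", "0"),                  # return no documents, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),    # top-N terms for the word cloud
        ("facet.mincount", "1"),        # skip terms absent from the result set
        ("facet.method", method),
        ("facet.threads", str(threads)),
    ]
    return urlencode(params)


print(build_facet_params("year:2019", "keywords"))
```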
> 
> 
> Thought:
> If there is no way to speed up the raw computation, I guess I could
> artificially cap the fq at 2M documents for all queries by taking a
> sample (I don't really need more than 2M documents to build the
> word cloud).
> How could I filter the documents to get approximate facets?
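The sampling idea can be prototyped outside Solr to check whether sampled counts are good enough for a word cloud. This is a minimal Python sketch that assumes the documents are available as (doc_id, terms) pairs. A hash of the document id decides sample membership, so a given document is always either in or out. Sampled term counts are then scaled up by 1/rate. The function names and the MD5-based hash are illustrative choices. How to express an equivalent filter as a Solr fq is not covered here.

```python
import hashlib
from collections import Counter


def in_sample(doc_id, rate):
    """Deterministically keep roughly `rate` of documents, keyed on doc_id."""
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:8], "big")
    return h < int(rate * 2**64)  # same doc_id always gives the same answer


def approximate_facets(docs, rate, limit=10):
    """Count terms over a sample of (doc_id, terms) pairs and scale up by 1/rate."""
    counts = Counter()
    for doc_id, terms in docs:
        if in_sample(doc_id, rate):
            counts.update(terms)
    return [(term, n / rate) for term, n in counts.most_common(limit)]


# Toy data: every document has "common", every 10th also has "rare"
docs = [(f"doc-{i}", ["common"] + (["rare"] if i % 10 == 0 else []))
        for i in range(20000)]
print(approximate_facets(docs, rate=0.1, limit=2))
```

Frequent terms keep their relative order well under sampling. Terms with low counts get proportionally noisier estimates, which matters less for a word cloud that only shows the top terms.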
> 
> 
> Thanks !
> 
> 
> -- 
> nicolas paris
