Have you looked at the HyperLogLog (approximate cardinality) support? Here’s at least a mention of it: https://lucene.apache.org/solr/guide/8_4/the-stats-component.html
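For context, HyperLogLog estimates the number of distinct values in a stream using a small, fixed array of registers instead of holding every value in memory. The Python sketch below is only a minimal illustration of that idea. It assumes SHA-1 as the hash, `p=12` index bits (4096 registers), and the standard small-range correction. It is not Solr's implementation. Note also that it estimates *how many* distinct terms there are, not how often each term occurs.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative sketch, not Solr's code)."""

    def __init__(self, p: int = 12):
        # The top p bits of each hash select one of m = 2**p registers.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an assumption for this sketch).
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # register index from the top p bits
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Each register keeps the maximum rank it has seen, so duplicates change nothing.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Harmonic mean of 2**register, scaled by alpha * m^2.
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting while many registers are empty.
        if e <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return e


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"term-{i}")
    # Should be close to 100000; the standard error at p=12 is about 1.04 / sqrt(4096), i.e. ~1.6%.
    print(round(hll.estimate()))
```

The memory cost is fixed at 2**p small registers no matter how many values are added, which is why this kind of estimate stays cheap on large result sets.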
Best,
Erick

> On Mar 9, 2020, at 02:39, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
> Hello,
>
>
> Environment:
> - SolrCloud 8.4.1
> - 4 shards with Xmx = 120GB and SSD disks
> - 50M documents / 40GB on disk per shard
> - mainly large text fields, plus one multivalued string field (docValues, indexed) holding a list of 15 values per document
>
>
> Goal:
> I want to provide a terms facet on a multivalued string field. This lets the client build a dynamic word cloud depending on the filter queries. The words in the array are extracted with TF-IDF from the large raw texts in neighbouring fields.
>
>
> Behavior:
> The computation takes less than 2 seconds when the fq is selective enough (<2M documents). However, it times out when the fq matches more than 2M documents.
>
>
> Question:
> How can I improve raw performance?
> I tried:
> - facet.limit / facet.threads / facet.method
> - limiting the multivalued field size (from 50 elements per document to 15)
> Is there any parameter I am missing?
>
>
> Thought:
> If there is no way to speed up the raw task, I guess I could artificially limit the fq to under 2M documents for all queries by taking a sample (I don't really need more than 2M documents to build the word cloud).
> How could I filter the documents to get approximate facets?
>
>
> Thanks!
>
>
> --
> nicolas paris
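The sampling idea from the "Thought" section can be sketched like this. Select documents deterministically by hashing their id, count terms only over the sampled documents, and scale each count by 1/rate. This is a client-side Python illustration with hypothetical names (`in_sample`, `approx_term_facets`) operating on in-memory data. It is not a Solr API.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple


def in_sample(doc_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` (0 < rate <= 1) of all document ids."""
    # Hashing the id means the same document is always in or out of the sample.
    h = int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:8], "big")
    return h < rate * 2**64


def approx_term_facets(
    docs: Iterable[Tuple[str, List[str]]], rate: float, limit: int = 10
) -> List[Tuple[str, int]]:
    """Estimate top-`limit` term document counts from a hash-based sample."""
    counts = Counter()
    for doc_id, terms in docs:
        if in_sample(doc_id, rate):
            # A facet counts documents, so each term is counted once per document.
            counts.update(set(terms))
    # Scale sampled counts up to estimate counts over the full result set.
    return [(term, round(c / rate)) for term, c in counts.most_common(limit)]


if __name__ == "__main__":
    docs = [(f"doc{i}", ["solr", "facet"] if i % 2 == 0 else ["solr"]) for i in range(10_000)]
    # A 20% sample: estimates should land near 10000 for "solr" and 5000 for "facet".
    print(approx_term_facets(docs, rate=0.2))
```

Inside Solr, the same idea could presumably be pushed into the query itself. One option would be to store a precomputed bucket number at index time in a hypothetical integer field (say `sample_bucket`, 0 to 99), add a filter such as `fq=sample_bucket:[0 TO 19]` for a 20% sample, and multiply the returned facet counts by 5 on the client. This is an untested assumption, not a documented Solr feature. The relative ranking of frequent terms, which is what a word cloud needs, should be fairly stable under sampling. Counts for rare terms will be noisy.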