multivalue faceting term optimization

Nicolas Paris Mon, 09 Mar 2020 02:39:52 -0700

Hello,


Environment:
- SolrCloud 8.4.1
- 4 shards with Xmx = 120 GB and SSD disks
- 50M documents / 40 GB on disk per shard
- mainly large text fields, plus one multivalued, indexed string field
with docValues, holding a list of 15 values per document


Goal:
I want to provide a terms facet on a multivalued string field. This lets
the client build a dynamic word cloud depending on the filter queries. The
words in the array are extracted with TF-IDF from large raw texts
in neighbouring fields.


Behavior:
The computing time is below 2 seconds when the FQ is selective
enough (< 2M documents). However, the request times out when the FQ
matches more than 2M documents.


Question:
How can I improve raw performance?
I tried:
- facet.limit / facet.threads / facet.method
- limiting the multivalue size (from 50 elements per document to 15)
Is there any parameter I am missing?
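
For reference, the kind of request I am describing can be sketched as
below. This is only an illustration: the field names `keywords`, `year`
and `source` and all parameter values are placeholders, not my real
schema; only the `facet.*` parameter names are the standard Solr ones
listed above.

```python
from urllib.parse import urlencode


def build_facet_params(field, filter_queries, limit=100, threads=4, method="fc"):
    """Return a URL-encoded query string for a Solr terms-facet request."""
    params = [
        ("q", "*:*"),
        ("rows", 0),                # only facet counts are needed, no documents
        ("facet", "true"),
        ("facet.field", field),     # the multivalued string field to facet on
        ("facet.limit", limit),     # number of terms returned for the word cloud
        ("facet.threads", threads),
        ("facet.method", method),
    ]
    # One fq parameter per filter query; these decide how many documents match.
    params += [("fq", fq) for fq in filter_queries]
    return urlencode(params)


# Placeholder field names and filters, for illustration only.
query_string = build_facet_params("keywords", ["year:2019", "source:reports"])
print(query_string)
```

The resulting string would be appended to the collection's `/select`
handler URL.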


Thought:
If there is no way to speed up the raw computation, I guess I
could artificially keep the FQ under 2M documents for all queries by
taking a sample (I don't really need more than 2M documents to build the
word cloud).
I am wondering how I could filter the documents to get approximate facets?
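
To make the sampling idea concrete, here is a minimal sketch of what I
have in mind. It assumes a hypothetical integer field `sample_bucket`,
filled at index time with a uniform random value in [0, 1000), which
does not exist in my schema today. The number of matched documents would
come from a first cheap query with rows=0. I have not tested this
against Solr.

```python
NUM_BUCKETS = 1000  # assumption: sample_bucket holds a uniform random int in [0, 1000)


def sampling_filter(matched_docs, max_docs=2_000_000, field="sample_bucket"):
    """Return an extra fq clause keeping roughly max_docs documents, or None.

    matched_docs is the hit count of the original filter queries.
    """
    if matched_docs <= max_docs:
        return None  # already selective enough, no sampling needed
    # Keep just enough buckets so the expected number of hits is <= max_docs.
    keep = max(1, NUM_BUCKETS * max_docs // matched_docs)
    return f"{field}:[0 TO {keep - 1}]"


def sampling_ratio(matched_docs, max_docs=2_000_000):
    """Fraction of matching documents kept, useful to rescale facet counts."""
    if matched_docs <= max_docs:
        return 1.0
    keep = max(1, NUM_BUCKETS * max_docs // matched_docs)
    return keep / NUM_BUCKETS


extra_fq = sampling_filter(8_000_000)
print(extra_fq)
```

The facet counts obtained this way would be approximate; dividing them
by `sampling_ratio(...)` gives a rough estimate of the full counts, which
is good enough for a word cloud.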


Thanks!


-- 
nicolas paris
