Hello,
I’m seeing a performance regression after moving from NRT replicas to TLOG +
PULL replicas. We index new data continuously and serve heavy query traffic
(~2k rps).
- index size: ~2.5Gi
- number of docs: ~4.6M
- 2 shards
- 7 cores and 14Gi of memory
- 30 instances
- JVM heap: 12Gi
When running on NRT only, response times are ~150ms at p99 and ~40ms at p95.
After switching to 6 TLOG + 24 PULL replicas, response times grow to
~350ms at p99 and ~120ms at p95.
Here are the relevant fragments from our solrconfig.xml:
> <updateHandler class="solr.DirectUpdateHandler2">
> <updateLog>
> <str name="dir">${solr.data.dir:}</str>
> <int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
> </updateLog>
>
> <autoCommit>
> <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
> <openSearcher>true</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
> <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
> </autoSoftCommit>
> </updateHandler>
> <query>
> <maxBooleanClauses>1000</maxBooleanClauses>
> <filterCache class="solr.CaffeineCache"
> size="${filterCache.size:32768}"
> initialSize="${filterCache.initialSize:32768}"
> autowarmCount="20%"/>
>
> <queryResultCache class="solr.CaffeineCache"
> size="${queryResultCache.size:32768}"
> initialSize="${queryResultCache.initialSize:32768}"
> autowarmCount="0%"/>
>
> <documentCache class="solr.CaffeineCache"
> size="${documentCache.size:150000}"
> initialSize="${documentCache.initialSize:150000}"
> autowarmCount="0%"/>
>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
> <useFilterForSortedQuery>true</useFilterForSortedQuery>
>
> <queryResultWindowSize>160</queryResultWindowSize>
> <queryResultMaxDocsCached>300</queryResultMaxDocsCached>
>
> <listener event="newSearcher" class="solr.QuerySenderListener">
> </listener>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
> </listener>
>
> <useColdSearcher>false</useColdSearcher>
> <maxWarmingSearchers>8</maxWarmingSearchers>
> </query>
One of my assumptions is that we should reduce maxWarmingSearchers and increase
the autoCommit maxTime, since soft commits are no longer available on TLOG
replicas. Is that valid?
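To make the question concrete, this is roughly the change I have in mind. It is only a sketch: the values 300000 and 2 are arbitrary placeholders I have not tested, and only the elements that would change are shown.

```xml
<!-- Hypothetical, untested values; everything not shown stays as in the fragments above. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- raised from the current default of 60000 ms; 300000 is an arbitrary example -->
    <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
    <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>
<query>
  <!-- lowered from the current value of 8; 2 is an arbitrary example -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>
```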
I couldn’t find any documentation covering the differences between NRT and TLOG
replicas or the considerations we need to take into account when switching.
Could you please help? Thanks a lot in advance. Please let me know if any other
information is needed.
Best regards,
Nick Vladiceanu