Hi, I am migrating from master-slave to SolrCloud, but I'm running into problems with indexing.
Cluster details: 8 machines with 64GB of memory each, each hosting 1 replica. 4 shards, 2 replicas of each. Heap size is 16GB.

Collection details:
- Total number of docs: ~250k (but only 50k are indexed right now)
- Size of collection (master-slave number, for reference): ~10GB

Our collection is fairly heavy, with some dynamic fields of high cardinality (on the order of ~1000s), which is why we use such a large heap for even a small collection.

Relevant solrconfig settings:

Commit settings:

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>3600000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:1800000}</maxTime>
    </autoSoftCommit>

Index config:

    <ramBufferSizeMB>500</ramBufferSizeMB>
    <maxBufferedDocs>10000</maxBufferedDocs>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">4</int>
    </mergeScheduler>

Problem: I set up the cloud and started indexing at the throughput of our earlier master-slave setup, but the machines soon ran into full-blown garbage collection. That throughput was not high: we index the whole collection overnight, roughly ~250k documents in 6 hours, which works out to about 12 docs/sec. So now I'm indexing at an extremely slow rate while trying to find the problem: currently 1 document every 2 seconds, i.e. ~30 documents per minute.

Observations:

1. I'm noticing extremely small segments in the Segments UI. Example:

    Segment _1h4:
      #docs: 5
      #dels: 0
      size: 1,586,878 bytes
      age: 2021-02-12T11:05:33.050Z
      source: flush

Why is Lucene creating such small segments? My understanding is that segments are created when the ramBufferSizeMB or maxBufferedDocs limit is hit, or on a hard commit. None of those should lead to such small segments.

2.
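To watch these flush-created segments without clicking through the UI, the same information can be pulled from Solr's core-level Segments API (GET /solr/<core>/admin/segments) and filtered. A minimal sketch in Python; note the JSON below is a made-up illustrative response (the field names are my assumption, modeled on what the Segments UI displays), and the segment names/sizes are invented for the example:

```python
import json

# Made-up example response in roughly the shape of Solr's Segments API
# (field names are assumptions based on what the Segments UI shows;
# segment names and sizes are invented for illustration).
response = json.loads("""
{
  "segments": {
    "_1h4": {"name": "_1h4", "size": 5, "delCount": 0,
             "sizeInBytes": 1586878, "source": "flush"},
    "_1h5": {"name": "_1h5", "size": 9500, "delCount": 12,
             "sizeInBytes": 487000000, "source": "merge"}
  }
}
""")

# Flag segments that were flushed with only a handful of docs --
# these are the suspicious ones, since neither ramBufferSizeMB (500MB)
# nor maxBufferedDocs (10000) should be hit after 5 documents.
small_flushes = [
    seg for seg in response["segments"].values()
    if seg["source"] == "flush" and seg["size"] < 100
]
for seg in small_flushes:
    print(f'{seg["name"]}: {seg["size"]} docs, {seg["sizeInBytes"]} bytes')
```

Running this against the real endpoint on each replica would show whether the tiny segments appear on the leader, the followers, or both.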
The index/ directory has a large number of files. For one shard with 30k documents and 1.5GB of data, there are ~450-550 files in this directory. I understand that each segment is composed of a bunch of files, but even accounting for that, the number of segments seems very large.

Note: nothing out of the ordinary in the logs; only /update request logs.

Please help with making sense of the 2 observations above.

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
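One quick way to check whether ~500 files really means an outsized segment count: Lucene names every file belonging to segment _xyz as "_xyz." or "_xyz_" plus an extension, so grouping filenames by that prefix gives a per-segment file count. A sketch using a made-up listing; in practice you would replace the list with os.listdir() on the shard's index/ directory:

```python
import re
from collections import Counter

# In practice: files = os.listdir("/path/to/data/index")
# The listing below is a made-up example of typical Lucene file names.
files = [
    "_1h4.fdt", "_1h4.fdx", "_1h4.si", "_1h4_Lucene84_0.tip",
    "_1h5.cfe", "_1h5.cfs", "_1h5.si",
    "segments_a", "write.lock",
]

# Files of segment _xyz start with "_xyz." or "_xyz_";
# segments_N and write.lock are per-index files and are skipped.
seg = re.compile(r"^(_[a-z0-9]+)[._]")
counts = Counter(m.group(1) for f in files if (m := seg.match(f)))

print(f"{len(counts)} segments across {len(files)} files")
for name, n in counts.most_common():
    print(f"  {name}: {n} files")
```

With a healthy merge policy you would expect dozens of files per segment at most, so ~500 files for a 1.5GB shard does suggest far more segments than segmentsPerTier=10 would imply.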