Hi

I am migrating from a master-slave setup to SolrCloud, but I'm running
into problems with indexing.

Cluster details:

8 machines with 64GB of memory each, each hosting 1 replica.
4 shards, 2 replicas of each. Heap size is 16GB.

Collection details:

Total number of docs: ~250k (but only 50k are indexed right now)
Size of collection (master-slave number, for reference): ~10GB

Our collection is fairly heavy, with some dynamic fields of high
cardinality (on the order of thousands), which is why we use such a large
heap even for a small collection.
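
To illustrate (the field name and type below are placeholders, not our
real schema), the dynamic fields look roughly like this, and each such
pattern expands into a large number of concrete fields at index time:

    <dynamicField name="attr_*" type="string" indexed="true" stored="true"/>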

Relevant solrconfig settings:

commit settings:

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>3600000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:1800000}</maxTime>
    </autoSoftCommit>

index config:

    <ramBufferSizeMB>500</ramBufferSizeMB>
    <maxBufferedDocs>10000</maxBufferedDocs>

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>

    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">4</int>
    </mergeScheduler>


Problem:

I set up the cloud and started indexing at the throughput of our earlier
master-slave setup, but the machines soon ran into full-blown garbage
collection. That throughput was not high: we index the whole collection
overnight, roughly ~250k documents in 6 hours, which is about 12 docs/sec.

So now I'm indexing at an extremely slow rate to try to isolate the
problem.

Currently I'm indexing 1 document every 2 seconds, i.e. ~30 documents per
minute.

Observations:

1. I'm noticing extremely small segments in the segments UI. Example:

Segment _1h4:
#docs: 5
#dels: 0
size: 1,586,878 bytes
age: 2021-02-12T11:05:33.050Z
source: flush

Why is Lucene creating such small segments? My understanding was that a
segment is flushed when the ramBufferSizeMB or maxBufferedDocs limit is
hit, or on a hard commit. None of those should lead to such small
segments.
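
If it would help with diagnosis, I believe I can enable Lucene's
infoStream in indexConfig so that the reason for each flush gets logged
(my understanding of the setting, please correct me if I have it wrong):

    <indexConfig>
      <infoStream>true</infoStream>
    </indexConfig>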

2. The index/ directory has a large number of files. For one shard with
30k documents and ~1.5GB of index, there are ~450-550 files in this
directory. I understand that each segment is composed of a bunch of files,
but even at roughly 10-15 files per segment that would still imply on the
order of 30-50 segments, which seems very large.
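
For completeness, we don't override useCompoundFile in indexConfig, so I'm
assuming the default applies and each segment is written out as separate
files, i.e. effectively:

    <indexConfig>
      <useCompoundFile>false</useCompoundFile>
    </indexConfig>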

Note: nothing out of the ordinary in the logs; only /update request logs.

Please help me make sense of the two observations above.


