Thank you for your responses. Please find additional details about the
setup below:

We are using Solr 7.2.1

> I have a solrcloud setup on Windows server with below config:
> 3 nodes,
> 24 shards with replication factor 2
> Each node hosts 16 cores.

16 CPU cores, or 16 Solr cores?  The info may not be all that useful
either way, but just in case, it should be clarified.

*OP Reply:* 16 Solr cores (i.e. replicas)

> Index size is 1.4 TB per node
> Xms 8 GB , Xmx 24 GB
> Directory factory used is SimpleFSDirectoryFactory

How much total memory in the server?  Is there other software using
significant levels of memory?

*OP Reply:* Total of 48 GB per node. I couldn't see any other software
using a significant amount of memory.
I am honestly not sure why the directory factory was changed to
SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
started to see shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor in the long
updates/frequent merges?
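
For reference, a rough sketch of how the factory would be declared in
solrconfig.xml, both as it stands now and as it would look if we reverted
to the default (illustrative only, not copied from our actual config):

    <!-- current setting: plain SimpleFSDirectory -->
    <directoryFactory name="DirectoryFactory"
                      class="solr.SimpleFSDirectoryFactory"/>

    <!-- Solr default: NRTCachingDirectoryFactory, which uses MMap underneath -->
    <directoryFactory name="DirectoryFactory"
                      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>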

> How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
> space?
*OP Reply:* There are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB of space.
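(These counts can be cross-checked per core through the CoreAdmin STATUS
call, shown here with a placeholder core name; the response reports
numDocs and maxDoc under each core's "index" section:)

    http://localhost:8983/solr/admin/cores?action=STATUS&core=<coreName>&wt=json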

> Can you share the GC log that Solr writes?
*OP Reply:* Please find the GC logs and thread dumps at this location:
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw

Another observation is that CPU usage reaches around 70% (observed through
manual monitoring) when indexing starts and merges are in progress; it
stays well below 50% otherwise.

Also, should anything be altered in the mergeScheduler settings?
"mergeScheduler":{
        "class":"org.apache.lucene.index.ConcurrentMergeScheduler",
        "maxMergeCount":2,
        "maxThreadCount":2},

Thanks,
Rahul


On Wed, Jun 5, 2019 at 4:24 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/5/2019 9:39 AM, Rahul Goswami wrote:
> > I have a solrcloud setup on Windows server with below config:
> > 3 nodes,
> > 24 shards with replication factor 2
> > Each node hosts 16 cores.
>
> 16 CPU cores, or 16 Solr cores?  The info may not be all that useful
> either way, but just in case, it should be clarified.
>
> > Index size is 1.4 TB per node
> > Xms 8 GB , Xmx 24 GB
> > Directory factory used is SimpleFSDirectoryFactory
>
> How much total memory in the server?  Is there other software using
> significant levels of memory?
>
> Why did you opt to change the DirectoryFactory away from Solr's default?
>   The default is chosen with care ... any other choice will probably
> result in lower performance.  The default in recent versions of Solr is
> NRTCachingDirectoryFactory, which uses MMap for file access.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> The screenshot described here might become useful for more in-depth
> troubleshooting:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows
>
> How many total documents (maxDoc, not numDoc) are in that 1.4 TB of space?
>
> > The cloud is all nice and green for the most part. Only when we start
> > indexing, within a few seconds, I start seeing Read timeouts and socket
> > write errors and replica recoveries thereafter. We are indexing in 2
> > parallel threads in batches of 50 docs per update request. After
> examining
> > the thread dump, I see segment merges happening. My understanding is that
> > this is the cause, and the timeouts and recoveries are the symptoms. Is
> my
> > understanding correct? If yes, what steps could I take to help the
> > situation. I do see that the difference between "Num Docs" and "Max Docs"
> > is about 20%.
>
> Segment merges are a completely normal part of Lucene's internal
> operation.  They should never cause problems like you have described.
>
> My best guess is that a 24GB heap is too small.  Or possibly WAY too
> large, although with the index size you have mentioned, that seems
> unlikely.
>
> Can you share the GC log that Solr writes?  The problem should occur
> during the timeframe covered by the log, and the log should be as large
> as possible.  You'll need to use a file sharing site -- attaching it to
> an email is not going to work.
>
> What version of Solr?
>
> Thanks,
> Shawn
>
