Doss,

See below.

Dominique

On Mon, Aug 10, 2020 at 5:41 PM, Doss <itsmed...@gmail.com> wrote:

> Hi Dominique,
>
> Thanks for your response. Find the details below; please let me know if I
> missed anything.
>
> *- hardware architecture and sizing*
>> CentOS 7, VMs, 4 CPUs, 66GB RAM, 16GB heap, 250GB SSD
>
> *- JVM version / settings*
>> Red Hat, Inc. OpenJDK 64-Bit Server VM, version "14.0.1 14.0.1+7" -
>> default settings, including GC

I don't think I would use JVM version 14. In my opinion, OpenJDK 11 is the
best choice as the LTS version.

> *- Solr settings*
>> softCommit: 15000 (15 sec), autoCommit: 300000 (5 mins)
>>
>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>   <int name="maxMergeAtOnce">30</int>
>>   <int name="maxMergeAtOnceExplicit">100</int>
>>   <double name="segmentsPerTier">30.0</double>
>> </mergePolicyFactory>
>>
>> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>>   <int name="maxMergeCount">18</int>
>>   <int name="maxThreadCount">6</int>
>> </mergeScheduler>

You changed a lot of default values. Any specific reasons? It seems very
aggressive!

> *- collections and queries information*
>> One collection, with 4 shards, 3 replicas, 3.5 million records, 150
>> columns, mostly integer fields. Average doc size is 350KB. 0.5 million
>> inserts/updates spread across the whole day (peak time being 6PM to
>> 10PM); selects have not yet started. Once daily we do a delta import of
>> certain multivalued fields with a good amount of data.
>
> *- gc logs or gceasy results*
> The GCeasy report says GC health is good. One server's GC report:
> https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
> CPU load pattern:
> https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing

You have to analyze GC on all nodes! Your heap is very big. Given the full
GC frequency, I don't think you really need such a big heap for indexing
only.
Maybe you will when you start performing queries. Did you check your
network performance? Did you check the Zookeeper logs?

> Thanks,
> Doss.
>
> On Mon, Aug 10, 2020 at 7:39 PM Dominique Bejean <dominique.bej...@eolya.fr> wrote:
>
>> Hi Doss,
>>
>> Seeing a lot of TIMED_WAITING connections occurs with high-TCP-traffic
>> infrastructure, as in a LAMP solution when the Apache server can no
>> longer connect to the MySQL/MariaDB database.
>> In this case, tweaking net.ipv4.tcp_tw_reuse is a possible solution (but
>> never net.ipv4.tcp_tw_recycle, as you suggested in your previous post).
>> This is well explained in this great article:
>> https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
>>
>> However, in general and more specifically in your case, I would
>> investigate the root cause of your issue rather than try to find a
>> workaround.
>>
>> Can you provide more information about your use case (we know: 3-node
>> SOLR (8.3.1 NRT) + 3-node Zookeeper ensemble)?
>>
>> - hardware architecture and sizing
>> - JVM version / settings
>> - Solr settings
>> - collections and queries information
>> - gc logs or gceasy results
>>
>> Regards
>>
>> Dominique
>>
>>
>> On Mon, Aug 10, 2020 at 3:43 PM, Doss <itsmed...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > In the Solr 8.3.1 source, I see the following, which I assume could be
>> > the reason for the issue "Max requests queued per destination 3000
>> > exceeded for HttpDestination":
>> >
>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >   private static final int MAX_OUTSTANDING_REQUESTS = 1000;
>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >   available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
>> > solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
>> >   return MAX_OUTSTANDING_REQUESTS * 3;
>> >
>> > How can I increase this?
>> >
>> > On Mon, Aug 10, 2020 at 12:01 AM Doss <itsmed...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > We have a 3-node SOLR (8.3.1 NRT) + 3-node Zookeeper ensemble, and
>> > > now and then we face "Max requests queued per destination 3000
>> > > exceeded for HttpDestination".
>> > >
>> > > After a restart everything starts working fine until the next
>> > > problem. Once a problem has occurred we see a great many
>> > > TIMED_WAITING threads.
>> > >
>> > > Server 1:
>> > > *7722* threads are in TIMED_WAITING
>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f")
>> > > Server 2:
>> > > *4046* threads are in TIMED_WAITING
>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3")
>> > > Server 3:
>> > > *4210* threads are in TIMED_WAITING
>> > > ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0")
>> > >
>> > > Please suggest whether net.ipv4.tcp_tw_reuse=1 will help, or how we
>> > > can increase the 3000 limit?
>> > >
>> > > Sorry, since I haven't got any response to my previous query, I am
>> > > creating this as a new thread.
>> > >
>> > > Thanks,
>> > > Mohandoss.
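For readers following the Http2SolrClient lines quoted earlier in the thread: the cap is a plain java.util.concurrent.Semaphore bounding in-flight requests. Below is a minimal, self-contained sketch of that throttling pattern; the class and method names (RequestThrottle, tryBeginRequest, endRequest) are illustrative, not Solr's actual API.

```java
import java.util.concurrent.Semaphore;

// Sketch of the semaphore-based throttle pattern: a fixed number of
// permits bounds concurrent outstanding requests. When the cap is
// reached, further attempts fail (or queue), which is the kind of
// condition that surfaces as a "Max requests queued per destination
// ... exceeded" style error in the client.
public class RequestThrottle {
    private final Semaphore available;
    private final int maxOutstanding;

    public RequestThrottle(int maxOutstanding) {
        this.maxOutstanding = maxOutstanding;
        // Non-fair semaphore, mirroring new Semaphore(MAX_OUTSTANDING_REQUESTS, false)
        this.available = new Semaphore(maxOutstanding, false);
    }

    /** Try to reserve a slot for one in-flight request; false if the cap is hit. */
    public boolean tryBeginRequest() {
        return available.tryAcquire();
    }

    /** Release the slot once the response (or failure) arrives. */
    public void endRequest() {
        available.release();
    }

    /** Number of requests currently holding a slot. */
    public int outstanding() {
        return maxOutstanding - available.availablePermits();
    }

    public static void main(String[] args) {
        RequestThrottle t = new RequestThrottle(2);
        System.out.println(t.tryBeginRequest()); // true  (1 outstanding)
        System.out.println(t.tryBeginRequest()); // true  (2 outstanding, cap reached)
        System.out.println(t.tryBeginRequest()); // false (cap exceeded)
        t.endRequest();                          // one response arrives
        System.out.println(t.tryBeginRequest()); // true  (slot freed)
    }
}
```

The practical takeaway from the pattern: raising the cap only hides pressure; if requests complete slowly (GC pauses, slow merges, network issues), permits are never released fast enough and the queue fills again.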
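To reproduce counts like the per-server TIMED_WAITING figures quoted above, one can tally thread states straight from jstack output. This is a small hedged helper, assuming the standard HotSpot thread-dump format where each stanza contains a "java.lang.Thread.State: ..." line; the class name ThreadDumpStats is made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts threads in a given state from jstack/thread-dump text.
// Assumes the usual HotSpot format, e.g.:
//   "qtp1-42" #42 daemon prio=5 ...
//      java.lang.Thread.State: TIMED_WAITING (parking)
public class ThreadDumpStats {
    private static final Pattern STATE_LINE =
        Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    public static int countInState(String dump, String state) {
        Matcher m = STATE_LINE.matcher(dump);
        int count = 0;
        while (m.find()) {
            if (m.group(1).equals(state)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample =
            "\"qtp1-42\" #42 daemon prio=5\n" +
            "   java.lang.Thread.State: TIMED_WAITING (parking)\n" +
            "\"qtp1-43\" #43 daemon prio=5\n" +
            "   java.lang.Thread.State: RUNNABLE\n" +
            "\"qtp1-44\" #44 daemon prio=5\n" +
            "   java.lang.Thread.State: TIMED_WAITING (sleeping)\n";
        System.out.println(countInState(sample, "TIMED_WAITING")); // prints 2
    }
}
```

Running this against dumps taken a minute apart shows whether the TIMED_WAITING population is stable (idle pool threads parked on a condition, usually harmless) or growing (threads piling up behind a stuck resource).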