> Regarding the SessionExpiredException: Solr throws this exception and
> then the shard goes down.
>
> From the following discussion, it seems that Solr is losing its
> connection to ZooKeeper and then throws the exception. In the ZooKeeper
> configuration file, zoo.cfg, is it safe to increase the syncLimit shown in
> the snippet below?
>
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/sanfs/mnt/vol01/solr/zookeeperdata/2
> # the port at which the clients will connect
> clientPort=2181
> # the maximum number of client connections.
> # increase this if you need to handle more clients
> #maxClientCnxns=60
>
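For reference, a small sketch of how the values in the snippet above relate to each other. It assumes ZooKeeper's documented defaults: syncLimit and initLimit are counted in ticks and apply to followers syncing with the leader, while client session timeouts are bounded by minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime when neither is set in zoo.cfg. The function names are made up for illustration.

```python
# Values copied from the zoo.cfg snippet and the Solr start-up flags in this thread.
TICK_TIME_MS = 2000
INIT_LIMIT_TICKS = 10
SYNC_LIMIT_TICKS = 5
ZK_CLIENT_TIMEOUT_MS = 15000  # from -DzkClientTimeout=15000


def quorum_limits_ms(tick_ms, init_ticks, sync_ticks):
    """Follower/leader limits in milliseconds (server-to-server, not client sessions)."""
    return {"initLimit": tick_ms * init_ticks, "syncLimit": tick_ms * sync_ticks}


def negotiated_session_timeout_ms(requested_ms, tick_ms):
    """Clamp a requested session timeout to the assumed default bounds:
    minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime."""
    low, high = 2 * tick_ms, 20 * tick_ms
    return max(low, min(requested_ms, high))


if __name__ == "__main__":
    print(quorum_limits_ms(TICK_TIME_MS, INIT_LIMIT_TICKS, SYNC_LIMIT_TICKS))
    print(negotiated_session_timeout_ms(ZK_CLIENT_TIMEOUT_MS, TICK_TIME_MS))
```

Under these assumptions, syncLimit only governs traffic between ZooKeeper servers, so raising it would probably not change when a Solr session expires. The 15-second session timeout requested by Solr already falls inside the allowed 4 to 40 second range, and that is the value a long pause has to exceed.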
> Thanks,
> Satya
>
> On Mon, May 8, 2017 at 12:04 PM Satya Marivada <satya.chaita...@gmail.com>
> wrote:
>
>> The 3g heap is doing well, with a GC running at 600-700 MB of usage. The
>> collector is CMS:
>>
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>
>> The JVM start-up parameters are:
>>
>> java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
>> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
>> -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
>> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled
>> -XX:-OmitStackTraceInFastThrow -verbose:gc -XX:+PrintHeapAtGC
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>> -Xloggc:/sanfs/mnt/vol01/solr/solr-6.3.0/server/logs/solr_gc.log
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
>> -DzkClientTimeout=15000 .......
>>
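Since the start-up flags above include -XX:+PrintGCApplicationStoppedTime, the GC log should contain "Total time for which application threads were stopped" lines. Below is a rough sketch for finding stops long enough to matter against the 15-second zkClientTimeout. The sample lines are made up for illustration, and the function name and the threshold are assumptions.

```python
import re

# Matches the lines HotSpot writes when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_s):
    """Return (line_number, pause_seconds) for every stop longer than threshold_s."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match and float(match.group(1)) > threshold_s:
            hits.append((number, float(match.group(1))))
    return hits


if __name__ == "__main__":
    # Made-up sample lines for illustration, not taken from a real log.
    sample = [
        "2017-05-08T11:50:01.123-0400: 12.345: Total time for which application "
        "threads were stopped: 0.0123456 seconds, Stopping threads took: 0.0000456 seconds",
        "2017-05-08T11:52:10.456-0400: 141.678: Total time for which application "
        "threads were stopped: 16.2500000 seconds, Stopping threads took: 0.0001000 seconds",
    ]
    print(long_pauses(sample, threshold_s=15.0))
```

To run it against a real log, pass an open file object for solr_gc.log instead of the sample list. A threshold somewhat below the session timeout may be more useful, since a pause does not need to cover the full 15 seconds on its own to delay heartbeats past the limit.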
>> On Mon, May 8, 2017 at 11:50 AM Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> Which garbage collector are you using? The default GC will probably give
>>> long pauses.
>>>
>>> You need to use CMS or G1.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>> > On May 8, 2017, at 8:48 AM, Erick Erickson <erickerick...@gmail.com>
>>> > wrote:
>>> >
>>> > 3G of memory should not lead to long GC pauses unless you're running
>>> > very close to the edge of available memory. Paradoxically, running
>>> > with 6G of memory may lead to _fewer_ noticeable pauses since the
>>> > background threads can do the work, well, in the background.
>>> >
>>> > Best,
>>> > Erick
>>> >
>>> > On Mon, May 8, 2017 at 7:29 AM, Satya Marivada
>>> > <satya.chaita...@gmail.com> wrote:
>>> >> Hi Piyush and Shawn,
>>> >>
>>> >> May I ask what the solution is, if the cause is long GC pauses? I
>>> >> suspect the same problem in our case too. We started with 3G of
>>> >> memory for the heap.
>>> >> Did you have to adjust the memory allotted? Very much appreciated.
>>> >>
>>> >> Thanks,
>>> >> Satya
>>> >>
>>> >> On Sat, May 6, 2017 at 12:36 PM Piyush Kunal <piyush.ku...@myntra.com>
>>> >> wrote:
>>> >>
>>> >>> We already faced this issue and found the cause to be long GC pauses
>>> >>> on either the client side or the server side.
>>> >>> Regards,
>>> >>> Piyush
>>> >>>
>>> >>> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey <apa...@elyograg.org>
>>> >>> wrote:
>>> >>>
>>> >>>> On 5/3/2017 7:32 AM, Satya Marivada wrote:
>>> >>>>> I sometimes see the exceptions below in my logs. What could be
>>> >>>>> causing them?
>>> >>>>>
>>> >>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>> >>>>
>>> >>>> Based on my limited research, this would tend to indicate that the
>>> >>>> heartbeats ZK uses to detect when sessions have gone inactive are not
>>> >>>> occurring in a timely fashion.
>>> >>>>
>>> >>>> Common causes seem to be:
>>> >>>>
>>> >>>> JVM Garbage collections.  These can cause the entire JVM to pause for an
>>> >>>> extended period of time, and this time may exceed the configured timeouts.
>>> >>>>
>>> >>>> Excess client connections to ZK.  ZK limits the number of connections
>>> >>>> from each client address, with the idea of preventing denial of service
>>> >>>> attacks.  If a client is misbehaving, it may make more connections than
>>> >>>> it should.  You can try increasing the limit in the ZK config, but if
>>> >>>> this is the reason for the exception, then something's probably wrong,
>>> >>>> and you may be just hiding the real problem.
>>> >>>>
>>> >>>> Although we might have bugs causing the second situation, the first
>>> >>>> situation seems more likely.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Shawn
>>> >>>>
>>> >>>>
>>> >>>
>>>
>>>
