On Sun, 4 Mar 2012 21:09:30 -0500, Mark Miller <markrmil...@gmail.com>
wrote:
On Mar 4, 2012, at 5:43 PM, Markus Jelsma wrote:
everything stalls after it lists all segment files and a ZK
state change has occurred.
Can you get a stack trace here? I'll try to respond to more tomorrow.
What version of trunk are you using? We have been making fixes and
improvements all the time, so need to get a frame of reference.
I updated trunk this Saturday, March 3rd. The stack traces I provided
are all I got. This point of restart and stalling does not produce a
stack trace at all. This is the final part of the info log:
[lots of segment files]...fdt, _135.fdt, _199_nrm.cfs, _18s.tvd, _zm.fdx, _18s.tvf, _196_0.frq, _135.tvf, _195.fdt, _135.tvd, _18n.tvf, _18n.tvd, _18y_0.tim, _18s.tvx, _zm.fnm, _187.tvx, _10g.fnm, _13t.per, _195.fdx]
2012-03-04 22:39:15,061 INFO [solr.core.SolrCore] - [recoveryExecutor-2-thread-1] - : newest commit = 31
2012-03-04 22:39:16,052 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred
2012-03-04 22:39:16,585 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred
2012-03-04 22:39:36,652 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred
2012-03-04 22:39:52,220 INFO [common.cloud.ZkStateReader] - [main-EventThread] - : A cluster state change has occurred
When a client node cannot talk to zookeeper, it must reject updates
(searches will still work), because it may no longer know certain
things it should (what if a leader changes?). Why can't the node talk
to zookeeper? Perhaps the load on the server is so high that it cannot
respond to zk within the session timeout? I really don't know yet.
When this happens, though, it forces a recovery when/if the node can
reconnect to zookeeper.
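(As an aside, the policy Mark describes can be sketched roughly as follows. This is my own illustrative sketch, not Solr's actual code; all class and method names here are made up.)

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: while the ZooKeeper session is down the node keeps
// serving searches from its local index but rejects updates, and once the
// session is restored it flags itself for recovery, since it may have
// missed a leader change while disconnected.
class ZkAwareNode {
    private final AtomicBoolean zkConnected = new AtomicBoolean(true);
    private volatile boolean recoveryRequested = false;

    // Called from a ZooKeeper watcher on Disconnected/Expired events.
    void onZkSessionLost() {
        zkConnected.set(false);
    }

    // Called on reconnect; the node cannot trust its view of the cluster
    // any more, so it must go through recovery.
    void onZkSessionRestored() {
        zkConnected.set(true);
        recoveryRequested = true;
    }

    String handleUpdate(String doc) {
        if (!zkConnected.get()) {
            // Reject: accepting writes while blind to leader changes
            // could fork the index.
            return "503 Service Unavailable";
        }
        return "200 OK";
    }

    String handleSearch(String query) {
        // Searches are served from the local index regardless of ZK state.
        return "200 OK";
    }

    boolean isRecoveryRequested() { return recoveryRequested; }
}
```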
Sounds likely. There are a lot of timeouts in ZK's log, such as:
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000012, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-04 22:37:38,956 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /141.105.120.151:49833 which had sessionid 0x135dfcfda000012
2012-03-04 22:37:39,077 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000010, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-04 22:37:39,077 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /141.105.120.153:36794 which had sessionid 0x135dfcfda000010
2012-03-04 22:37:48,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@334] - Expiring session 0x135dfcfda000011, timeout of 10000ms exceeded
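(The "timeout of 10000ms exceeded" line matches the default 10s client session timeout, which fits the high-load theory: the Solr node simply isn't heartbeating ZK in time. One mitigation worth trying, though this is my assumption and not something confirmed in this thread, is raising zkClientTimeout in solr.xml, e.g.:)

```xml
<!-- solr.xml sketch: raise the ZooKeeper client session timeout from the
     10s default so a briefly overloaded node is not expired immediately.
     Attribute names follow the Solr example solr.xml of this era; adjust
     to your actual config. -->
<cores adminPath="/admin/cores"
       zkClientTimeout="${zkClientTimeout:30000}"
       hostPort="${jetty.port:8983}"
       hostContext="solr">
  <core name="collection1" instanceDir="." />
</cores>
```

Note this only buys headroom; if GC pauses or I/O saturation keep the client from heartbeating, the underlying load problem remains.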
The problems seem to have a lot to do with ZK, as we always see bad
messages in its log around the time Solr goes crazy.
We have not yet started on optimizing bulk indexing - currently an
update is added locally *before* sending updates in parallel to each
replica. Then we wait for each response before responding to the
client. We plan to offer more optimizations and options around this.
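(The flow Mark describes, index locally first, then fan out to each replica in parallel and wait for every ack before answering the client, can be sketched as below. This is an illustrative sketch under my own naming, not Solr's actual update code.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the distributed update path: local add first,
// parallel forwarding to replicas, then a blocking wait for all acks.
class DistributedUpdateSketch {
    interface Replica { boolean apply(String doc); }

    private final List<Replica> replicas;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final List<String> localIndex = new ArrayList<>();

    DistributedUpdateSketch(List<Replica> replicas) {
        this.replicas = replicas;
    }

    boolean update(String doc) {
        // 1. The update is added locally *before* forwarding.
        localIndex.add(doc);

        // 2. Forward to all replicas in parallel.
        List<Future<Boolean>> acks = new ArrayList<>();
        for (Replica r : replicas) {
            acks.add(pool.submit(() -> r.apply(doc)));
        }

        // 3. Block until every replica has acknowledged; only then do we
        //    respond to the client. This per-request round trip is what
        //    makes un-optimized bulk indexing slow.
        boolean ok = true;
        for (Future<Boolean> ack : acks) {
            try {
                ok &= ack.get();
            } catch (Exception e) {
                ok = false; // a failed or interrupted replica fails the request
            }
        }
        return ok;
    }

    int localDocCount() { return localIndex.size(); }
    void shutdown() { pool.shutdown(); }
}
```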
This is indeed a bit of a problem, but at least it's indexing. If
there's any additional information you need, or you want us to pull in
new commits and try again, we're happy to give it a shot.
Feedback will be useful in making some of these improvements.
- Mark Miller
lucidimagination.com
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350