First off, even 6 ZK instances are overkill, vast overkill. 3 should be
more than enough.
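Why 6 buys nothing over 5: ZooKeeper only serves requests while a strict majority (quorum) of the ensemble is up, so an even-sized ensemble needs one more live server without tolerating any extra failures. A minimal Java sketch of that arithmetic (the class and method names are just for illustration):

```java
// Minimal sketch: ZooKeeper stays available only while a strict majority
// (quorum) of the ensemble is up. Prints quorum size and tolerated failures.
public class ZkQuorum {

    // Smallest number of servers that is a strict majority of the ensemble.
    static int quorum(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can be down while a quorum can still be formed.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorum(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorum(n), toleratedFailures(n));
        }
    }
}
```

With 6 servers the quorum is 4 and only 2 may be down, exactly as with 5; 3 servers already survive 1 failure. It also means the "4 out of 6" startup in the report below was just barely a quorum.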

That aside, however, how are you telling your Solr nodes about the ZK
machines? Is it possible you've pointed some of your Solr nodes at specific
ZK machines that aren't up when you have this problem? E.g.
-DzkHost=zk1,zk2,zk3....
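One way to rule that out, and to script the "wait until ZooKeeper is up" check described further down, is to parse the same zkHost string each Solr node gets and probe every listed server. This is a hypothetical Java sketch, not part of Solr: the class and method names are made up, hosts without a port default to 2181 (as the ZooKeeper client does), and it relies on ZooKeeper's "ruok" four-letter command, which 3.4.x answers by default but 3.5.3 and later only answer if it is listed in 4lw.commands.whitelist. An "imok" reply means the server process is serving, not that it has joined a quorum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: parse a zkHost-style connect string and ask
// each listed ZooKeeper server "ruok" before starting a Solr node.
public class ZkPreflight {

    static final int DEFAULT_ZK_PORT = 2181;

    // "zk1:2181,zk2,zk3:2182/solr" -> [zk1:2181, zk2:2181, zk3:2182];
    // an optional chroot after the first '/' is ignored.
    static List<InetSocketAddress> parse(String zkHost) {
        int slash = zkHost.indexOf('/');
        String hosts = slash >= 0 ? zkHost.substring(0, slash) : zkHost;
        List<InetSocketAddress> out = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) continue;
            int colon = e.lastIndexOf(':');
            String host = colon >= 0 ? e.substring(0, colon) : e;
            int port = colon >= 0 ? Integer.parseInt(e.substring(colon + 1)) : DEFAULT_ZK_PORT;
            // Unresolved: no DNS lookup until we actually probe the server.
            out.add(InetSocketAddress.createUnresolved(host, port));
        }
        return out;
    }

    // Sends "ruok"; returns true only if the server replies "imok" in time.
    static boolean ruok(InetSocketAddress addr, int timeoutMs) {
        // Resolve now; an unknown host makes connect() throw an IOException.
        InetSocketAddress resolved = new InetSocketAddress(addr.getHostString(), addr.getPort());
        try (Socket s = new Socket()) {
            s.connect(resolved, timeoutMs);
            s.setSoTimeout(timeoutMs);
            OutputStream out = s.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break; // server closed the connection early
                n += r;
            }
            return n == 4 && "imok".equals(new String(buf, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false; // refused, timed out, or unknown host
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the node's real zkHost string as args[0].
        String zkHost = args.length > 0 ? args[0] : "zk1:2181,zk2:2181,zk3:2181";
        List<InetSocketAddress> servers = parse(zkHost);
        int up = 0;
        for (InetSocketAddress server : servers) {
            boolean ok = ruok(server, 2000);
            System.out.println(server.getHostString() + ":" + server.getPort()
                    + (ok ? " imok" : " NOT OK"));
            if (ok) up++;
        }
        // Exit 0 only if every listed server answered, so a start script can retry.
        System.exit(!servers.isEmpty() && up == servers.size() ? 0 : 1);
    }
}
```

A start script could run it in a loop with the node's zkHost value and launch Solr only after it exits 0. Requiring every server (rather than a quorum) mirrors the workaround described in the thread and is a deliberate, conservative choice.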

Best
Erick


On Tue, Aug 6, 2013 at 4:56 PM, Joshi, Shital <shital.jo...@gs.com> wrote:

> Hi,
>
> We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes.
> We have 6 zookeeper instances. We are planning to change to an odd number
> of zookeeper instances.
>
> With Solr 4.3.0, if not all zookeeper instances are up, the Solr 4 node
> never connects to zookeeper (we can't see the admin page) until all
> zookeeper instances are up and we restart all Solr nodes. It was suggested
> that this could be due to https://issues.apache.org/jira/browse/SOLR-4899
> and that this bug is fixed in Solr 4.4.
>
> We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6
> zookeeper instances and then brought up all ten Solr 4 nodes. We kept
> seeing this exception in the Solr logs:
>
> 751395 [main-SendThread] WARN  org.apache.zookeeper.ClientCnxn  – Session
> 0x0 for server null, unexpected error, closing socket connection and
> attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>
> After a while, we saw these messages:
>
> INFO  - 2013-08-05 22:24:07.582;
> org.apache.solr.common.cloud.ConnectionManager; Watcher
> org.apache.solr.common.cloud.ConnectionManager@5140709name:ZooKeeperConnection
> Watcher:qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com
> got event WatchedEvent state:SyncConnected type:None path:null path:null
> type:None
> INFO  - 2013-08-05 22:24:07.662;
> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
> change trigger but we are already closed
> 754311 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager
> – Client->ZooKeeper status change trigger but we are already closed
>
> We brought up all zookeeper instances, but the cloud never came up until
> all Solr nodes were restarted. Do we need to change any settings? After the
> weekend reboot, the zookeeper instances come up one by one, and while they
> are coming up the Solr nodes are also starting. Because of this issue, we
> have to add checks to make sure all zookeeper instances are up before we
> bring up any Solr node.
>
> Thanks!!
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, June 11, 2013 10:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: external zookeeper with SolrCloud
>
>
> On Jun 11, 2013, at 10:15 AM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
>
> > Thanks Mark.
> >
> > Looks like this bug is fixed in Solr 4.4. Do you have a date for the
> > official release of 4.4?
>
> Looks like it might come out in a couple of weeks.
>
> > Are there any instructions available on how to build Solr 4.4 from the
> > SVN repository?
>
> It's Java, so it's pretty easy - you might find some help here:
> http://wiki.apache.org/solr/HowToContribute
>
> - Mark
>
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Monday, June 10, 2013 8:05 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: external zookeeper with SolrCloud
> >
> > This might be https://issues.apache.org/jira/browse/SOLR-4899
> >
> > - Mark
> >
> > On Jun 10, 2013, at 5:59 PM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
> >
> >> Hi,
> >>
> >> We're setting up a 5-shard SolrCloud with an external zookeeper. When we
> >> bring up Solr nodes while the zookeeper instance is not up and running, we
> >> see this error in the Solr logs:
> >>
> >> java.net.ConnectException: Connection refused
> >>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
> >>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> >>
> >> INFO  - 2013-06-10 15:03:35.422;
> >> org.apache.solr.common.cloud.ConnectionManager; Watcher 592147
> >> [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  –
> >> Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eaename:ZooKeeperConnection
> >> Watcher: ................. got event WatchedEvent
> >> state:SyncConnected type:None path:null path:null type:None
> >>
> >> INFO  - 2013-06-10 15:03:35.423;
> >> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
> >> change trigger but we are already closed
> >>
> >> 592148 [main-EventThread] INFO
> >> org.apache.solr.common.cloud.ConnectionManager  – Client->ZooKeeper status
> >> change trigger but we are already closed
> >>
> >> After we bring up the zookeeper instance, the node never connects to
> >> zookeeper and we can't see the Solr admin page until we restart the node.
> >>
> >> Does the zookeeper instance have to be up when we bring up a Solr node?
> >> That's not what the documentation says, though.
> >>
> >> Thanks.
> >
>
>
