Thanks Flavio.

Would you know why node 2 could not receive ACKs from the other two nodes?

What is the workaround in scenarios like this, where one node in a 3-node
cluster is not responding?
** If we do a rolling restart, there is a possibility of downtime.
** Add 2 more nodes to the configs and do a rolling restart.
** Can you think of any way to fix node 2 so that it rejoins the cluster?
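
Before trying any of these options, it may help to record what each node reports about itself. ZooKeeper answers four-letter-word commands such as srvr on the client port, and the reply normally includes a "Mode:" line (leader, follower or standalone). The following is a minimal Python sketch, not a definitive tool: the function names are made up for illustration, and the hostnames and client port 2181 in the usage note are assumptions that do not come from this thread.

```python
import socket


def four_letter_word(host, port, cmd="srvr", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line, or None if there is none."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def cluster_modes(nodes, query=four_letter_word):
    """Map each (host, port) pair to its reported mode or an error label.

    `query` is injectable so the function can be exercised without a
    live cluster.
    """
    modes = {}
    for host, port in nodes:
        try:
            modes[(host, port)] = parse_mode(query(host, port))
        except OSError as exc:  # refused, timed out, unresolvable host
            modes[(host, port)] = "error: " + type(exc).__name__
    return modes
```

Calling cluster_modes([("node1", 2181), ("node2", 2181), ("node3", 2181)]) with the real hostnames would give one entry per node. A value of None means the node replied but sent no "Mode:" line, which would be consistent with a node that is still stuck in leader election.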

Would appreciate your reply.



On Tue, Apr 12, 2016 at 1:33 AM, Flavio Junqueira <f...@apache.org> wrote:

> Good to hear you've been able to sort it out.
>
> -Flavio
>
> > On 12 Apr 2016, at 03:02, s influxdb <elastic....@gmail.com> wrote:
> >
> > Created a parallel, independent ZooKeeper cluster on the same set of
> > machines with different ports, and that worked. This indicates the port
> > was the issue.
> >
> > On Mon, Apr 11, 2016 at 1:35 PM, s influxdb <elastic....@gmail.com>
> wrote:
> >
> >> reboot of the server didn't help
> >>
> >> On Thu, Apr 7, 2016 at 6:50 PM, s influxdb <elastic....@gmail.com>
> wrote:
> >>
> >>> I ran tcpdump on all the three nodes.
> >>> It looks like for every [PSH, ACK] there is a missing [ACK] from
> >>> the other nodes to this 2nd node on port 3888.
> >>>
> >>>
> >>> On Thu, Apr 7, 2016 at 1:29 PM, s influxdb <elastic....@gmail.com>
> wrote:
> >>>
> >>>> Thanks Flavio for your quick replies.
> >>>> The zookeeper version is 3.4.6
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Apr 7, 2016 at 1:23 PM, Flavio P JUNQUEIRA <f...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> You need to determine why it is not receiving notification messages.
> >>>>> From the information you've given, it doesn't look like a ZooKeeper
> >>>>> code issue.
> >>>>>
> >>>>> BTW, which version are you using?
> >>>>>
> >>>>> -Flavio
> >>>>> On 7 Apr 2016 21:20, "s influxdb" <elastic....@gmail.com> wrote:
> >>>>>
> >>>>>> Nothing on the iptables firewall.
> >>>>>>
> >>>>>> What options do I have to reconnect this node to the cluster?
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 7, 2016 at 10:14 AM, s influxdb <elastic....@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> telnet works on 2888 and 3888 to the other nodes. Now I see
> >>>>>>> java.net.SocketTimeoutException: connect timed out messages in the
> >>>>>>> logs for node 2.
> >>>>>>>
> >>>>>>> On Thu, Apr 7, 2016 at 3:05 AM, Flavio Junqueira <f...@apache.org>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> I only see notifications from the node to itself. It says that it
> >>>>>>>> is connected to 1, but it doesn't seem to be receiving the
> >>>>>>>> notification from 1. It also doesn't seem to be receiving the
> >>>>>>>> connection request from 3.
> >>>>>>>>
> >>>>>>>> The last time I saw something like this, it was due to iptables
> >>>>>>>> rules, but if it was working before and no configuration has
> >>>>>>>> changed, then I don't know what it could be.
> >>>>>>>>
> >>>>>>>> -Flavio
> >>>>>>>>
> >>>>>>>>> On 07 Apr 2016, at 05:43, s influxdb <elastic....@gmail.com>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> this is the pastie
> >>>>>>>>> http://pastie.org/10788301
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 6, 2016 at 9:41 PM, s influxdb <
> >>>>> elastic....@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> One of our nodes hit an OOM (java.lang.OutOfMemoryError: unable
> >>>>>>>>>> to create new native thread) and then became unresponsive.
> >>>>>>>>>>
> >>>>>>>>>> We tried to add the node back to the cluster but with no luck.
> >>>>>>>>>>
> >>>>>>>>>> It doesn't seem to receive any notification messages from the
> >>>>>>>>>> other nodes. It keeps "Sending notifications" in a loop.
> >>>>>>>>>>
> >>>>>>>>>> Please see attached the logs of the node that is out of
> >>>>>>>>>> rotation.
> >>>>>>>>>>
> >>>>>>>>>> Any inputs appreciated.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
>
>
