Re: Nodes not added to existing cluster

Skye Book Sun, 17 Nov 2013 23:36:52 -0800

Hi there,

I’m bringing this thread back as it’s something that I thought was solved and is 
apparently not fixed on my end.


To recap, I’m having trouble getting a node to join a cluster.  Configuration 
seems all right using the EC2MultiRegionSnitch but new nodes are unable to 
handshake with seeds.

- Security Group has 22 && 1024-65535 open
- Nodes are configured with password authentication using CassandraAuthorizer
- internode_authenticator is commented out in configuration
- rpc_address is set to the instance’s private address
- listen_address is set to the instance’s private address
- broadcast_address is set to the instance's public address
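
One way to rule out the network layer for the settings listed above is to try a plain TCP connection from the new node to each seed's internode port. The sketch below assumes the default storage_port (7000) and ssl_storage_port (7001) from cassandra.yaml are unchanged; the helper names are hypothetical and the seed addresses passed in are whatever the seed list contains.

```python
import socket

# Assumed defaults from cassandra.yaml; adjust if storage_port or
# ssl_storage_port were changed in your configuration.
INTERNODE_PORTS = (7000, 7001)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_seeds(seeds, ports=INTERNODE_PORTS, timeout=3.0):
    """Map each (seed, port) pair to whether it accepted a TCP connection."""
    return {(seed, port): can_connect(seed, port, timeout)
            for seed in seeds for port in ports}
```

Running check_seeds(...) on the new node with the seeds' public addresses, and again with their private addresses, would show which of the two is reachable. A successful connect only shows that the port is open; it says nothing about whether the Cassandra handshake itself succeeds.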

As was suggested earlier, I’ve enabled TRACE logging for OutboundTcpConnection 
and get the following dumped into system.log when the new node is started up 
without itself in the seed list (if its own IP is in the list it just creates a 
new single node cluster).  I’ve gisted the results here: 
https://gist.github.com/skyebook/be5ee75a000a1e6d65d0

It looks like the handshake process completely and utterly fails as it seems 
unable to get any information from the other nodes as evidenced by:
OutboundTcpConnection.java (line 386) Handshaking version with /NODE_1_PUBLIC_IP
OutboundTcpConnection.java (line 386) Handshaking version with /NODE_2_PUBLIC_IP
OutboundTcpConnection.java (line 333) Target max version is -2147483648; no 
version information yet, will retry
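
For reference, -2147483648 is -2^31, the smallest 32-bit signed integer, which reads like a "not yet known" placeholder rather than a real messaging version. That matches the log's own "no version information yet". A minimal sketch for pulling the handshake targets and that placeholder out of a TRACE log follows; the regular expressions are based only on the three lines quoted above, and the function name is hypothetical.

```python
import re

INT32_MIN = -2**31  # -2147483648, the smallest 32-bit signed integer

# Patterns inferred from the log lines quoted in this message; other
# Cassandra versions may word these messages differently.
HANDSHAKE_RE = re.compile(r"Handshaking version with /(\S+)")
MAX_VERSION_RE = re.compile(r"Target max version is (-?\d+)")


def summarize_handshakes(lines):
    """Collect handshake targets and reported max versions from log lines."""
    peers, versions = [], []
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            peers.append(match.group(1))
        match = MAX_VERSION_RE.search(line)
        if match:
            versions.append(int(match.group(1)))
    return {
        "peers": peers,
        "versions": versions,
        # True when at least one line reported the placeholder value.
        "no_version_seen": INT32_MIN in versions,
    }
```

Feeding it the lines of system.log (for example via open(path) as an iterable) gives a quick view of which peers were attempted and whether any real version number ever came back.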

Thanks in advance for any light you all might be able to shed on what’s going 
on.

On Sep 26, 2013, at 9:03 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

>>  INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>  INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
> If you can turn up logging to TRACE for 
> org.apache.cassandra.net.OutboundTcpConnection it will include the full 
> error. 
> 
>> The two addresses that it is unable to handshake with are the other two 
>> addresses of nodes in the cluster I'm unable to join.
> Are you mixing versions ? 
> 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 26/09/2013, at 5:13 PM, Skye Book <skye.b...@gmail.com> wrote:
> 
>> Hi Aaron, thanks for the clarification.
>> 
>> As might be expected, having the broadcast_address fixed hasn't fixed 
>> anything.  What I did find after writing my last email is that output.log is 
>> littered with these:
>> 
>>  INFO 05:03:49,015 Cannot handshake version with /aa.bb.cc.dd
>>  INFO 05:03:49,017 Handshaking version with /aa.bb.cc.dd
>>  INFO 05:03:49,803 Cannot handshake version with /ww.xx.yy.zz
>>  INFO 05:03:49,805 Handshaking version with /ww.xx.yy.zz
>> 
>> The two addresses that it is unable to handshake with are the other two 
>> addresses of nodes in the cluster I'm unable to join.  I started thinking 
>> that maybe EC2 was having an unadvertised problem communicating between AZ's 
>> but bringing up nodes in both of the other availability zones resulted in 
>> the same wrong behavior.
>> 
>> I've gist'd my cassandra.yaml, it's pretty standard and hasn't caused an 
>> issue in the past for me.  
>> https://gist.github.com/skyebook/ec9364cdcec02e803ffc
>> 
>> Skye Book
>> http://skyebook.net -- @sbook
>> 
>> On Sep 26, 2013, at 12:34 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>> 
>>>>  I am curious, though, how any of this worked in the first place spread 
>>>> across three AZ's without that being set?
>>> broadcast_address is only needed when you are going cross region (IIRC it's 
>>> the EC2MultiRegionSnitch that sets it). 
>>> 
>>> As Rob said, make sure the seed list includes one of the other nodes and 
>>> that the cluster_name is set. 
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> New Zealand
>>> @aaronmorton
>>> 
>>> Co-Founder & Principal Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>> 
>>> On 26/09/2013, at 8:12 AM, Skye Book <skye.b...@gmail.com> wrote:
>>> 
>>>> Thank you, both Michael and Robert for your suggestions.  I actually saw 
>>>> 5760, but we were running on 2.0.0, which it seems like this was fixed in.
>>>> 
>>>> That said, I noticed that my Chef scripts were failing to set the 
>>>> broadcast_address correctly, which I'm guessing is the cause of the 
>>>> problem, fixing that and trying a redeploy.  I am curious, though, how any 
>>>> of this worked in the first place spread across three AZ's without that 
>>>> being set?
>>>> 
>>>> -Skye
>>>> 
>>>> On Sep 25, 2013, at 3:56 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>>> 
>>>>> On Wed, Sep 25, 2013 at 12:41 PM, Skye Book <skye.b...@gmail.com> wrote:
>>>>> I have a three node cluster using the EC2 Multi-Region Snitch currently 
>>>>> operating only in US-EAST.  On having a node go down this morning, I 
>>>>> started a new node with an identical configuration, except for the seed 
>>>>> list, the listen address and the rpc address.  The new node comes up and 
>>>>> creates its own cluster rather than joining the pre-existing ring.  I've 
>>>>> tried creating a node both before and after using `nodetool remove` for 
>>>>> the bad node, each time with the same result.
>>>>> 
>>>>> What version of Cassandra?
>>>>> 
>>>>> This particular confusing behavior is fixed upstream, in a version you 
>>>>> should not deploy to production yet. Take some solace, however, that you 
>>>>> may be the last Cassandra administrator to die for a broken code path!
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-5768
>>>>> 
>>>>> Does anyone have any suggestions for where to look that might put me on 
>>>>> the right track?
>>>>> 
>>>>> It must be that your seed list is wrong in some way, or your node state 
>>>>> is wrong. If you're trying to bootstrap a node, note that you can't 
>>>>> bootstrap a node when it is in its own seed list.
>>>>> 
>>>>> If you have installed Cassandra via debian package, there is a 
>>>>> possibility that your node has started before you explicitly started it. 
>>>>> If so, it might have invalid node state.
>>>>> 
>>>>> Have you tried wiping the data directory and trying again?
>>>>> 
>>>>> What is your seed list? Are you sure the new node can reach the seeds on 
>>>>> the network layer?
>>>>> 
>>>>> =Rob
>>>> 
>>> 
>> 
>
