In the normal case, absolutely. That should happen quickly. It's likely that
you've hit another race condition where this phantom node is not correctly
marked as dead. In that case, even if the node is removed from gossip on some
nodes in the cluster, it will be re-added by any node that doesn't have the
phantom node marked down. I can't say definitively without more information,
though.
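
One way to narrow that down is to compare gossip state across the cluster. A
rough sketch, assuming SSH access and a placeholder host list (adjust both
for your environment); any node that still reports the endpoint as alive is
a likely culprit:

    # hypothetical host list; check which nodes still see the phantom endpoint
    for host in node1 node2 node3; do
      echo "== $host =="
      ssh "$host" "nodetool gossipinfo | grep -A 3 172.31.137.65"
    done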

On Thu, Oct 13, 2016 at 7:11 AM, Kasper Petersen <kas...@sybogames.com>
wrote:

> Thanks for the details.
>
> I don't know what happened on that node. It was a long time ago, I think. I
> wasn't aware of it until now.
>
> Will a node in the hibernating state that failed to join and was subsequently
> discarded get removed from gossip at some point?
>
> On Wed, Oct 12, 2016 at 5:23 PM, Joel Knighton <joel.knigh...@datastax.com
> > wrote:
>
>> 1. A hibernating node is participating in gossip but intentionally hasn't
>> yet joined the ring. The two cases where a node sets a hibernating status
>> are when the node was started with "-Dcassandra.join_ring=false" and has
>> tokens, or when the node was started to replace another node (using
>> "-Dcassandra.replace_address" or "-Dcassandra.replace_address_first_boot").
>>
>> 2. A rolling restart is probably your best bet. You may have more luck
>> with assassinate if you connect to a node that is not continuously
>> removing and re-adding the state; I suspect that node will show an alive
>> status for this endpoint. As usual, you should wield assassinate with
>> lots of caution.
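>>
>> If you do try assassinate, a minimal sketch using the address from your
>> gossipinfo output (run against a node that shows the alive status):
>>
>>   nodetool -h <node_with_alive_status> assassinate 172.31.137.65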
>>
>> This issue sounds most similar to CASSANDRA-10371. If you file a JIRA
>> ticket with debugging information similar to that requested on the above
>> ticket, as well as what operation you were performing on the node (was it
>> a failed replace attempt? etc.), someone might have a chance to look into
>> this further.
>>
>> On Wed, Oct 12, 2016 at 9:48 AM, Kasper Petersen <kas...@sybogames.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've recently upgraded our Cassandra cluster from 2.1 to 3.9. By
>>> default(?), 3.9 creates a debug.log file containing a ton of lines (a new
>>> one every second) like:
>>>
>>> DEBUG [GossipTasks:1] 2016-10-12 14:43:38,761 Gossiper.java:337 -
>>>> Convicting /172.31.137.65 with status hibernate - alive false
>>>
>>>
>>> That node has been gone for a very long time now.
>>>
>>> It does not show up in nodetool status, and nodetool gossipinfo returns
>>> the following output for that node:
>>>
>>> /172.31.137.65
>>>>   generation:1433571405
>>>>   heartbeat:232
>>>>   STATUS:3:hibernate,true
>>>>   LOAD:225:96445.0
>>>>   SCHEMA:53:e2d1a288-581c-3f35-b492-1b9d5a803631
>>>>   DC:9:us-east
>>>>   RACK:11:1b
>>>>   RELEASE_VERSION:7:2.1.5
>>>>   RPC_ADDRESS:6:172.31.137.65
>>>>   SEVERITY:231:0.2512562870979309
>>>>   NET_VERSION:4:8
>>>>   HOST_ID:5:7988d3c9-dec8-4b71-b5a9-0b962aad0680
>>>>   TOKENS:2:<hidden>
>>>
>>>
>>> nodetool removenode 7988d3c9-dec8-4b71-b5a9-0b962aad0680 resulted in:
>>>
>>> error: Host ID not found.
>>>>
>>>
>>> Now my questions are:
>>>
>>>    1. What does it mean for a node to be "hibernating"? How does it end
>>>    up in that state?
>>>    2. How do I get rid of it? It's not coming back.
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Kasper Middelboe Petersen
>>>
>>> *Lead Backend Developer*
>>>
>>> *SYBO Games ApS*
>>> Jorcks Passage 1A, 4th.
>>> 1162 Copenhagen K
>>>
>>
>>
>
>
> --
> Best regards,
> Kasper Middelboe Petersen
>
> *Lead Backend Developer*
>
> *SYBO Games ApS*
> Jorcks Passage 1A, 4th.
> 1162 Copenhagen K
>
