I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.
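
As a rough way to check whether keepalive settings could be a factor, here is a minimal Python sketch that reads the three Linux keepalive parameters and compares them with a firewall idle timeout. The fallback values are the stock Linux defaults (7200 s, 75 s, 9 probes). The function names and the 3600-second firewall timeout in the usage note are illustrative assumptions, not values taken from this thread. Keepalive probes are only sent on sockets that enable SO_KEEPALIVE.

```python
from pathlib import Path

# Stock Linux defaults: seconds, seconds, probe count.
LINUX_DEFAULTS = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Read the current settings from procfs, keeping the defaults if unreadable."""
    settings = dict(LINUX_DEFAULTS)
    for name in settings:
        try:
            settings[name] = int((Path(proc_dir) / name).read_text().strip())
        except (OSError, ValueError):
            pass  # not Linux, or file missing: keep the default value
    return settings


def survives_idle_timeout(settings, firewall_idle_timeout_s):
    """True if the first probe is sent before the firewall would drop an idle flow."""
    return settings["tcp_keepalive_time"] < firewall_idle_timeout_s


def dead_peer_detection_s(settings):
    """Worst-case seconds of idleness before the kernel declares the peer dead."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


if __name__ == "__main__":
    current = read_keepalive_settings()
    # 3600 is a hypothetical firewall idle timeout; substitute the real one if known.
    print(current, survives_idle_timeout(current, 3600), dead_peer_detection_s(current))
```

With the stock defaults, the first probe goes out after two hours of idleness, so any intermediate device with a shorter idle timeout could silently drop the connection first.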

Sean Durity
From: Léo FERLIN SUTTON <lfer...@mailjet.com.INVALID>
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

Hello !

Thank you for your answers.

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

I have attached a jstack dump and some logs.

Our node was shut down at around 97% disk space used.
I turned it back on and it started the bootstrap process again.

The log file is the log from this attempt, same for the thread dump.

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

Regards,

Leo

On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID wrote:
Would it be possible for you to take a thread dump & logs and share them?

Dinesh


On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
<lfer...@mailjet.com.INVALID> wrote:


Hello !

I am having a recurrent problem when trying to bootstrap a few new nodes.

Some general info :

  *   I am running cassandra 3.0.17
  *   We have about 30 nodes in our cluster
  *   All healthy nodes have between 60% and 90% used disk space on 
/var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

`nodetool status` says the node is still joining.

When this happens I run `nodetool bootstrap resume`. This usually ends in one 
of two ways:

  1.  The node fills up to 100% disk space and crashes.
  2.  The bootstrap resume finishes with errors.

When I look at `nodetool netstats -H`, it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.

This is the output I get from `nodetool bootstrap resume`:
[2019-02-06 01:39:14,369] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db (progress: 2113%)
[2019-02-06 01:39:16,821] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db (progress: 2113%)
[2019-02-06 01:39:17,003] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db (progress: 2113%)
[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
[2019-02-06 01:41:15,160] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db (progress: 2113%)
[2019-02-06 01:42:02,864] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db (progress: 2113%)
[2019-02-06 01:42:09,284] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db (progress: 2113%)
[2019-02-06 01:42:10,522] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db (progress: 2113%)
[2019-02-06 01:42:10,622] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db (progress: 2113%)
[2019-02-06 01:42:11,925] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db (progress: 2114%)
[2019-02-06 01:42:14,887] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db (progress: 2114%)
[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
[2019-02-06 01:42:14,980] Stream failed
[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
[2019-02-06 01:42:14,982] Resume bootstrap complete

The bootstrap `progress` goes way over 100% and eventually fails.
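
To spot this condition without reading the output by eye, here is a small Python sketch that scans captured `nodetool bootstrap resume` output for the `(progress: N%)` markers and the `Stream failed` message shown above. The function names are hypothetical, and the patterns are based only on the output in this message, so other Cassandra versions may print something different.

```python
import re

# Matches the "(progress: 2113%)" suffix seen in the output above.
PROGRESS_RE = re.compile(r"\(progress:\s*(\d+)%\)")


def progress_values(output):
    """Return every progress percentage found in the captured output, in order."""
    return [int(value) for value in PROGRESS_RE.findall(output)]


def summarize(output):
    """Summarize the run: highest progress seen, overshoot flag, failure flag."""
    values = progress_values(output)
    return {
        "max_progress": max(values, default=None),
        "over_100": any(value > 100 for value in values),
        "stream_failed": "Stream failed" in output,
    }


if __name__ == "__main__":
    import sys
    # Usage: nodetool bootstrap resume | python this_script.py
    print(summarize(sys.stdin.read()))
```

Because the pattern is applied to the whole text, it also works when long lines have been wrapped by a mail client.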


Right now I have a node with this output from `nodetool status`:
`UJ  10.16.XX.YYY  2.93 TB    256          ?                 5788f061-a3c0-46af-b712-ebeecd397bf7  c`

It is almost filled with data, yet if I look at `nodetool netstats` :
        Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total
        Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total
        Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 MB total
        Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB total
        Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB total
        Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 MB total
        Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 MB total
        Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 MB total

It is trying to pull all the data again.
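
To compare what the node still expects to receive with the free disk space, the `Receiving ...` lines can be totalled. The Python sketch below assumes the line format shown above and assumes the units are binary multiples (1 GB = 1024^3 bytes). Both are assumptions about `nodetool netstats -H`, and the function names are hypothetical.

```python
import re

# Assumed binary units for the human-readable (-H) output.
UNIT_BYTES = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# \s+ between tokens lets the pattern match even if a line was wrapped.
RECEIVING_RE = re.compile(
    r"Receiving\s+(\d+)\s+files,\s+([\d.]+)\s+(\w+)\s+total\.\s+"
    r"Already\s+received\s+(\d+)\s+files,\s+([\d.]+)\s+(\w+)\s+total"
)


def to_bytes(amount, unit):
    """Convert a size such as ('325.39', 'GB') to a number of bytes."""
    return float(amount) * UNIT_BYTES[unit]


def streaming_totals(netstats_output):
    """Return (expected_bytes, received_bytes) summed over all Receiving lines."""
    expected = received = 0.0
    for match in RECEIVING_RE.finditer(netstats_output):
        _, total_amount, total_unit, _, got_amount, got_unit = match.groups()
        expected += to_bytes(total_amount, total_unit)
        received += to_bytes(got_amount, got_unit)
    return expected, received


if __name__ == "__main__":
    import sys
    # Usage: nodetool netstats -H | python this_script.py
    expected, received = streaming_totals(sys.stdin.read())
    print(f"still to receive: {(expected - received) / 1024**3:.2f} GB")
```

If the amount still to receive is larger than the free space on the data volume, the resumed bootstrap can be expected to fill the disk before it completes.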

Am I missing something about the way `nodetool bootstrap resume` is supposed to 
be used ?

Regards,

Leo

