Hello!

Just wanted to let you know: we finally found a solution!

First of all, we increased `streaming_socket_timeout_in_ms` to `86400000`.
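
For reference, here is what that looks like in cassandra.yaml (the value is
in milliseconds, so 86400000 is 24 hours). As far as I know the node needs a
restart for the change to take effect:

    streaming_socket_timeout_in_ms: 86400000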

We are using cassandra-reaper to manage our repairs; they last about 15
days on this cluster and are re-launched almost immediately once they
finish.
Before bootstrapping the new node we paused the repair operations and then
launched the bootstrap.

With these changes we were able to bootstrap a node without any errors. Not
sure whether it is due to the new `streaming_socket_timeout_in_ms` or to
pausing the repairs, but it now works!

Regards,

Leo

On Thu, Feb 14, 2019 at 7:41 PM Léo FERLIN SUTTON <lfer...@mailjet.com>
wrote:

> On Thu, Feb 14, 2019 at 6:56 PM Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
>
>> Those aren’t the same error messages so I think progress has been made.
>>
>>
>>
>> What version of C* are you running?
>>
> 3.0.17. We will upgrade to 3.0.18 soon.
>
>> How did you clear out the space?
>>
> I had a few topology changes to clean up after. `nodetool cleanup` worked miracles.
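>
> In case it is useful to someone else, we ran it node by node, roughly like
> this (`my_keyspace` is just a placeholder for one of our keyspaces):
>
>     nodetool cleanup                 # all keyspaces on this node
>     nodetool cleanup my_keyspace     # or limit it to a single keyspace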
>
> Regards,
>
> Leo
>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID]
>> *Sent:* Thursday, February 14, 2019 7:54 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Bootstrap keeps failing
>>
>>
>>
>> Hello again!
>>
>>
>>
>> I have managed to free a lot of disk space and most nodes now hover
>> between 50% and 80% disk usage.
>>
>> I am still getting bootstrapping failures :(
>>
>>
>>
>> Here are some logs:
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err
>> cassandra  [org.apache.cassandra.streaming.StreamSession] [onError] -
>> [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Streaming error occurred
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user info
>> cassandra  [org.apache.cassandra.streaming.StreamResultFuture]
>> [handleSessionComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d]
>> Session with /10.10.23.155 is complete
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.streaming.StreamResultFuture]
>> [maybeComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Stream
>> failed
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.service.StorageService] [onFailure] -
>> Error during bootstrap.
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err
>> cassandra  [org.apache.cassandra.service.StorageService] [bootstrap] -
>> Error while waiting on bootstrap to complete. Bootstrap will have to be
>> restarted.
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.service.StorageService] [joinTokenRing] -
>> Some data streaming failed. Use nodetool to check bootstrap state and
>> resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
>>
>>
>>
>>
>>
>> I can see a `Streaming error occurred` for every node it is trying to
>> stream from. Is there a way to get more detailed logs to understand why the
>> failures occurred?
>>
>> I have set `<logger name="org.apache.cassandra.streaming.StreamSession"
>> level="DEBUG" />` but it doesn't seem to give me more details. Is there
>> another class I should set to DEBUG?
>>
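>> For reference, a broader logger covering the whole streaming package (rather
>> than just StreamSession) would look roughly like this in conf/logback.xml; a
>> sketch, I have not confirmed it surfaces the root cause:
>>
>>     <logger name="org.apache.cassandra.streaming" level="DEBUG" />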
>>
>>
>> Finally, I have also noticed a lot of the following in my log files; it
>> might be important:
>>
>> [org.apache.cassandra.db.compaction.LeveledManifest]
>> [getCompactionCandidates] - Bootstrapping - doing STCS in L0
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Fri, Feb 8, 2019 at 3:59 PM Léo FERLIN SUTTON <lfer...@mailjet.com>
>> wrote:
>>
>> On Fri, Feb 8, 2019 at 3:37 PM Kenneth Brotman
>> <kenbrot...@yahoo.com.invalid> wrote:
>>
>> Thanks for the details; that helps us understand the situation. I’m
>> pretty sure you’ve exceeded the working capacity of some of those nodes.
>> Going over 50%-75% disk usage, depending on compaction strategy, is ill-advised.
>>
>> Keeping 50% of the disk free is a steep price to pay for space we never
>> use. We have about 90 terabytes of data on SSD and we pay about $100 per
>> terabyte of SSD storage (on Google Cloud).
>>
>> Maybe we can get closer to 75%.
>>
>>
>>
>> Our compaction strategy is `LeveledCompactionStrategy` on our two biggest
>> tables (90% of the data).
>>
>>
>>
>> You need to clear out as much room as possible to add more nodes.
>>
>> Are the tombstones clearing out?
>>
>> I don't think we have a lot of tombstones:
>>
>> We have 0 deletes on our two biggest tables.
>>
>> One of them gets updated with new data (messages.messages), but the
>> updates fill columns that were previously empty; I am not certain, but I
>> don't think this creates any tombstones.
>>
>> I have attached the output of `nodetool tablestats` for our two largest
>> tables.
>>
>>
>>
>> We are using cassandra-reaper to manage our repairs. A full repair takes
>> about 13 days, so if we have tombstones they should not be older than 13
>> days.
>>
>>
>>
>> Are there old snapshots that you can delete? And so on.
>>
>> Unfortunately no. We take a daily snapshot that we back up, then drop.
>>
>>
>>
>> You have to make more room on the existing nodes.
>>
>>
>>
>> I am trying to run `nodetool cleanup` on our most "critical" nodes to see
>> if it helps. If that doesn't do the trick we will only have two options:
>>
>>    - Add more disk space on each node
>>    - Adding new nodes
>>
>> We have looked at some other companies' case studies and it looks like we
>> have a few very big nodes instead of a lot of smaller ones.
>>
>> We are currently trying to add nodes, and are hoping to eventually
>> transition to a "lot of small nodes" model and be able to add nodes a lot
>> faster.
>>
>>
>>
>> Thank you again for your interest,
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID]
>> *Sent:* Friday, February 08, 2019 6:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Bootstrap keeps failing
>>
>>
>>
>> On Thu, Feb 7, 2019 at 10:11 PM Kenneth Brotman
>> <kenbrot...@yahoo.com.invalid> wrote:
>>
>> Lots of things come to mind. We need more information from you to help us
>> understand:
>>
>> How long have you had your cluster running?
>>
>> The cluster is a bit more than a year old, but it has been growing
>> constantly (3 nodes to 6 nodes to 12 nodes, etc.).
>>
>> We have a replication_factor of 3 on all keyspaces and 3 racks with an
>> equal number of nodes.
>>
>>
>>
>> Is it generally working ok?
>>
>> Works fine. Good performance, repairs managed by cassandra-reaper.
>>
>>
>>
>> Is it just one node that is misbehaving at a time?
>>
>> We only bootstrap nodes one at a time. Sometimes it works flawlessly,
>> sometimes it fails. When it fails, it tends to fail several times in a row
>> before we manage to get the node bootstrapped.
>>
>>
>>
>> How many nodes do you need to replace?
>>
>> I am adding nodes, not replacing any. Our nodes are starting to get very
>> full and we wish to add at least 6 more nodes (short-term).
>>
>> Adding a new node is quite slow (48 to 72 hours), and that's when the
>> bootstrap process works on the first try.
>>
>>
>>
>> Are you doing rolling restarts instead of simultaneously?
>>
>> Yes.
>>
>>
>>
>> Do you have enough capacity on your machines?  Did you say some of the
>> nodes are at 90% capacity?
>>
>> Disk usage fluctuates but is generally between 80% and 90%; this is why we
>> are planning to add a lot more nodes.
>>
>>
>>
>> When did this problem begin?
>>
>> Not sure about this one. Probably since our nodes reached more than 2 TB
>> of data; I don't remember it being an issue when our nodes were smaller.
>>
>>
>>
>> Could something be causing a race condition?
>>
>> We have schema changes every day.
>>
>> We have temporary data stored in Cassandra that is only used for 6 days
>> and then destroyed.
>>
>>
>>
>> In order to avoid tombstones we rotate tables: every day we create a new
>> table to contain the next day's data, and we drop the oldest temporary
>> table.
>>
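>> As a rough CQL sketch of that rotation (the table name and columns are made
>> up for illustration):
>>
>>     -- created ahead of time to hold the next day's data
>>     CREATE TABLE temp_data_20190209 (id uuid PRIMARY KEY, payload text);
>>     -- the oldest temporary table is dropped once its 6 days are up
>>     DROP TABLE temp_data_20190203;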
>>
>>
>> This means that when the node starts to bootstrap, it will ask other nodes
>> for data that will almost certainly be dropped before the bootstrap process
>> is finished.
>>
>>
>>
>> Did you recheck the commands you used to make sure they are correct?
>>
>> What procedure do you use?
>>
>>
>>
>> Our procedure is:
>>
>>    1. We install Cassandra on a brand new instance (Debian).
>>    2. We stop the default Cassandra service (launched by the Debian package).
>>    3. We empty these directories:
>>    /var/lib/cassandra/commitlog
>>    /var/lib/cassandra/data
>>    /var/lib/cassandra/saved_caches
>>    4. We put our configuration in place of the default one.
>>    5. We start Cassandra.
>>
>> If after 3 days we see that the node still hasn't joined the cluster, we
>> check `nodetool netstats` to see whether the node is still streaming data.
>> If it is not, we launch `nodetool bootstrap resume` on the instance.
>>
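>> For what it is worth, a rough shell sketch of that procedure (paths and
>> service names assume the Debian package defaults, adapt as needed):
>>
>>     # stop the instance started automatically by the Debian package
>>     sudo service cassandra stop
>>     # wipe the default data so the node bootstraps from scratch
>>     sudo rm -rf /var/lib/cassandra/commitlog/* \
>>                 /var/lib/cassandra/data/* \
>>                 /var/lib/cassandra/saved_caches/*
>>     # put our own configuration in place of the default one, then:
>>     sudo service cassandra start
>>     # days later, if the node has not joined and nothing is streaming:
>>     nodetool netstats -H
>>     nodetool bootstrap resume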
>>
>>
>> Thank you for your interest in our issue!
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID]
>> *Sent:* Thursday, February 07, 2019 9:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [EXTERNAL] Re: Bootstrap keeps failing
>>
>>
>>
>> Thank you for the recommendation.
>>
>>
>>
>> We are already using DataStax's recommended settings for tcp_keepalive.
>>
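>> For anyone finding this thread later: the kernel settings in question are
>> the net.ipv4.tcp_keepalive_* sysctls. If I remember the DataStax page
>> linked below correctly, the recommended values are along these lines, but
>> please check the page itself before copying them:
>>
>>     # e.g. in /etc/sysctl.conf, loaded with `sudo sysctl -p`
>>     net.ipv4.tcp_keepalive_time = 60
>>     net.ipv4.tcp_keepalive_probes = 3
>>     net.ipv4.tcp_keepalive_intvl = 10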
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R <
>> sean_r_dur...@homedepot.com> wrote:
>>
>> I have seen unreliable streaming (streaming that doesn’t finish) because
>> of TCP timeouts from firewalls or switches. The default tcp_keepalive
>> kernel parameters are usually not tuned for that. See
>> https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
>> for more details. These “remote” timeouts are difficult to detect or prove
>> if you don’t have access to the intermediate network equipment.
>>
>>
>>
>> Sean Durity
>>
>> *From:* Léo FERLIN SUTTON <lferlin@mailjet.com.INVALID>
>> *Sent:* Thursday, February 07, 2019 10:26 AM
>> *To:* user@cassandra.apache.org; dinesh.jo...@yahoo.com
>> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>>
>>
>>
>> Hello!
>>
>> Thank you for your answers.
>>
>>
>>
>> So I have tried, multiple times, to start bootstrapping from scratch. I
>> often have the same problem (on other nodes as well) but sometimes it works
>> and I can move on to another node.
>>
>>
>>
>> I have attached a jstack dump and some logs.
>>
>>
>>
>> Our node was shut down at around 97% disk space used.
>>
>> I turned it back on and it started the bootstrap process again.
>>
>>
>>
>> The log file is the log from this attempt, same for the thread dump.
>>
>>
>>
>> Small warning: I have somewhat anonymised the log files, so there may be
>> some inconsistencies.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID <
>> dinesh.jo...@yahoo.com.invalid <dinesh.joshi@yahoo.com.invalid>> wrote:
>>
>> Would it be possible for you to take a thread dump & logs and share them?
>>
>>
>>
>> Dinesh
>>
>>
>>
>>
>>
>> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <
>> lfer...@mailjet.com.INVALID> wrote:
>>
>>
>>
>>
>>
>> Hello!
>>
>>
>>
>> I am having a recurrent problem when trying to bootstrap a few new nodes.
>>
>>
>>
>> Some general info:
>>
>>    - I am running cassandra 3.0.17
>>    - We have about 30 nodes in our cluster
>>    - All healthy nodes have between 60% and 90% used disk space on
>>    /var/lib/cassandra
>>
>> So I create a new node and let auto_bootstrap do its job. After a few days
>> the bootstrapping node stops streaming new data but is still not a member
>> of the cluster.
>>
>>
>>
>> `nodetool status` says the node is still joining.
>>
>>
>>
>> When this happens I run `nodetool bootstrap resume`. This usually ends in
>> one of two ways:
>>
>>    1. The node fills up to 100% disk space and crashes.
>>    2. The bootstrap resume finishes with errors
>>
>> When I look at `nodetool netstats -H` it looks like `bootstrap resume` does
>> not resume but restarts a full transfer of all the data from every node.
>>
>>
>>
>> This is the output I get from `nodetool bootstrap resume`:
>>
>> [2019-02-06 01:39:14,369] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:16,821] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:17,003] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress:
>> 2113%)
>>
>> [2019-02-06 01:41:15,160] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:02,864] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:09,284] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:10,522] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:10,622] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:11,925] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
>> (progress: 2114%)
>>
>> [2019-02-06 01:42:14,887] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
>> (progress: 2114%)
>>
>> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress:
>> 2114%)
>>
>> [2019-02-06 01:42:14,980] Stream failed
>>
>> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>>
>> [2019-02-06 01:42:14,982] Resume bootstrap complete
>>
>>
>>
>> The bootstrap `progress` goes way over 100% and eventually fails.
>>
>>
>>
>>
>>
>> Right now I have a node with this output from `nodetool status`:
>>
>> `UJ  10.16.XX.YYY  2.93 TB    256          ?
>>  5788f061-a3c0-46af-b712-ebeecd397bf7  c`
>>
>>
>>
>> It is almost filled with data, yet if I look at `nodetool netstats`:
>>
>>         Receiving 480 files, 325.39 GB total. Already received 5 files,
>> 68.32 MB total
>>         Receiving 499 files, 328.96 GB total. Already received 1 files,
>> 1.32 GB total
>>         Receiving 506 files, 345.33 GB total. Already received 6 files,
>> 24.19 MB total
>>         Receiving 362 files, 206.73 GB total. Already received 7 files,
>> 34 MB total
>>         Receiving 424 files, 281.25 GB total. Already received 1 files,
>> 1.3 GB total
>>         Receiving 581 files, 349.26 GB total. Already received 8 files,
>> 45.96 MB total
>>         Receiving 443 files, 337.26 GB total. Already received 6 files,
>> 96.15 MB total
>>         Receiving 424 files, 275.23 GB total. Already received 5 files,
>> 42.67 MB total
>>
>>
>>
>> It is trying to pull all the data again.
>>
>>
>>
>> Am I missing something about the way `nodetool bootstrap resume` is
>> supposed to be used?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo