Just FYI: if you want to be able to operationally do things to many nodes at a time, you should look at setting up racks. With the number of racks equal to RF, you can take down all nodes in a given rack at once without affecting LOCAL_QUORUM. Your single-token example has the same property in this respect as a vnodes cluster using racks (and in fact, if you had set up a single-token cluster using racks, you would have put nodes N1 and N4 in the same rack).
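To make the rack point concrete, here is a minimal sketch (plain Python, not Cassandra code; the 3-rack layout is an assumption for illustration) of NetworkTopologyStrategy-style placement for the quoted 6-node, RF=3, single-token cluster. With racks == RF, every range keeps one replica per rack, so taking an entire rack down still leaves 2 of 3 replicas and LOCAL_QUORUM holds:

# Illustrative sketch only, assuming a hypothetical 6-node, RF=3,
# single-token ring with 3 racks (racks == RF). Placement walks the ring
# and picks the next node from each rack not yet used, the way
# NetworkTopologyStrategy does, so each range gets one replica per rack.

RF = 3
# ring order N1..N6, racks chosen so N1 and N4 share rack "A"
ring = [("N1", "A"), ("N2", "B"), ("N3", "C"),
        ("N4", "A"), ("N5", "B"), ("N6", "C")]

def replicas(start):
    chosen, seen_racks = [], set()
    for k in range(len(ring)):
        node, rack = ring[(start + k) % len(ring)]
        if rack not in seen_racks:        # one replica per rack
            chosen.append(node)
            seen_racks.add(rack)
        if len(chosen) == RF:
            return chosen

for down_rack in ("A", "B", "C"):
    down = {n for n, r in ring if r == down_rack}
    # LOCAL_QUORUM at RF=3 needs 2 live replicas for every range
    ok = all(len(set(replicas(i)) - down) >= 2 for i in range(len(ring)))
    print(f"rack {down_rack} down -> quorum still available: {ok}")

Note that in this layout N1 and N4 do land in the same rack, which is why losing both of them together is survivable.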
> On Feb 3, 2020, at 11:07 PM, Max C. <mc_cassan...@core43.com> wrote:
>
> Let’s say you have a 6 node cluster, with RF=3, and no vnodes. In that case
> each piece of data is stored as follows:
>
> <primary>: <replicas>
> N1: N2 N3
> N2: N3 N4
> N3: N4 N5
> N4: N5 N6
> N5: N6 N1
> N6: N1 N2
>
> With this setup, there are some circumstances where you could lose 2 nodes
> (ex: N1 & N4) and still be able to maintain CL=quorum. If your cluster is
> very large, then you could lose even more — and that’s a good thing, because
> if you have hundreds/thousands of nodes then you don’t want the world to come
> tumbling down if > 1 node is down. Or maybe you want to upgrade the OS on
> your nodes, and want to (with very careful planning!) do it by taking down
> more than 1 node at a time.
>
> … but if you have a large number of vnodes, then a given node will share a
> small segment of data with LOTS of other nodes, which destroys this property.
> The more vnodes, the less likely you’re able to handle > 1 node down.
>
> For example, see this diagram in the Datastax docs —
>
> https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes
>
> In that bottom picture, you can’t knock out 2 nodes and still maintain
> CL=quorum. Ex: If you knock out nodes 1 & 4, then ranges B & L would no
> longer meet CL=quorum; but you can do that in the top diagram, since there
> are no ranges shared between nodes 1 & 4.
>
> Hope that helps.
>
> - Max
>
>
>> On Feb 3, 2020, at 8:39 pm, onmstester onmstester
>> <onmstes...@zoho.com.INVALID> wrote:
>>
>> Sorry if it's trivial, but I do not understand how num_tokens affects
>> availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate
>> losing at most one node, and all of the tokens assigned to that node would
>> also be assigned to two other nodes no matter what num_tokens is, right?
>>
>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>
>>
>> ============ Forwarded message ============
>> From: Jon Haddad <j...@jonhaddad.com>
>> To: <d...@cassandra.apache.org>
>> Date: Tue, 04 Feb 2020 01:15:21 +0330
>> Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
>> ============ Forwarded message ============
>>
>> I think it's a good idea to take a step back and get a high level view of
>> the problem we're trying to solve.
>>
>> First, high token counts result in decreased availability, as each node has
>> data overlap with more nodes in the cluster. Specifically, a node can
>> share data with up to RF-1 * 2 * num_tokens other nodes. So a 256 token
>> cluster at RF=3 is going to almost always share data with every other node
>> in the cluster that isn't in the same rack, unless you're doing something
>> wild like using more than a thousand nodes in a cluster. We advertise
>>
>> With 16 tokens, that is vastly improved, but you still have up to 64 nodes
>> each node needs to query against, so you're again hitting every node
>> unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I
>> wouldn't use 16 here, and I doubt any of you would either. I've advocated
>> for 4 tokens because you'd have overlap with only 16 nodes, which works
>> well for small clusters as well as large.
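Putting rough numbers on the overlap Jon describes: here is a back-of-the-envelope sketch (an approximation only, assuming 3 racks and ignoring how tokens are actually allocated) of the up-to 2 * num_tokens * (RF - 1) bound, capped by the number of nodes outside a node's own rack:

# Rough upper bound on how many other nodes a node shares data with,
# per the 2 * num_tokens * (RF - 1) rule of thumb quoted above, capped by
# the number of nodes outside the node's own rack (3 racks assumed).
def max_overlap(num_tokens, rf, cluster_size, racks=3):
    outside_own_rack = cluster_size - cluster_size // racks
    return min(2 * num_tokens * (rf - 1), outside_own_rack)

for tokens in (4, 16, 256):
    print(tokens, max_overlap(tokens, rf=3, cluster_size=96))
# -> 4 tokens share with up to 16 nodes; 16 tokens with up to 64, i.e. every
#    node outside the rack in a 96-node cluster; 256 tokens with everything.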
>> Assuming I was creating a new
>> cluster for myself (in a hypothetical brand new application I'm building) I
>> would put this in production. I have worked with several teams where I
>> helped them put 4 token clusters in prod and it has worked very well. We
>> didn't see any wild imbalance issues.
>>
>> As Mick's pointed out, our current method of using random token assignment
>> for the default number of tokens is problematic for 4 tokens. I fully agree
>> with this, and I think if we were to try to use 4 tokens, we'd want to
>> address this in tandem. We can discuss how to better allocate tokens by
>> default (something more predictable than random), but I'd like to avoid the
>> specifics of that for the sake of this email.
>>
>> To Alex's point, repairs are problematic with lower token counts due to
>> over streaming. I think this is a pretty serious issue and we'd have to
>> address it before going all the way down to 4. This, in my opinion, is a
>> more complex problem to solve and I think trying to fix it here could make
>> shipping 4.0 take even longer, something none of us want.
>>
>> For the sake of shipping 4.0 without adding extra overhead and time, I'm ok
>> with moving to 16 tokens, and in the process adding extensive documentation
>> outlining what we recommend for production use. I think we should also try
>> to figure out something better than random as the default to fix the data
>> imbalance issues. I've got a few ideas here I've been noodling on.
>>
>> As long as folks are fine with potentially changing the default again in C*
>> 5.0 (after another discussion / debate), 16 is enough of an improvement
>> that I'm OK with the change, and willing to author the docs to help people
>> set up their first cluster. For folks that go into production with the
>> defaults, we're at least not setting them up for total failure once their
>> clusters get large like we are now.
>>
>> In future versions, we'll probably want to address the issue of data
>> imbalance by building something in that shifts individual tokens around. I
>> don't think we should try to do this in 4.0 either.
>>
>> Jon
>>
>>
>> On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
>> wrote:
>>
>> > I think Mick and Anthony make some valid operational and skew points for
>> > smaller/starting clusters with 4 num_tokens. There’s an arbitrary line
>> > between small and large clusters, but I think most would agree that most
>> > clusters are on the small to medium side. (A small nuance is, afaict, the
>> > probabilities have to do with quorum on a full token range, i.e. it has to
>> > do with the size of a datacenter, not the full cluster.)
>> >
>> > As I read this discussion I’m personally more inclined to go with 16 for
>> > now. It’s true that if we could fix the skew and topology gotchas for
>> > those starting things up, 4 would be ideal from an availability
>> > perspective. However we’re still in the brainstorming stage for how to
>> > address those challenges. I think we should create tickets for those
>> > issues and go with 16 for 4.0.
>> >
>> > This is about an out of the box experience. It balances availability,
>> > operations (such as skew and general bootstrap friendliness and
>> > streaming/repair), and cluster sizing. Balancing all of those, I think for
>> > now I’m more comfortable with 16 as the default, with docs on
>> > considerations and tickets to unblock 4 as the default for all users.
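For anyone who wants to see the availability effect directly, here is a rough simulation (illustrative Python only; it assumes random token assignment and SimpleStrategy-style placement, so it ignores racks and the smarter allocators discussed above). It counts how many pairs of nodes in a small cluster could be down at the same time without any range dropping below QUORUM:

# Illustrative simulation, not Cassandra code: random vnode assignment,
# SimpleStrategy-style placement (next RF distinct nodes clockwise), RF=3.
# A pair of nodes is "safe" to lose together if no range has both of them
# among its 3 replicas (losing 2 of 3 breaks QUORUM for that range).
import random
from itertools import combinations

def replica_sets(num_nodes, num_tokens, rf=3, seed=42):
    rng = random.Random(seed)
    ring = sorted((rng.random(), n) for n in range(num_nodes)
                  for _ in range(num_tokens))
    sets = []
    for i in range(len(ring)):
        replicas, j = [], i
        while len(replicas) < rf:          # walk clockwise, skip repeat nodes
            node = ring[j % len(ring)][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.append(frozenset(replicas))
    return sets

def safe_pairs(num_nodes, num_tokens):
    sets = replica_sets(num_nodes, num_tokens)
    return sum(1 for a, b in combinations(range(num_nodes), 2)
               if not any({a, b} <= rs for rs in sets))

for t in (1, 4, 16, 256):
    print(f"num_tokens={t:>3}: {safe_pairs(12, t)} of 66 node pairs survivable")

With a single token per node most pairs are survivable; by 256 tokens essentially none are, which is the trade-off being weighed in this thread.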
>> >
>> > >>> On Feb 1, 2020, at 6:30 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>> > >> On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>> > >> I think that we might be bikeshedding this number a bit because it is
>> > >> easy to debate and there is not yet one right answer.
>> > >
>> > > https://www.youtube.com/watch?v=v465T5u9UKo