Re: cassandra-shuffle time to completion and required disk space

Sam Overton Mon, 29 Apr 2013 04:09:49 -0700

An alternative to running shuffle is to do a rolling
bootstrap/decommission. You would set num_tokens on the existing hosts (and
restart them) so that they split their ranges, then bootstrap in N new
hosts, then decommission the old ones.
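As a rough illustration, the order of operations described above can be written out as a step list. This is a minimal sketch, not a tool from the Cassandra project: the function name `rolling_replacement_plan`, the host names, and the default of 256 tokens are hypothetical. The sketch assumes each step is applied one host at a time and that the new hosts have no `initial_token` set, so they choose random tokens when they bootstrap.

```python
from typing import List


def rolling_replacement_plan(old_hosts: List[str], new_hosts: List[str],
                             num_tokens: int = 256) -> List[str]:
    """Return the ordered steps of a rolling bootstrap/decommission migration.

    Hypothetical helper: host names and the num_tokens default are assumptions
    for illustration only.
    """
    steps: List[str] = []
    # Phase 1: enable vnodes on each existing host, one at a time; on restart
    # the host splits its existing range into num_tokens contiguous ranges.
    for host in old_hosts:
        steps.append(f"{host}: set num_tokens: {num_tokens} in cassandra.yaml and restart")
    # Phase 2: bootstrap the new hosts one at a time (no initial_token set,
    # so each picks random tokens).
    for host in new_hosts:
        steps.append(f"{host}: start with num_tokens: {num_tokens} and let it bootstrap")
    # Phase 3: decommission the old hosts one at a time; each streams its
    # data to the remaining nodes before leaving the ring.
    for host in old_hosts:
        steps.append(f"nodetool -h {host} decommission")
    return steps


if __name__ == "__main__":
    for step in rolling_replacement_plan(["old1", "old2"], ["new1", "new2"]):
        print(step)
```

Unlike shuffle, each decommission only streams the leaving node's own data to the rest of the ring, so the disk and network cost is bounded per step rather than spread across every node at once.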

On 28 April 2013 22:21, John Watson <j...@disqus.com> wrote:

> The amount of time and disk space cassandra-shuffle requires when upgrading
> to vnodes should really be made clear in the documentation (once some is
> written).
>
> The only semi-noticeable remark about the exorbitant amount of time is a
> bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
>
> "Shuffling will entail moving a lot of data around the cluster and so has
> the potential to consume a lot of disk and network I/O, and to take a
> considerable amount of time. For this to be an online operation, the
> shuffle will need to operate on a lower priority basis to other streaming
> operations, and should be expected to take days or weeks to complete."
>
> We tried running shuffle on a QA version of our cluster and two things
> came to light:
>  - Even with no reads or writes, it was going to take 20 days.
>  - Each machine needed enough free disk space to potentially hold the
> entire cluster's sstables.
>
> Regards,
>
> John
>

-- 
Sam Overton
Acunu | http://www.acunu.com | @acunu
