Re: cassandra-shuffle time to completion and required disk space

John Watson Sun, 28 Apr 2013 15:23:47 -0700

11 nodes
1 keyspace
256 vnodes per node
upgraded 1.1.9 to 1.2.3 a week ago

These were taken just before starting the shuffle (I ran repair/cleanup the
day before). During the shuffle, all reads/writes to the cluster were disabled.


nodetool status keyspace:

Load       Tokens  Owns (effective)  Host ID
80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
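As a sanity check on the table above: `Owns (effective)` takes the keyspace's replication into account, so summed over all nodes it should come to roughly the replication factor times 100%. The thread does not state the replication factor; the ~300% total computed below merely suggests 3. A minimal Python sketch over the copied figures:

```python
# Effective ownership (%) per node, copied from the `nodetool status <keyspace>`
# output above, in row order.
effective_owns = [16.7, 16.7, 16.7, 100.0, 16.7, 25.0, 33.3, 16.7, 16.7, 25.0, 16.7]

total = sum(effective_owns)
# Each token range is stored on RF replicas, so the per-node shares add up to
# about RF * 100%. Rounding in nodetool's output adds a little noise.
implied_rf = round(total / 100)
print(f"total effective ownership: {total:.1f}% -> implied RF ~ {implied_rf}")
```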

nodetool status:

Load       Tokens  Owns   Host ID
142.6 GB   253     97.5%  339c474f-cf19-4ada-9a47-8b10912d5eb3
172.44 GB  256     0.1%   78192d73-be0b-4d49-a129-9bec0770efed
221.94 GB  256     0.4%   83ca527c-60c5-4ea0-89a8-de53b92b99c8
194.31 GB  256     0.1%   6d726cbf-147d-426e-a735-e14928c95e45
77.64 GB   256     0.3%   e59a02b3-8b91-4abd-990e-b3cb2a494950
87.15 GB   256     0.4%   93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB   256     0.1%   ff821e8e-b2ca-48a9-ac3f-8234b16329ce
87.61 GB   256     0.3%   c3ea4026-551b-4a14-a346-480e8c1fe283
80.95 GB   256     0.4%   754f9f4c-4ba7-4495-97e7-1f5b6755cb27
108.5 GB   256     0.1%   9889280a-1433-439e-bb84-6b7e7f44d761
101.02 GB  256     0.3%   df7ba879-74ad-400b-b371-91b45dcbed37
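Without a keyspace argument, `Owns` is the raw share of the token ring and ignores replication, so it sums to about 100%. Summing the column above makes the imbalance explicit: one host holds nearly the whole ring by token range. A small Python check (host IDs shortened to their first block for readability):

```python
# Raw ring ownership (%) from `nodetool status` (no keyspace), keyed by the
# first block of each Host ID.
raw_owns = {
    "339c474f": 97.5, "78192d73": 0.1, "83ca527c": 0.4, "6d726cbf": 0.1,
    "e59a02b3": 0.3, "93f4400a": 0.4, "ff821e8e": 0.1, "c3ea4026": 0.3,
    "754f9f4c": 0.4, "9889280a": 0.1, "df7ba879": 0.3,
}

total = sum(raw_owns.values())
# Find the host with the largest share of the token range.
top_host, top_share = max(raw_owns.items(), key=lambda kv: kv[1])
print(f"sum of raw ownership: {total:.1f}%")
print(f"largest share: {top_host} with {top_share}% of the token range")
```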

Here's an image of the actual disk usage during the shuffle:

https://dl.dropbox.com/s/bx57j1z5c2spqo0/shuffle%20disk%20space.png

A little after 00:00 I disabled/cleared the xfers and restarted the cluster
(the drops around 00:15 are the restarts) before running cleanup. The disks
are only 540G, and whenever Cassandra runs out of disk space, bad things seem
to happen. I was just barely able to run cleanup without running out of space
after the failed shuffle.
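For scale, here is a quick Python comparison of the 540G disks with the pre-shuffle loads from the first table. Treating each node's `Load` as its on-disk data, the cluster holds about 1.37 TB in total, so the worst case described in the original message quoted below (one node holding the entire cluster's sstables) could not fit on any single disk:

```python
DISK_GB = 540.0  # per-node disk size mentioned above

# Per-node Load (GB) from the pre-shuffle `nodetool status <keyspace>` output.
pre_shuffle_load_gb = [80.95, 87.15, 98.16, 142.6, 77.64, 194.31,
                       221.94, 87.61, 101.02, 172.44, 108.5]

cluster_total = sum(pre_shuffle_load_gb)
# Free space on the fullest node before the shuffle started.
min_headroom = DISK_GB - max(pre_shuffle_load_gb)
print(f"total data on disk across the cluster: {cluster_total:.2f} GB")
print(f"smallest free space before shuffle: {min_headroom:.2f} GB")
print(f"whole cluster fits on one disk: {cluster_total <= DISK_GB}")
```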

After the restart:

Load       Tokens  Owns (effective)  Host ID
131.73 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
418.88 GB  255     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
171.19 GB  255     8.5%              ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.61 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
178.83 GB  257     24.9%             e59a02b3-8b91-4abd-990e-b3cb2a494950
442.32 GB  257     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
185.28 GB  257     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
274.47 GB  255     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
210.73 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
274.49 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
106.47 GB  256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761

Cleanup is still running, so the status output above will be a little
inaccurate.
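Joining the pre-shuffle and post-restart snapshots by Host ID shows where the failed shuffle left extra data: the summed load grows from about 1,372 GB to about 2,537 GB, and the fullest node sits at roughly 82% of its 540G disk. Since the second snapshot was taken mid-cleanup, treat these numbers as approximate. A Python sketch over the copied figures:

```python
DISK_GB = 540.0

# Load (GB) keyed by Host ID prefix: (before shuffle, after the restart).
loads = {
    "754f9f4c": (80.95, 131.73),
    "93f4400a": (87.15, 418.88),
    "ff821e8e": (98.16, 171.19),
    "339c474f": (142.6, 142.61),
    "e59a02b3": (77.64, 178.83),
    "6d726cbf": (194.31, 442.32),
    "c3ea4026": (87.61, 185.28),
    "83ca527c": (221.94, 274.47),
    "df7ba879": (101.02, 210.73),
    "78192d73": (172.44, 274.49),
    "9889280a": (108.5, 106.47),
}

# List hosts fullest-first by post-restart load.
for host, (before, after) in sorted(loads.items(), key=lambda kv: kv[1][1], reverse=True):
    growth = after - before
    pct_full = 100 * after / DISK_GB
    print(f"{host}: {before:7.2f} -> {after:7.2f} GB ({growth:+7.2f} GB), {pct_full:5.1f}% of disk")

total_before = sum(b for b, _ in loads.values())
total_after = sum(a for _, a in loads.values())
print(f"cluster total: {total_before:.2f} -> {total_after:.2f} GB")
```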

I have everything instrumented with Metrics, pushed into Graphite, so if there
are graphs/data there that may help, please let me know.

Thanks,

John


On Sun, Apr 28, 2013 at 2:52 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Can you provide some info on the number of nodes, node load, cluster load
> etc ?
>
> AFAIK shuffle was not an easy thing to test and does not get much real-world
> use, as only some people will run it and they (normally) use it once.
>
> Any info you can provide may help improve the process.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/04/2013, at 9:21 AM, John Watson <j...@disqus.com> wrote:
>
> The amount of time/space cassandra-shuffle requires when upgrading to
> vnodes should really be made clear in the documentation (once some is written).
>
> The only semi-noticeable remark about the exorbitant amount of time is a
> bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
>
> "Shuffling will entail moving a lot of data around the cluster and so has
> the potential to consume a lot of disk and network I/O, and to take a
> considerable amount of time. For this to be an online operation, the
> shuffle will need to operate on a lower priority basis to other streaming
> operations, and should be expected to take days or weeks to complete."
>
> We tried running shuffle on a QA version of our cluster and two things
> came to light:
>  - Even with no reads/writes, it was going to take 20 days
>  - Each machine needed enough free disk space to potentially hold the
> entire cluster's sstables
>
> Regards,
>
> John
>
>
>
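The wiki's warning about "moving a lot of data around the cluster" can be made concrete with a simplified model. Assuming, purely for illustration, that shuffle reassigns every vnode token to a uniformly random node (the thread does not describe the tool's actual algorithm), the share of tokens that change owner is 1 - 1/N, about 91% for 11 nodes. `fraction_moved` below is a hypothetical helper, not part of Cassandra:

```python
import random


def fraction_moved(num_nodes: int, vnodes_per_node: int, seed: int = 0) -> float:
    """Simplified model (an assumption, not cassandra-shuffle's real algorithm):
    every vnode token is reassigned to a node chosen uniformly at random.
    Returns the fraction of tokens that land on a different node, i.e. whose
    data would have to be streamed."""
    rng = random.Random(seed)
    moved = 0
    total = num_nodes * vnodes_per_node
    for owner in range(num_nodes):
        for _ in range(vnodes_per_node):
            if rng.randrange(num_nodes) != owner:
                moved += 1
    return moved / total


# 11 nodes with 256 vnodes each, as in the cluster described in this thread.
simulated = fraction_moved(11, 256)
expected = 1 - 1 / 11  # probability that a token lands on a different node
print(f"simulated: {simulated:.3f}, expected: {expected:.3f}")
```

Under this model nearly all data is streamed at least once, which fits the time and disk-space concerns raised in the thread.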
