- 11 nodes
- 1 keyspace
- 256 vnodes per node
- upgraded 1.1.9 to 1.2.3 a week ago

These were taken just before starting shuffle (I ran repair/cleanup the day before). During shuffle I disabled all reads/writes to the cluster.

nodetool status keyspace:

Load       Tokens  Owns (effective)  Host ID
80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761

nodetool status:

Load       Tokens  Owns    Host ID
142.6 GB   253     97.5%   339c474f-cf19-4ada-9a47-8b10912d5eb3
172.44 GB  256     0.1%    78192d73-be0b-4d49-a129-9bec0770efed
221.94 GB  256     0.4%    83ca527c-60c5-4ea0-89a8-de53b92b99c8
194.31 GB  256     0.1%    6d726cbf-147d-426e-a735-e14928c95e45
77.64 GB   256     0.3%    e59a02b3-8b91-4abd-990e-b3cb2a494950
87.15 GB   256     0.4%    93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB   256     0.1%    ff821e8e-b2ca-48a9-ac3f-8234b16329ce
87.61 GB   256     0.3%    c3ea4026-551b-4a14-a346-480e8c1fe283
80.95 GB   256     0.4%    754f9f4c-4ba7-4495-97e7-1f5b6755cb27
108.5 GB   256     0.1%    9889280a-1433-439e-bb84-6b7e7f44d761
101.02 GB  256     0.3%    df7ba879-74ad-400b-b371-91b45dcbed37

Here's an image of the actual disk usage during shuffle:

https://dl.dropbox.com/s/bx57j1z5c2spqo0/shuffle%20disk%20space.png

A little after 00:00 I disabled/cleared the xfers and restarted the cluster (the drops around 00:15 are the restarts) before starting cleanup. The disks are only 540G, and whenever Cassandra runs out of disk space, bad things seem to happen. I was just barely able to run cleanup without running out of space after the failed shuffle.

After the restart:

Load       Tokens  Owns (effective)  Host ID
131.73 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
418.88 GB  255     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
171.19 GB  255     8.5%              ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.61 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
178.83 GB  257     24.9%             e59a02b3-8b91-4abd-990e-b3cb2a494950
442.32 GB  257     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
185.28 GB  257     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
274.47 GB  255     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
210.73 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
274.49 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
106.47 GB  256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761

Cleanup is still running, so the output from status will be a little inaccurate.

I have everything instrumented with Metrics and pushed into Graphite, so if there are graphs/data from there that may help, please let me know.

Thanks,

John

On Sun, Apr 28, 2013 at 2:52 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Can you provide some info on the number of nodes, node load, cluster load
> etc ?
>
> AFAIK shuffle was not an easy thing to test and does not get much real
> world use as only some people will run it and they (normally) use it once.
>
> Any info you can provide may help improve the process.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/04/2013, at 9:21 AM, John Watson <j...@disqus.com> wrote:
>
> > The amount of time/space cassandra-shuffle requires when upgrading to
> > using vnodes should really be apparent in documentation (when some is made).
> >
> > Only semi-noticeable remark about the exorbitant amount of time is a
> > bullet point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
> >
> > "Shuffling will entail moving a lot of data around the cluster and so has
> > the potential to consume a lot of disk and network I/O, and to take a
> > considerable amount of time. For this to be an online operation, the
> > shuffle will need to operate on a lower priority basis to other streaming
> > operations, and should be expected to take days or weeks to complete."
> >
> > We tried running shuffle on a QA version of our cluster and 2 things were
> > brought to light:
> > - Even with no reads/writes it was going to take 20 days
> > - Each machine needed enough free diskspace to potentially hold the
> >   entire cluster's sstables on disk
> >
> > Regards,
> >
> > John
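
A rough back-of-the-envelope check of the disk-space point, as a minimal Python sketch. The load figures are copied from the nodetool status output earlier in this thread. Treating "540G" as 540 GB of usable space in the same units nodetool reports is an assumption, and the function names are hypothetical, chosen only for illustration. Load as reported by nodetool may also differ from actual on-disk usage.

```python
# Per-node Load (GB) from "nodetool status" taken just before shuffle.
PRE_SHUFFLE_LOAD_GB = [
    80.95, 87.15, 98.16, 142.6, 77.64, 194.31,
    221.94, 87.61, 101.02, 172.44, 108.5,
]

# Per-node Load (GB) from the status output taken after the restart.
POST_RESTART_LOAD_GB = [
    131.73, 418.88, 171.19, 142.61, 178.83, 442.32,
    185.28, 274.47, 210.73, 274.49, 106.47,
]

# Assumption: the "540G" disks offer 540 GB usable, in nodetool's units.
DISK_CAPACITY_GB = 540.0


def cluster_total_gb(loads):
    """Sum of the per-node loads, i.e. the data held across the cluster."""
    return sum(loads)


def headroom_gb(loads, capacity_gb):
    """Free space left on each node if load equals disk usage."""
    return [capacity_gb - load for load in loads]


if __name__ == "__main__":
    total = cluster_total_gb(PRE_SHUFFLE_LOAD_GB)
    print(f"cluster total before shuffle: {total:.2f} GB")
    print(f"fits on one {DISK_CAPACITY_GB:.0f} GB disk: {total <= DISK_CAPACITY_GB}")

    tightest = min(headroom_gb(POST_RESTART_LOAD_GB, DISK_CAPACITY_GB))
    print(f"smallest headroom after restart: {tightest:.2f} GB")
```

Under these assumptions the pre-shuffle cluster total is well above a single node's capacity, which is consistent with the observation that a worst case of holding the whole cluster's sstables on one machine cannot fit on these disks.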