[ Rustam Aliyev ]
> Hi,
>
> After upgrading to the vnodes I created and enabled shuffle
> operation as suggested. After running for a couple of hours I had to
> disable it because nodes were not catching up with compactions. I
> repeated this process 3 times (enable/disable).
>
> I have 5 nodes and each of them had ~35GB. After shuffle operations
> described above some nodes are now reaching ~170GB. In the log files
> I can see same files transferred 2-4 times to the same host within
> the same shuffle session. Worst of all, after all of these I had
> only 20 vnodes transferred out of 1280. So if it will continue at
> the same speed it will take about a month or two to complete
> shuffle.
As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.

> I had few question to better understand shuffle:
>
> 1. Does disabling and re-enabling shuffle starts shuffle process from
> scratch or it resumes from the last point?

It resumes.

> 2. Will vnode reallocations speedup as shuffle proceeds or it will
> remain the same?

The shuffle proceeds synchronously, one range at a time; it's not going
to speed up as it progresses.

> 3. Why I see multiple transfers of the same file to the same host? e.g.:
>
> INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
> StreamReplyVerbHandler.java (line 44) Successfully sent
> /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
> to /10.0.1.8
> INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
> StreamReplyVerbHandler.java (line 44) Successfully sent
> /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
> to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?

> 4. When I enable/disable shuffle I receive warning message such as
> below. Do I need to worry about it?
>
> cassandra-shuffle -h localhost disable
> Failed to enable shuffling on 10.0.1.1!
> Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output? Did it report failing to enable when you
tried to disable?

As a rule of thumb though, you don't want a disable/enable to result in
only a subset of nodes shuffling. Are there no other errors? What do
the logs say?

> I couldn't find many docs on shuffle, only read through JIRA and
> original proposal by Eric.

--
Eric Evans
eev...@sym-link.com
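
To check how often the same SSTable is really streamed to the same host (question 3 above), something like the following Python sketch might help. It is only an illustration: the regular expression is inferred from the two log lines quoted in this thread, so other Cassandra versions may need a different pattern, and the function names `count_transfers` and `repeated_transfers` are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the two log lines quoted in this thread; other
# Cassandra versions may word the message differently (assumption).
SENT_RE = re.compile(r"Successfully sent (\S+)\s+to /(\S+)")


def count_transfers(log_text):
    """Count completed sends per (file path, destination host) pair."""
    return Counter(SENT_RE.findall(log_text))


def repeated_transfers(log_text):
    """Return only the (file, host) pairs that were sent more than once."""
    return {pair: n for pair, n in count_transfers(log_text).items() if n > 1}


# The two entries from the original message, joined back onto single lines.
SAMPLE_LOG = """\
INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038 StreamReplyVerbHandler.java (line 44) Successfully sent /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427 StreamReplyVerbHandler.java (line 44) Successfully sent /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db to /10.0.1.8
"""

if __name__ == "__main__":
    for (path, host), n in repeated_transfers(SAMPLE_LOG).items():
        print(f"{n}x {path} -> {host}")
```

A repeat count by itself does not show that anything is wrong. If a file holds data for more than one range, it could legitimately be sent once per range, so the counts are best read alongside the ranges being relocated.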