[ Rustam Aliyev ]
> Hi,
> 
> After upgrading to vnodes I created and enabled the shuffle
> operation as suggested. After it ran for a couple of hours I had to
> disable it because nodes were not catching up with compactions. I
> repeated this process 3 times (enable/disable).
> 
> I have 5 nodes and each of them had ~35GB. After the shuffle operations
> described above some nodes are now reaching ~170GB. In the log files
> I can see the same files transferred 2-4 times to the same host within
> the same shuffle session. Worst of all, after all of this I had
> only 20 vnodes transferred out of 1280. So if it continues at
> the same speed it will take about a month or two to complete the
> shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.
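
A minimal sketch of that cleanup step, assuming `nodetool` is on the
PATH. The host list is hypothetical and must be replaced with your own
nodes. Hosts are handled one at a time, and the script defaults to a dry
run that only prints the commands:

```shell
#!/bin/sh
# Hypothetical host list; substitute the addresses of your own nodes.
HOSTS="10.0.1.1 10.0.1.3 10.0.1.8"

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to actually invoke nodetool.
DRY_RUN="${DRY_RUN:-1}"

# Run cleanup on each host in turn, stopping at the first failure.
cleanup_all() {
    for host in $HOSTS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "nodetool -h $host cleanup"
        else
            nodetool -h "$host" cleanup || return 1
        fi
    done
}

cleanup_all
```

Cleanup rewrites SSTables much like a compaction does, so running it on
one node at a time keeps the extra I/O load contained.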

> I have a few questions to better understand shuffle:
> 
> 1. Does disabling and re-enabling shuffle start the shuffle process
>    from scratch, or does it resume from the last point?

It resumes.

> 2. Will vnode reallocations speed up as the shuffle proceeds, or will
>    the rate remain the same?

The shuffle proceeds synchronously, one range at a time; it's not going to
speed up as it progresses.
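
Since the rate stays roughly constant, the remaining time can be
extrapolated linearly from what has completed so far. A rough sketch,
where the function name is made up for illustration and the elapsed-hours
figure is a placeholder rather than a number from this thread:

```shell
# Estimate the hours left, assuming every range takes about as long as
# the ones already moved (integer arithmetic, rounds down).
# Arguments: ranges done, total ranges, hours elapsed so far.
estimate_remaining_hours() {
    done_ranges=$1
    total_ranges=$2
    elapsed_hours=$3
    if [ "$done_ranges" -le 0 ]; then
        echo "need at least one completed range" >&2
        return 1
    fi
    echo $(( (total_ranges - done_ranges) * elapsed_hours / done_ranges ))
}

# 20 of 1280 ranges are from the report above; the 6 hours is a placeholder.
estimate_remaining_hours 20 1280 6
```

Substitute the actual time the shuffle has spent enabled to get a figure
for your own cluster.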

> 3. Why do I see multiple transfers of the same file to the same host? e.g.:
> 
>    INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
>    StreamReplyVerbHandler.java (line 44) Successfully sent
>    /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>    to /10.0.1.8
>    INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
>    StreamReplyVerbHandler.java (line 44) Successfully sent
>    /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>    to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?

> 4. When I enable/disable shuffle I receive warning messages such as
>    the ones below. Do I need to worry about them?
> 
>    cassandra-shuffle -h localhost disable
>    Failed to enable shuffling on 10.0.1.1!
>    Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?

As a rule of thumb though, you don't want a disable/enable to result in
only a subset of nodes shuffling.  Are there no other errors?  What do
the logs say?

> I couldn't find many docs on shuffle; I have only read through the JIRA
> and the original proposal by Eric.

-- 
Eric Evans
eev...@sym-link.com
