On Thu, Jun 13, 2013 at 10:47 AM, Markus Klems <markuskl...@gmail.com> wrote:
> One scaling strategy seems interesting but we don't
> fully understand what is going on, yet. The strategy works like this:
> add new nodes to a Cassandra cluster with "auto_bootstrap = false" to
> avoid streaming to the new nodes.

If you set auto_bootstrap to false, new nodes take over responsibility
for a range of the ring but do not receive the data for that range from
the old nodes. If you read from the new node at CL.ONE, it will tell
you that data you wrote to the old node does not exist, because the new
node never received it as part of bootstrap. This is probably not what
you expect.
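
To make that concrete, here is a toy Python model of a token ring. It
is only a sketch under assumptions of mine: RF=1, integer tokens, and
made-up names (ToyRing, read_one, add_node, demo). None of it is
Cassandra's actual code or API.

```python
import bisect


class ToyRing:
    """Toy single-replica (RF=1) token ring; not Cassandra's implementation."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens)
        self.rows = {t: {} for t in self.tokens}  # node token -> local rows

    def owner(self, key_token):
        # First node whose token is >= key_token, wrapping around the ring.
        i = bisect.bisect_left(self.tokens, key_token)
        return self.tokens[i % len(self.tokens)]

    def write(self, key_token, value):
        self.rows[self.owner(key_token)][key_token] = value

    def read_one(self, key_token):
        # Like a CL.ONE read: ask only the current owner and trust its answer.
        return self.rows[self.owner(key_token)].get(key_token)

    def add_node(self, token, bootstrap):
        old_rows = list(self.rows.values())
        bisect.insort(self.tokens, token)
        self.rows[token] = {}
        if bootstrap:
            # Streaming: copy every row the new node now owns from old nodes.
            for rows in old_rows:
                for key_token, value in rows.items():
                    if self.owner(key_token) == token:
                        self.rows[token][key_token] = value


def demo(bootstrap):
    ring = ToyRing([100, 200])
    ring.write(150, "v")           # lands on node 200
    ring.add_node(160, bootstrap)  # node 160 now owns token 150
    return ring.read_one(150)
```

In this model demo(False) returns None even though the row still sits
on node 200, while demo(True) returns "v".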

> We were a bit surprised that this
> strategy improved performance considerably and that it worked much
> better than other strategies that we tried before, both in terms of
> scaling speed and performance impact during scaling.

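CL.ONE requests for rows which do not exist are very fast, so part of
the improvement you measured probably comes from the new nodes
answering reads for data they never received.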

> Would it be necessary (in a production environment) to stream the old
> SSTables from the other four nodes at some point in time?

Bootstrapping is necessary for consistency and durability, yes. If you
were to:

1) start a new node without bootstrapping it
2) run "cleanup" compaction on the old node

you would permanently delete the copy of the data that is no longer
"supposed" to live on the old node. With an RF of 1, that data would be
permanently gone. With an RF of >1 you have other copies, but if you
never bootstrap while adding new nodes, you are relatively likely to
lose access to those copies over time as well.
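
The two steps above can be sketched with another toy Python model. The
assumptions are mine: replicas are the owning node plus the next nodes
clockwise, cleanup runs on every node after each join, and the names
(replicas, cleanup, surviving_copies) are hypothetical. It is not how
Cassandra implements any of this.

```python
import bisect


def replicas(tokens, key_token, rf):
    """Return the nodes responsible for key_token: the first node whose
    token is >= key_token, then the following nodes clockwise."""
    tokens = sorted(tokens)
    start = bisect.bisect_left(tokens, key_token)
    count = min(rf, len(tokens))
    return [tokens[(start + i) % len(tokens)] for i in range(count)]


def cleanup(storage, rf):
    """Drop every row from nodes that are no longer replicas for it."""
    for node, rows in storage.items():
        for key_token in list(rows):
            if node not in replicas(storage.keys(), key_token, rf):
                del rows[key_token]


def surviving_copies(rf, new_tokens):
    """Write one row, add nodes without bootstrap, clean up after each
    join, and count how many copies of the row are left."""
    storage = {100: {}, 200: {}}  # node token -> locally stored rows
    for node in replicas(storage.keys(), 150, rf):
        storage[node][150] = "v"
    for token in new_tokens:
        storage[token] = {}       # join without bootstrap: nothing streamed
        cleanup(storage, rf)
    return sum(150 in rows for rows in storage.values())
```

In this model surviving_copies(1, [160]) is 0, so the row is gone.
surviving_copies(2, [160]) is 1, and surviving_copies(2, [160, 170])
is 0: each unbootstrapped join followed by cleanup removes one more
copy.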

=Rob
