To some extent, the bootstrapping problem will be an issue with most solutions: the data has to be duplicated from somewhere. Bootstrapping should not cause much performance degradation unless you are already pushing capacity limits. It's the decommissioning problem that makes Cassandra somewhat problematic in your case. Say you grow your cluster 5x and then write to it: to shrink the cluster again you have to perform a proper decommission, which involves validating and streaming data to the remaining replicas. That is a fairly serious operation with TBs of data. In most realistic situations, unless the cluster is completely read-only, you can't just kill most of the nodes in the cluster.
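To give a feel for why shrinking is expensive, here is a back-of-envelope sketch of the time a decommission might spend streaming. All numbers (even data distribution, a saturated 1 Gbit/s link, no validation overhead) are illustrative assumptions, not measured Cassandra figures:

```python
# Rough estimate of decommission streaming time: each leaving node
# must stream its share of the data to the remaining replicas.
# Assumes evenly distributed data and a fully utilized link.

def decommission_hours(data_tb, nodes_leaving, total_nodes,
                       stream_gbit_per_sec=1.0):
    """Hours to stream the leaving nodes' share of the data."""
    data_bits = data_tb * 8 * 1e12            # TB -> bits
    share = nodes_leaving / total_nodes       # fraction held by leaving nodes
    seconds = (data_bits * share) / (stream_gbit_per_sec * 1e9)
    return seconds / 3600

# Shrinking a 5 TB, 10-node cluster by 8 nodes over ~1 Gbit/s:
print(round(decommission_hours(5, 8, 10), 1))  # ~8.9 hours of streaming
```

In practice validation (building Merkle trees), compaction, and contention with live traffic push the real figure well above a raw bandwidth estimate like this.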
I can't really think of a good, general way to do this with just Cassandra, although there may be some hacktastical possibilities. I think a statically sized Cassandra cluster with a variable cache layer (memcached or similar) in front of it is probably a better solution, though that option starts to fall apart at the terabytes-of-data range.

Have you considered using S3, Amazon CloudFront, or some other CDN instead of rolling your own solution? Serving immutable data is what they excel at. Cassandra has excellent write capacity and its design focus is on scaling writes; I would not really consider it a good tool for the job of serving massive amounts of static content.

Dan

-----Original Message-----
From: Shaun Cutts [mailto:sh...@cuttshome.net]
Sent: March-03-11 13:00
To: user@cassandra.apache.org
Subject: question about replicas & dynamic response to load

Hello,

In our project our usage pattern is likely to be quite variable: high for a few days, then lower, and it could vary as much as 10x (or more) from peak to non-peak. Also, much of our data is immutable, but there is a considerable amount of it -- perhaps in the single-digit TBs. Finally, we are hosting with Amazon.

I'm looking for advice on how to vary the number of nodes dynamically in order to reduce our hosting costs at non-peak times. I worry that just adding "new" nodes in response to demand will make things worse, at least temporarily, as each new node copies data to itself; then bringing it down again will also cause a degradation. I'm wondering if it is possible to bring up exact copies of other nodes? Or, alternately, to take down a populated node containing (only?) immutable data, then bring it up again when the need arises?

Are there reference/reading materials (or blogs) concerning dynamically varying the number of nodes in response to demand?

Thanks!

-- Shaun
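As an aside, the variable cache layer suggested above is the classic cache-aside (read-through) pattern: reads try the cache first and fall back to the backing store on a miss. The sketch below is a minimal illustration; `CacheClient` is a hypothetical stand-in for a real memcached client, and the dict-backed store stands in for the Cassandra cluster:

```python
# Cache-aside read path in front of a statically sized store.
# CacheClient is a stand-in for a memcached client; the dict
# "backing_store" stands in for a Cassandra read.

class CacheClient:
    """Minimal memcached-like get/set interface (illustrative only)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def read_through(key, cache, backing_store):
    """Return the value for key, populating the cache on a miss."""
    value = cache.get(key)
    if value is None:
        value = backing_store[key]   # cache miss: read the backing store
        cache.set(key, value)        # immutable data never needs invalidation
    return value

store = {"user:42": "immutable blob"}
cache = CacheClient()
print(read_through("user:42", cache, store))  # miss: fetched from the store
print(cache.get("user:42"))                   # hit: now served from cache
```

Because the data is immutable, cached entries never need invalidation, which is what makes scaling the cache tier up and down much cheaper than resizing the Cassandra cluster itself.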