Re: Get all keys from the cluster

2012-01-23 Thread aaron morton
If you want to keep the load out of the Cassandra process and do the join to SQL offline, take a look at the bin/sstablekeys utility. This will let you output the keys in an sstable. You will need to do it for every sstable on every node, create the unique list, and then check in your SQL db
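The merge-and-compare step described above could be sketched roughly as follows. This is only an illustration under assumptions: it presumes the output of each sstablekeys run has already been saved as text with one key per line, and that the SQL keys have been exported in the same encoding. The function names `merge_unique_keys` and `keys_missing_from_sql` are hypothetical and not part of Cassandra.

```python
from typing import Iterable, Set


def merge_unique_keys(dumps: Iterable[Iterable[str]]) -> Set[str]:
    """Union the keys from many key dumps into one set.

    Each dump is assumed to be an iterable of lines (for example an open
    text file holding the saved output of one sstablekeys run), one key
    per line. Blank lines are skipped.
    """
    unique: Set[str] = set()
    for dump in dumps:
        for line in dump:
            key = line.strip()
            if key:
                unique.add(key)
    return unique


def keys_missing_from_sql(cassandra_keys: Set[str],
                          sql_keys: Iterable[str]) -> Set[str]:
    """Return the keys present in the Cassandra dumps but not in SQL."""
    return cassandra_keys - set(sql_keys)


if __name__ == "__main__":
    # Hypothetical sample data standing in for dumps from two nodes.
    node_a = ["6b657931\n", "6b657932\n"]
    node_b = ["6b657932\n", "6b657933\n", "\n"]
    all_keys = merge_unique_keys([node_a, node_b])
    stale = keys_missing_from_sql(all_keys, ["6b657931", "6b657933"])
    print(sorted(stale))
```

Because replicas hold the same rows, the same key will show up in dumps from several nodes, which is why the sketch collects everything into a set. For a very large key population an in-memory set may not fit, and sorting the dump files on disk before comparing them might be a better fit.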

Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
We're running an 8-node cluster with different CFs for different applications. One of the applications uses 1.5TB out of 1.8TB in total, but only because we started out without a deletion mechanism and implemented one later on. So there is probably a large amount of old data in there that we don't

Re: Get all keys from the cluster

2012-01-21 Thread Eric Czech
Is there any way that you could do that lookup in reverse, where you pull the records from your SQL database, figure out which keys aren't necessary, and then delete any unnecessary keys that may or may not exist in Cassandra? If that's not a possibility, then what about creating the same

Re: Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
Thanks for your suggestions, Eric! "One of the applications uses 1.5TB out of 1.8TB" I'm sorry, maybe that statement was slightly ambiguous. I meant to say that one application uses 1.5TB, while the others use 300GB, totalling 1.8TB of data. Our total disk capacity, however, is about 7 TB,

Re: Get all keys from the cluster

2012-01-21 Thread Eric Czech
Great! I'm glad at least one of those ideas was helpful for you. That's a road we've travelled before. As one last suggestion that might help, you could alter all client writers to Cassandra beforehand so that they write to BOTH keyspaces BEFORE beginning the SQL-based transfer. This might