Hi, I have some questions about SSTables in Cassandra. I am working on a
project that uses them, and I hope someone on this list can share some thoughts.
My understanding is that SSTables are per column family, but each column family
can have multiple SSTable files. At runtime, one row COULD be split across more
than one SSTable file. This is not good for performance, but it does happen,
and Cassandra will try to merge the row's data into one SSTable file during
compaction.
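
To make my mental model concrete, here is a toy sketch in Python. It is not
Cassandra code and does not reflect the on-disk format. The dict layout, the
name merge_row, and the sample values are all made up for illustration. It
assumes each column lives whole in one fragment and that the newest timestamp
wins when the fragments are merged, which is part of what I am asking about.

```python
# Toy model only: NOT Cassandra code or its on-disk format.
# Each "SSTable" is modeled as: row key -> {column name: (value, timestamp)}.

def merge_row(fragments):
    """Merge fragments of one row, keeping the newest version of each column."""
    merged = {}
    for fragment in fragments:
        for name, (value, ts) in fragment.items():
            # Assumption: a column is never split; the higher timestamp wins.
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged

# Hypothetical data: key1 has columns in both files.
sstable1 = {"key1": {"name": ("Yong", 100), "city": ("NYC", 100)}}
sstable2 = {"key1": {"city": ("Boston", 200), "email": ("y@example.com", 200)}}

row = merge_row(t["key1"] for t in (sstable1, sstable2) if "key1" in t)
print(row)
```

In this picture, key1 appears in both files, and only the merged view holds
the complete row.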
My question is: when one row is split across multiple SSTable files, what is
the boundary? Or let me ask it this way: if one row exists in 2 SSTable files,
and I run the sstable2json tool on each SSTable file individually:
1) I would expect the same row key to show up in both sstable2json outputs,
since the row exists in both SSTable files, right?
2) If so, what is the boundary? Does Cassandra guarantee that the boundary is
at the column level? What I mean is that the data for one column is guaranteed
to be either in the first file or in the second file, right? There is no chance
that Cassandra will cut the data of one column into 2 parts, with one part
stored in the first SSTable file and the other part stored in the second
SSTable file. Is my understanding correct?
3) If we are talking only about the SSTable files in a snapshot or an
incremental backup, excluding the runtime SSTable files, does anything change?
For snapshot or incremental backup SSTable files, can one row's data still
exist in more than one SSTable file? And does the boundary change in this case?
4) If I want to use incremental backup SSTable files as a way to capture
changed data, is that a good way to do what I am trying to achieve? In that
case, what happens in the following example:
For column family A:
at Time 0, one row key (key1) has some data. It is stored and backed up in
SSTable file 1.
at Time 1, if any column for key1 changes (a new column is inserted, a column
is updated/deleted, or even the whole row is deleted), I would expect this
whole row to exist in any incremental backup SSTable files taken after Time 1,
right?
What happens if the above row happens to be stored in more than one SSTable
file?
at Time 0, one row key (key1) has some data; it happens to be stored in SSTable
file 1 and file 2, and both are backed up.
at Time 1, if one column is added to row key1, and the change in fact happens
in SSTable file 2 only, and we do an incremental backup after that, which
SSTable files should I expect in this backup? Both SSTable files? Or just
SSTable file 2?
I was thinking incremental backup SSTable files are a good candidate for
capturing changed data, but the fact that one row's data can exist in multiple
SSTable files makes things complex. Does anyone have experience using SSTable
files in this way? What are the lessons learned?
Thanks
Yong                                      