Nice work, John. If you learn any more, please share. S
On Sat, Dec 7, 2013 at 11:50 AM, John Sanda <john.sa...@gmail.com> wrote:
> I finally got the math right for the partition index after tracing through
> SSTableWriter.IndexWriter.append(DecoratedKey key, RowIndexEntry indexEntry).
> I should also note that I am working off of the source for 1.2.9. Here is
> the breakdown of what gets written to disk in the append() call (my keys
> are 4 bytes, while column names and values are both 8 bytes):
>
> // key
> key length - 2 bytes
> key - 4 bytes
>
> // index entry
> index entry position - 8 bytes
> index entry size - 4 bytes
>
> // when the index entry contains a columns index, the following entries
> // will be written ceil(total_row_size / column_index_size_in_kb) times
> local deletion time - 4 bytes
> marked for delete at - 8 bytes
> columns index entry first name length - 2 bytes
> columns index entry first name - 8 bytes
> columns index entry last name length - 2 bytes
> columns index entry last name - 8 bytes
> columns index entry offset - 8 bytes
> columns index entry width - 8 bytes
>
> I also went through the serialization code for bloom filters, but I do not
> understand the math. Even with my slightly improved understanding, I am
> still uncertain about how effective any sizing analysis will be, since the
> numbers of rows and columns will vary per SSTable.
>
> On Fri, Dec 6, 2013 at 3:53 PM, John Sanda <john.sa...@gmail.com> wrote:
>> I have done that, but it only gets me so far because the cluster and the
>> app that manages it are run by third parties. Ideally, I would like to
>> provide my end users with a formula or heuristic for establishing some
>> sort of baseline that at least gives them a general idea for planning.
>> Generating data as you have suggested, and as I have done, is helpful,
>> but it is hard for users to extrapolate from that.
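Taking John's breakdown above literally, the per-row index footprint can be sketched as follows. This is only an estimate under his traced field sizes from 1.2.9: the function name and the blocks > 1 guard are mine (a columns index is only promoted when a row spans more than one index block), and row_size_bytes here ignores per-column serialization overhead.

```python
import math

def partition_index_bytes(key_len, name_len, row_size_bytes,
                          column_index_size_in_kb=64):
    """Estimate bytes written to the partition index for one row,
    per the field sizes traced in SSTableWriter.IndexWriter.append()."""
    total = 2 + key_len   # key length (2 bytes) + key
    total += 8 + 4        # index entry position (8) + index entry size (4)
    # one columns-index entry per column_index_size_in_kb chunk of the row
    blocks = math.ceil(row_size_bytes / (column_index_size_in_kb * 1024))
    if blocks > 1:  # columns index present only for multi-block rows
        per_block = (4 + 8              # local deletion time + marked-for-delete-at
                     + 2 + name_len     # first name length + first name
                     + 2 + name_len     # last name length + last name
                     + 8 + 8)           # offset + width
        total += blocks * per_block
    return total

# John's numbers: 4-byte keys, 8-byte names/values, 20160 columns per row
row_size = 20160 * (8 + 8)  # name + value data only
print(partition_index_bytes(4, 8, row_size))
```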
>> On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden <jacob.rho...@me.com> wrote:
>>> Not sure what your end setup will be, but I would probably just spin up
>>> a cluster, fill it with typical data, and measure the size on disk.
>>>
>>> Sent from iPhone
>>>
>>> On 7 Dec 2013, at 6:08 am, John Sanda <john.sa...@gmail.com> wrote:
>>>
>>> I am trying to do some disk capacity planning. I have been referring to
>>> the DataStax docs[1] and this older blog post[2]. I have a column family
>>> with the following:
>>>
>>> row key - 4 bytes
>>> column name - 8 bytes
>>> column value - 8 bytes
>>> max number of non-deleted columns per row - 20160
>>>
>>> Is there an effective way to calculate the sizes (or at least a decent
>>> approximation) of the bloom filters and partition indexes on disk?
>>>
>>> [1] Calculating user data size
>>> <http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html>
>>> [2] Cassandra Storage Sizing <http://btoddb-cass-storage.blogspot.com/>
>>>
>>> --
>>> - John
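For the bloom filter half of the question, a rough estimate can come from the textbook formula rather than Cassandra's exact serialization: for n keys and a target false-positive rate p, the filter needs about m = -n * ln(p) / (ln 2)^2 bits. A sketch under those assumptions (the fp_chance value here is illustrative; check your table's bloom_filter_fp_chance, and note the real implementation rounds bucket counts, so on-disk size will differ somewhat):

```python
import math

def bloom_filter_bytes(num_keys, fp_chance=0.01):
    """Textbook bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

# e.g. one million row keys at a 1% false-positive rate: roughly 1.2 MB
print(bloom_filter_bytes(1_000_000))
```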