Excellent Sylvain! Yes, that seems to remove the linear scan component of slice read times.
FYI, I still see some interesting differences in some aspects, though.

If I do a slice without a start (i.e., get me the first column), it seems to fly:

GET("K", :count => 1)
  -- 4.832877  -->> very fast, and actually in this case I see the reading client being the bottleneck, not Cassandra (which is at only about 20% CPU)

If I do the same, but actually specifying the start column with the first existing value:

GET("K", :start => '144abe16-416c-11e1-9e23-2cbae9ddfe8b', :count => 1)
  -- 11.084275  -->> half as fast, and using twice the CPU, hovering at about 50% or more (again, Cassandra is not the bottleneck, but the significant data point is that the initial seek seems to double the time/CPU)

If I do the same, starting from the middle:

GET("K", :start => '9c13c644-416c-11e1-81dd-4ba530dc83d0', :count => 1)
  -- 11.038187  -->> as expensive as starting from the beginning

The same, starting at the last one:

GET("K", :start => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7', :count => 1)
  -- 6.489683  -->> much faster than any other slice, although not quite as fast as not using a start column

I could see that not having to seek into whatever backing "map/structure" is obviously faster, although I'm surprised that seeking to an initial value makes reads half as fast. Wouldn't this mostly imply following some links/pointers in memory to start reading ordered columns? What is the backing store used for Memtables when column slices are performed?

I am not sure why starting at the end (without reversing or anything) yields much better performance.

Cheers,

Josep M.

On Wed, Jan 18, 2012 at 12:57 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
> On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer <blanq...@rightscale.com> wrote:
> > Hi,
> >
> > I've been doing some tests using wide rows recently, and I've seen some odd
> > performance problems that I'd like to understand.
> >
> > In particular, I've seen that the time it takes for Cassandra to perform a
> > column slice of a single key, solely in a Memtable, seems to be very
> > expensive, but most importantly proportional to the ordered position where
> > the start column of the slice lives.
> >
> > In other words:
> > 1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
> > comparator)
> > 2- I create a single Row with Key "K"
> > 3- Then add 200K TimeUUID columns to key "K"
> > 4- (and make sure nothing is flushed to SSTables...so it's all in the
> > Memtable)
> >
> > ...I observe the following timings (seconds to perform 1000 reads) while
> > performing multiget slices on it: (pardon the pseudo-code, but you'll get
> > the gist)
> >
> > a) simply a get of the first column:
> >    GET("K", :count => 1)
> >    -- 2.351226
> >
> > b) doing a slice get, starting from the first column:
> >    GET("K", :start => '144abe16-416c-11e1-9e23-2cbae9ddfe8b', :count => 1)
> >    -- 2.189224 <<- so with or without "start" doesn't seem to make much of
> >    a difference
> >
> > c) doing a slice get, starting from the middle of the ordered columns,
> > approx. starting at item number 100K:
> >    GET("K", :start => '9c13c644-416c-11e1-81dd-4ba530dc83d0', :count => 1)
> >    -- 11.849326 <<- 5 times more expensive if the start of the slice is 100K
> >    positions away
> >
> > d) doing a slice get, starting from the last of the ordered columns,
> > approx. position 200K:
> >    GET("K", :start => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7', :count => 1)
> >    -- 19.889741 <<- Almost twice as expensive as starting the slice at
> >    position 100K, and 10 times more expensive than starting from the first one
> >
> > This behavior leads me to believe that there's a clear Memtable column scan
> > for the columns of the key.
> > If one tries a column name read on those positions (i.e., not a slice), the
> > performance is constant. I.e., GET("K",
> > '144abe16-416c-11e1-9e23-2cbae9ddfe8b') .
> > Retrieving the first, middle or last TimeUUID is done in the same amount
> > of time.
> >
> > Having increasingly worse performance for column slices in Memtables seems
> > to be a bit of a problem...aren't Memtables backed by a structure that has
> > some sort of column name indexing, so that landing on the start column can
> > be efficient? I'm definitely observing very high CPU utilization on those
> > scans...By the way, with wide rows like this, slicing SSTables is quite a
> > bit faster than slicing Memtables...I'm attributing that to the sampled
> > index of the SSTables, which is why I'm wondering whether the Memtables
> > lack such column indexing and resort to linked lists of sorts....
> >
> > Note that the actual timings shown are not important; it's on my laptop and
> > I have a small amount of debugging enabled...what is important is the
> > difference between them.
> >
> > I'm using Cassandra trunk as of Dec 1st, but I believe I've done experiments
> > with the 0.8 series too, leading to the same issue.
>
> You may want to retry your experiments on current trunk. We did have an
> inefficiency in our memtable search that was fixed by:
> https://issues.apache.org/jira/browse/CASSANDRA-3545
> (the name of the ticket doesn't make it clear that it's related but it is)
>
> The issue was committed on December 8.
>
> --
> Sylvain
>
> >
> > Cheers,
> >
> > Josep M.
>
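
For anyone following the thread, the distinction being discussed (walking the columns from the front until the start column is reached, versus seeking straight to it in a sorted structure) can be sketched roughly as below. This is only an illustration, not Cassandra's code: the plain sorted Python list, the zero-padded column names, and the function names `slice_linear` and `slice_seek` are all assumptions made up for the example.

```python
import bisect


def slice_linear(names, start, count):
    """Walk from the first column until `start` is reached.

    The cost grows with the position of `start` in the sorted order.
    """
    out = []
    for name in names:
        if name >= start:
            out.append(name)
            if len(out) == count:
                break
    return out


def slice_seek(names, start, count):
    """Binary-search to `start`, then read `count` columns.

    The cost is O(log n) for the seek plus `count` for the read,
    regardless of where `start` sits in the row.
    """
    i = bisect.bisect_left(names, start)
    return names[i:i + count]


# Hypothetical stand-in for the 200K TimeUUID columns of row "K":
# zero-padded names that sort in insertion order.
columns = ["col%06d" % i for i in range(200_000)]

# Both approaches return the same slice for the first, middle and last column.
for start in (columns[0], columns[100_000], columns[-1]):
    assert slice_linear(columns, start, 1) == slice_seek(columns, start, 1) == [start]
```

Timing the two functions with a start column near the end of the row should show the linear version getting slower with position while the seek version stays roughly flat, which is the pattern the original a) through d) measurements suggested.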