> There is no "upgrade path".

I don't think that's true. The goal of the blog post you've linked is to discuss that upgrade path (and in particular to show that, for the most part, you can access your thrift data from CQL3 without any modification whatsoever).
> You adopt CQL3's sparse tables as soon as you start creating column families
> from CQL.

That's not true: you can create non-sparse tables from CQL3 (using COMPACT STORAGE), so you can work with both CQL3 and thrift side by side for as long as it takes you to upgrade from thrift to CQL3. Then, for the things you know you will only access through CQL3 (i.e. once the "upgrade is complete"), you can start using non-compact tables and enjoy their conveniences (collections, for instance).

> There is not much backwards compatibility. CQL3 can query compact tables, but
> you may have to remove the metadata from them so they can be transposed.

I think "not much backwards compatibility" is a tad unfair. The only case where you "may have to remove the metadata" is if you are using a CF in both a static and a dynamic way. I can't pretend to know what every user is doing, but from my experience and what I've seen, this is not that common: CFs are either static or dynamic in nature, not both. I do think that for most users, upgrading from thrift to CQL3 won't require any data migration or messing with metadata.

But more importantly, things are not completely closed. If you have *concrete* difficulties moving from thrift to CQL3, please do share them on this mailing list and we'll try to help you out.

> Thrift can not write into CQL tables easily, because of how the primary keys
> and column names are encoded into the key column and compact metadata is not
> equal to cql3's metadata.

To be clear, CQL3 is meant as an upgrade from thrift. Not a mandatory one: you can stick to thrift if you don't think CQL3 is better. But if you do decide to upgrade, you should see CQL3 non-compact tables as the new stuff, the thing you use post-upgrade. While you upgrade, stick to compact tables. Once you've upgraded, you can start using the new stuff, and at that point it doesn't matter that the new stuff is hard to access the old way.
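To make the "transposed" view concrete, here is a minimal Python sketch (an illustration, not Cassandra code) of how a dynamic thrift CF's wide row is seen from CQL3 as one row per thrift column. It assumes a CF created from thrift with no column metadata, a UTF8 key validator, an Int32 comparator and a bytes default validator; the names `key`, `column1` and `value` mirror the defaults CQL3 assigns to such a CF, and the `events` table in the comment is hypothetical.

```python
from typing import Dict, List, Tuple

# A dynamic thrift CF modelled as: row key -> {column name -> value}.
# Roughly how CQL3 would describe such a CF (illustrative only):
#   CREATE TABLE events (
#       key text, column1 int, value blob,
#       PRIMARY KEY (key, column1)
#   ) WITH COMPACT STORAGE;
ThriftCF = Dict[str, Dict[int, bytes]]
CqlRow = Tuple[str, int, bytes]


def transpose(cf: ThriftCF) -> List[CqlRow]:
    """Return one (key, column1, value) CQL3 row per thrift column.

    Partition order depends on the partitioner, so row keys are visited in
    the order given; columns inside a row are sorted by name, as the
    comparator would keep them."""
    rows: List[CqlRow] = []
    for key, columns in cf.items():
        for name in sorted(columns):
            rows.append((key, name, columns[name]))
    return rows


if __name__ == "__main__":
    thrift_cf = {"sensor-1": {1700: b"\x01", 1600: b"\x02"},
                 "sensor-2": {42: b"\x03"}}
    for row in transpose(thrift_cf):
        print(row)
```

Nothing about the stored data changes here: each (row key, column name) pair simply becomes a (partition key, clustering value) pair, which is why a purely dynamic CF needs no migration. A CF that mixes declared (static) columns with arbitrary dynamic ones does not fit either shape cleanly, which is the metadata case mentioned above.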
> My biggest beefs are:
> 1) column names are UTF8 (seems wasteful in most cases)

That's largely not true, at least the "wasteful in most cases" part. A column name in CQL3 does not always translate to an internal column name. You can still do your time series where the internal column name is an int, and you don't waste space. As for the static cases, yes, CQL3 forces UTF8, but I'm pretty certain people overwhelmingly use UTF8 or ASCII in those cases anyway. And because CQL3 forces you to declare your column names in those static cases, we may actually be able to optimize the space used internally for them in the future, which is harder with thrift. So I think we actually have the potential to make it less wasteful in most cases.

> 2) sparse empty row to ghost (seems like tiny rows with one column have much
> overhead now)

It is true that for non-compact CQL3 tables we've focused on flexibility and on making the behavior predictable, which does add some slight space overhead. However:
- that's why compact storage is here. There is zero overhead over thrift if you use compact storage. That's even why we named it like that: it's compact.
- we know that most of the overhead of non-compact tables can be won back by optimizing the storage engine. That's an advantage of having an API that is not too tied to the underlying storage: it gives room for optimizations.

> 3) using composites (with (compound primary keys) in some table designs) is
> wasteful. Composite adds two unsigned bytes for size and one unsigned byte as
> 0 per part.

See above.

> 4) many lines of code between user/request and actual disk. (tracing a CQL
> select VS a slice, young gen, etc)

If you are saying the implementation of CQL3 is more lines of code than the thrift part, then you're probably right, but given how much more convenient CQL3 is compared to thrift, I happily take that criticism.
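The size side of points 1 to 3 is easy to check by counting bytes. The sketch below models only the internal cell *name* (not timestamps or any other per-cell cost), using the composite layout quoted in point 3: a 2-byte length, the component bytes, and a 1-byte end-of-component marker. It assumes that a compact table with a single clustering column uses that value directly as the cell name, and that a non-compact table appends the CQL column name as a final composite component; `ts` and the column name `value` are illustrative.

```python
import struct
from typing import List


def int_name(n: int) -> bytes:
    """An Int32Type value: 4-byte signed big-endian integer."""
    return struct.pack(">i", n)


def composite_name(components: List[bytes]) -> bytes:
    """Lay out a composite cell name: for each component, a 2-byte unsigned
    big-endian length, the component bytes, then a 1-byte end-of-component
    marker (0 for an ordinary name)."""
    out = bytearray()
    for part in components:
        out += struct.pack(">H", len(part))
        out += part
        out += b"\x00"
    return bytes(out)


ts = 1700000000  # hypothetical time-series timestamp used as the clustering value

# COMPACT STORAGE, single int clustering column: the cell name is just the int.
compact_one = int_name(ts)
# COMPACT STORAGE, two int clustering columns: a 2-component composite.
compact_two = composite_name([int_name(ts), int_name(7)])
# Non-compact table: clustering value, then the CQL column name "value".
non_compact = composite_name([int_name(ts), "value".encode("utf-8")])

for label, name in [("compact, 1 clustering col", compact_one),
                    ("compact, 2 clustering cols", compact_two),
                    ("non-compact, 1 clustering col", non_compact)]:
    print(f"{label}: {len(name)} bytes")
```

The three names come out at 4, 14 and 15 bytes. The first case is the time series with an int column name and no composite at all. The 3 bytes per component only appear once you opt into composites or non-compact tables.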
But in terms of overhead, provided you use prepared statements (which you should if you care about performance), it remains to be proven that CQL3 has more overhead than thrift. In particular, in terms of garbage (since you're citing young gen), while I haven't tested it, I'd be *really* surprised if thrift generated less garbage than CQL3. And in terms of query tracing, there is almost no difference whatsoever between the two.

> 5) not sure if "collections" can be used in REALLY wide row scenarios. aka
> 1,000,000 entry set?

Lists have their downsides (listed in the documentation), but sets and maps have, in theory, no more limitations than wide rows have. They do currently have one limitation: the API doesn't allow fetching only part of a collection. But that will change.

That being said, and possibly more importantly, collections are *not* meant to be very wide. They are *not* meant for wide-row scenarios. CQL3 supports wide rows (in the thrift sense) *without* collections, and for true wide-row scenarios you want to dedicate a CF to the data, because that is the right thing to do.

--
Sylvain
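On point 5, the alternative to a 1,000,000-entry set is a dedicated table where each member is a clustering value in one wide partition, which can be read slice by slice rather than all at once. The sketch below is a minimal Python model of that paging pattern, not driver code: the `group_members` table, the query in the comment and both functions are hypothetical, and a sorted Python list stands in for one partition.

```python
import bisect
from typing import Iterator, List, Optional

# Hypothetical table for a very wide "set": one clustering value per member,
# so the whole set is one wide partition that can be read in slices.
#   CREATE TABLE group_members (
#       group_id text,
#       member text,
#       PRIMARY KEY (group_id, member)
#   );
# One page is then roughly:
#   SELECT member FROM group_members
#   WHERE group_id = 'g1' AND member > 'last-seen-member' LIMIT 1000;


def next_page(partition: List[str], after: Optional[str], limit: int) -> List[str]:
    """Return up to `limit` members strictly greater than `after`.

    `partition` stands in for one partition, already sorted in clustering
    order the way cells in a wide row are."""
    start = 0 if after is None else bisect.bisect_right(partition, after)
    return partition[start:start + limit]


def iterate_members(partition: List[str], page_size: int) -> Iterator[str]:
    """Walk the partition page by page instead of loading it all at once."""
    after: Optional[str] = None
    while True:
        page = next_page(partition, after, page_size)
        if not page:
            return
        yield from page
        after = page[-1]
```

Each page restarts from the last member seen, the same way a thrift slice restarts from its last column name, so no single read ever has to materialize the full million entries.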