@Eric - totally agree. People should choose what is most comfortable for them, but they should also take time to learn both and really understand Cassandra at a deep level. The same is true of any database, even if most people don't bother to read and understand how a piece of technology works. I've seen some people confused about Cassandra, especially if they go to GitHub and see the description. New people could get the wrong impression.
https://github.com/apache/cassandra

"Row store <http://wiki.apache.org/cassandra/DataModel> means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL."

On Wed, Dec 24, 2014 at 11:43 AM, Eric Stevens <migh...@gmail.com> wrote:

> As Ryan mentioned, CQL is simply a translation layer to the underlying
> storage mechanism you're already familiar with from Thrift.
>
> There are definitely corner cases where it's not possible to get a
> one-for-one equivalent in CQL vs Thrift, and even when there are equivalents,
> the underlying data might not look exactly the same (e.g., if you used string
> composites instead of native composites, or several mixed composite types,
> and so on).
>
> CQL is not meant to provide SQL equivalency. It's not only missing many
> SQL constructs, it's also got a number of unique constructs of its own.
> It's meant to be familiar looking to people comfortable with SQL, but you
> cannot reason about it the same way.
>
> Everyone is of course free to use the access layer they prefer, but
> personally I would recommend building all new features using a CQL oriented
> approach. The Thrift interface is frozen, it will not get new features,
> and there are some really awesome features already released only for CQL,
> and more are coming. Find a path that works for you in CQL; we had to
> change our thinking about a number of things, but it's worth the effort.
>
> On Wed, Dec 24, 2014 at 8:48 AM, Peter Lin <wool...@gmail.com> wrote:
>
>> Basically any time you want to store maps of maps, lists of lists or
>> actual Java objects, CQL is not a good fit. CQL is really only good for
>> primitive types, flat lists, maps and sets.
>>
>> Using Cassandra pure with static columns is perfectly valid, but I don't
>> live in that world. Most of what I do requires dynamic columns mixed with
>> static columns in a single column family.
>> This will sound like heresy, but for a
>> use case that fits perfectly in the SQL model, you're better off using
>> something like VoltDB, which gives you 100% SQL with ACID.
>>
>> On Wed, Dec 24, 2014 at 10:38 AM, Kai Wang <dep...@gmail.com> wrote:
>>
>>> Ryan,
>>>
>>> Can you elaborate a little on "Thrift over CQL is modeling clustering
>>> columns in different nesting between rows is trivial in Thrift and not
>>> really doable in CQL"?
>>> On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
>>>
>>>> I'm not entirely certain how you can't model that to solve your use
>>>> case (wouldn't you be filtering the events as well, and therefore be able
>>>> to get all that in one query?).
>>>>
>>>> What you describe there has a number of avenues (collections, just
>>>> heavier use of statics in a different order than you specified, object dump
>>>> of events in a single column, switching up the clustering columns) of
>>>> getting your question answered in one query. At the end of the day, CQL
>>>> resolves to a given SSTable format; you can still open up cassandra-cli
>>>> and view what a given model looks like. Once you've grokked this
>>>> adequately you basically can bend CQL to fit your logical Thrift modeling,
>>>> and at some point, like learning any new language, you'll learn to speak
>>>> in both (something I have to do nearly daily).
>>>>
>>>> FWIW, other than the primary valid complaint remaining for Thrift over
>>>> CQL (modeling clustering columns in different nesting between rows is
>>>> trivial in Thrift and not really doable in CQL, since clustering columns
>>>> enforce a nesting order by logical construct), I've yet to not be able to
>>>> swap a client from Thrift to CQL, and it's always ended up faster (so far).
>>>>
>>>> The main reason for this is that performance on modern Cassandra and the
>>>> native protocol is substantially better than pure Thrift for many query
>>>> types (see
>>>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster),
>>>> so your mileage may vary, but I'd test it out first before proclaiming that
>>>> Thrift is faster for your use case (and make liberal use of CQL features
>>>> with cassandra-cli to make sure you know what's going on internally;
>>>> remember it's all just SSTables underneath).
>>>>
>>>> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks, Ryan. I wasn't aware of static column support, and indeed
>>>>> they get me most of what I need. I think the only potential inefficiency
>>>>> is still at query time. Using Thrift, I could design the column family to
>>>>> get all the static and dynamic content in a single query.
>>>>> If event_source and total_events are instead implemented as CQL3
>>>>> statics, I probably need to do two queries to get data for a given
>>>>> event_type.
>>>>>
>>>>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>>>>> SELECT event_source, total_events FROM timeseries WHERE event_type = 'some-type'
>>>>>
>>>>> To get the events:
>>>>> SELECT insertion_time, event FROM timeseries
>>>>>
>>>>> As a combined query, my concern is related to the overhead of
>>>>> repeating event_type/source/total_events (although with potentially many
>>>>> other pieces of static information).
>>>>>
>>>>> More generally, do you find that tuned applications tend to use
>>>>> Thrift, a combination of Thrift and CQL3, or is CQL3 really expected to
>>>>> replace Thrift?
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>>>>> wrote:
>>>>>
>>>>>> Don't static columns get you what you want?
>>>>>>
>>>>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>>>> On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3. Pages
>>>>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift
>>>>>>> suggest new projects should use CQL3.
>>>>>>>
>>>>>>> I'm wondering, however, if there are certain use cases not well
>>>>>>> covered by CQL3. Consider the standard timeseries example:
>>>>>>>
>>>>>>> CREATE TABLE timeseries (
>>>>>>>   event_type text,
>>>>>>>   insertion_time timestamp,
>>>>>>>   event blob,
>>>>>>>   PRIMARY KEY (event_type, insertion_time)
>>>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>>>
>>>>>>> What happens if I want to store additional information that is
>>>>>>> shared by all events in the given series (but that I don't want to
>>>>>>> include in the row ID): e.g. the event source, a cached count of the
>>>>>>> number of events logged to date, etc.? I might try updating the
>>>>>>> definition as follows:
>>>>>>>
>>>>>>> CREATE TABLE timeseries (
>>>>>>>   event_type text,
>>>>>>>   event_source text,
>>>>>>>   total_events int,
>>>>>>>   insertion_time timestamp,
>>>>>>>   event blob,
>>>>>>>   PRIMARY KEY (event_type, event_source, total_events,
>>>>>>>     insertion_time)
>>>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>>>>
>>>>>>> Is this not inefficient? When inserting or querying via CQL3, say
>>>>>>> in batches of up to 1000 events, won't the type/source/count be repeated
>>>>>>> 1000 times? Please let me know if I'm misunderstanding something, or
>>>>>>> if I should be sticking to Thrift for situations like this involving
>>>>>>> mixed static/dynamic data.
>>>>>>>
>>>>>>> Thanks!
>>>>
>>>> --
>>>> Ryan Svihla
>>>> Solution Architect
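
For readers following the static-column suggestion in this thread, here is a minimal sketch of what it might look like when applied to David's timeseries table. It is an illustration under assumptions, not something posted by anyone in the thread: the table and column names come from David's message, the STATIC keyword is the feature described on the documentation page Ryan linked, and the inserted values ('some-type', 'sensor-a', the timestamp and the blob) are hypothetical.

```cql
-- Sketch only: David's table with the shared fields declared STATIC,
-- so they are stored once per partition (per event_type) rather than
-- being part of the primary key.
CREATE TABLE timeseries (
    event_type     text,
    event_source   text STATIC,
    total_events   int  STATIC,
    insertion_time timestamp,
    event          blob,
    PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Hypothetical values: set the shared metadata for one series.
INSERT INTO timeseries (event_type, event_source, total_events)
VALUES ('some-type', 'sensor-a', 0);

-- Hypothetical values: add one event to the same series.
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-24 11:43:00+0000', 0x01);

-- Metadata only: LIMIT 1 keeps this to a single result row.
SELECT event_source, total_events
FROM timeseries
WHERE event_type = 'some-type'
LIMIT 1;

-- Combined read: each returned event row also carries the static values.
SELECT insertion_time, event, event_source, total_events
FROM timeseries
WHERE event_type = 'some-type';
```

On David's LIMIT 1 question: without it, the metadata query returns one result row per event in the partition, each showing the same static values, so the limit is what reduces it to one record. The combined query repeats the static values in the result set, but they are written only once per partition on disk. Whether that repetition matters on the wire is worth measuring for your own workload, as Ryan suggests.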