Re: CQL3 vs Thrift

Peter Lin Wed, 24 Dec 2014 08:58:56 -0800

@Eric - totally agree. People should choose what is most comfortable for
them, but they should also take time to learn both and really understand
Cassandra at a deep level. The same is true of any database, even if most
people don't bother to read and understand how a piece of technology works.
I've seen some people confused about Cassandra, especially when they go to
GitHub and read the project description. New people could get the wrong
impression:


https://github.com/apache/cassandra

"Row store <http://wiki.apache.org/cassandra/DataModel> means that like
relational databases, Cassandra organizes data by rows and columns. The
Cassandra Query Language (CQL) is a close relative of SQL."



On Wed, Dec 24, 2014 at 11:43 AM, Eric Stevens <migh...@gmail.com> wrote:

> As Ryan mentioned, CQL is simply a translation layer over the underlying
> storage mechanism you're already familiar with from Thrift.
>
> There are definitely corner cases where it's not possible to get a
> one-for-one equivalent in CQL vs Thrift, and even when there are equivalents,
> the underlying data might not look exactly the same (e.g., if you used string
> composites instead of native composites, or several mixed composite types,
> and so on).
>
> CQL is not meant to provide SQL equivalency. It's not only missing many
> SQL constructs, it also has a number of unique constructs of its own.
> It's meant to look familiar to people comfortable with SQL, but you
> cannot reason about it the same way.
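A few CQL statements with no direct SQL counterpart, sketched against the `timeseries` table from David's original message further down the thread. The TTL value and all literal values are made up for illustration:

```cql
-- Per-write expiry: this event is deleted automatically after one day (86400 s).
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-24 11:43:00', 0xcafe)
USING TTL 86400;

-- Lightweight transaction: the insert only happens if this row does not exist yet.
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-24 11:44:00', 0xbeef)
IF NOT EXISTS;

-- Looks like ordinary SQL, but is rejected: event is neither a primary key
-- column nor indexed, so it cannot be restricted in the WHERE clause.
SELECT * FROM timeseries WHERE event = 0xcafe;
```

The last statement is the kind of thing that trips up SQL intuition: in CQL, what you can put in a WHERE clause is dictated by the primary key layout, not by which columns exist.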
>
> Everyone is of course free to use the access layer they prefer, but
> personally I would recommend building all new features using a CQL oriented
> approach.  The Thrift interface is frozen, it will not get new features,
> and there are some really awesome features already released only for CQL,
> and more are coming.  Find a path that works for you in CQL; we had to
> change our thinking about a number of things, but it's worth the effort.
>
> On Wed, Dec 24, 2014 at 8:48 AM, Peter Lin <wool...@gmail.com> wrote:
>
>>
>> Basically, any time you want to store maps of maps, lists of lists or
>> actual Java objects, CQL is not a good fit. CQL is really only good for
>> primitive types, flat lists, maps and sets.
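For reference, the flat collection types mentioned here look like this in CQL. The `user_profile` table, its columns and the values are hypothetical, chosen only to show the syntax:

```cql
-- Hypothetical table with one column of each flat collection type.
CREATE TABLE user_profile (
   user_id text PRIMARY KEY,
   emails set<text>,
   recent_logins list<timestamp>,
   preferences map<text, text>
);

-- Collections can be modified element-wise without reading them first.
UPDATE user_profile
SET emails = emails + {'a@example.com'},
    recent_logins = recent_logins + ['2014-12-24 08:48:00'],
    preferences['theme'] = 'dark'
WHERE user_id = 'u1';
```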
>>
>> Using Cassandra purely with static columns is perfectly valid, but I don't
>> live in that world. Most of what I do requires dynamic columns mixed with
>> static columns in a single column family. This will sound like heresy, but
>> for a use case that fits the SQL model perfectly, you're better off using
>> something like VoltDB, which gives you 100% SQL with ACID.
>>
>>
>>
>> On Wed, Dec 24, 2014 at 10:38 AM, Kai Wang <dep...@gmail.com> wrote:
>>
>>> Ryan,
>>>
>>> Can you elaborate a little on "Thrift over CQL is modeling clustering
>>> columns in different nesting between rows is trivial in Thrift and not
>>> really doable in CQL"?
>>> On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvi...@datastax.com> wrote:
>>>
>>>> I'm not entirely sure why you couldn't model that to solve your use
>>>> case (wouldn't you be filtering the events as well, and therefore be able
>>>> to get all of that in one query?).
>>>>
>>>> What you describe there has a number of avenues for getting your
>>>> question answered in one query: collections, heavier use of statics in a
>>>> different order than you specified, an object dump of events in a single
>>>> column, or switching up the clustering columns. At the end of the day, CQL
>>>> resolves to a given SSTable format; you can still open up cassandra-cli and
>>>> see what a given model looks like. Once you've grokked this adequately, you
>>>> can basically bend CQL to fit your logical Thrift modeling. At some point,
>>>> as with learning any new language, you'll learn to speak both (something I
>>>> have to do nearly daily).
>>>>
>>>> FWIW, the primary valid complaint remaining for Thrift over CQL is that
>>>> modeling clustering columns with different nesting between rows is
>>>> trivial in Thrift and not really doable in CQL (clustering columns enforce
>>>> a nesting order by logical construct). Other than that, I've always been
>>>> able to swap a client from Thrift to CQL, and it has always ended up
>>>> faster (so far).
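To make the nesting-order point concrete, here is a minimal sketch using a hypothetical `events_by_source` table (the name, layout and literals are illustrative, not from the thread). Clustering columns can only be restricted in the order they are declared, so every partition is nested event_source first, then insertion_time:

```cql
-- Hypothetical table: inside each event_type partition, data is nested by
-- event_source first, then insertion_time. The order is the same for every partition.
CREATE TABLE events_by_source (
   event_type text,
   event_source text,
   insertion_time timestamp,
   event blob,
   PRIMARY KEY (event_type, event_source, insertion_time)
);

-- Accepted: clustering columns are restricted in declaration order.
SELECT insertion_time, event FROM events_by_source
WHERE event_type = 'some-type' AND event_source = 'web'
  AND insertion_time > '2014-12-01';

-- Rejected: insertion_time cannot be restricted while the preceding
-- clustering column (event_source) is left unrestricted.
SELECT insertion_time, event FROM events_by_source
WHERE event_type = 'some-type' AND insertion_time > '2014-12-01';
```

Getting time-first nesting for some partitions would mean a second table with a different primary key, whereas a Thrift row's composite column names could be ordered differently from one row to the next.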
>>>>
>>>> The main reason for this is that performance on modern Cassandra with the
>>>> native protocol is substantially better than pure Thrift for many query
>>>> types (see
>>>> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster).
>>>> Your mileage may vary, but I'd test it out first before proclaiming that
>>>> Thrift is faster for your use case (and make liberal use of cassandra-cli
>>>> alongside your CQL tables to make sure you know what's going on internally;
>>>> remember it's all just SSTables underneath).
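For the kind of testing suggested here, request tracing in cqlsh is one way to see what a CQL query does internally, for example which nodes handled it and how many SSTables a read had to merge. A minimal cqlsh session, assuming the `timeseries` table from David's message further down:

```cql
-- cqlsh shell command: print a trace of every following request.
TRACING ON;

SELECT insertion_time, event FROM timeseries
WHERE event_type = 'some-type' LIMIT 100;

-- Stop tracing once the query has been inspected.
TRACING OFF;
```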
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.clim...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks, Ryan. I wasn't aware of static column support, and indeed
>>>>> they get me most of what I need. I think the only potential inefficiency
>>>>> is still at query time. Using Thrift, I could design the column family to
>>>>> get all the static and dynamic content in a single query.
>>>>> If event_source and total_events are instead implemented as CQL3
>>>>> statics, I probably need to do two queries to get data for a given
>>>>> event_type.
>>>>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>>>>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>>>>> 'some-type'
>>>>>
>>>>> To get the events:
>>>>> SELECT insertion_time, event FROM timeseries WHERE event_type =
>>>>> 'some-type'
>>>>>
>>>>> As a combined query, my concern is the overhead of repeating
>>>>> event_type/source/total_events (and potentially many other pieces of
>>>>> static information).
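A sketch of both options, assuming event_source and total_events are declared STATIC in the `timeseries` table. Static values are stored once per partition, but a CQL result set still repeats them on every returned row, so the repetition affects response size rather than storage. When only the static values are wanted, LIMIT 1 avoids getting back one identical row per event:

```cql
-- Static values only: without LIMIT 1 this returns one (identical) row per event.
SELECT event_source, total_events FROM timeseries
WHERE event_type = 'some-type' LIMIT 1;

-- Everything in one query: the static values are repeated alongside each event row.
SELECT event_source, total_events, insertion_time, event FROM timeseries
WHERE event_type = 'some-type';
```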
>>>>>
>>>>> More generally, do you find that tuned applications tend to use
>>>>> Thrift, a combination of Thrift and CQL3, or is CQL3 really expected to
>>>>> replace Thrift?
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvi...@datastax.com>
>>>>> wrote:
>>>>>
>>>>>> Don't static columns get you what you want?
>>>>>>
>>>>>>
>>>>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
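Applied to the `timeseries` table from the original question, a minimal sketch of what that could look like (static columns require Cassandra 2.0.6 or later; the literal values are made up):

```cql
-- event_source and total_events are STATIC: stored once per event_type partition
-- and shared by every event row in it, rather than repeated per clustering row.
CREATE TABLE timeseries (
   event_type text,
   event_source text STATIC,
   total_events int STATIC,
   insertion_time timestamp,
   event blob,
   PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Static values can be written with only the partition key in the WHERE clause.
UPDATE timeseries SET event_source = 'web', total_events = 1
WHERE event_type = 'some-type';
```

Unlike putting these fields in the primary key, the static columns can be updated in place; events are still inserted as ordinary clustering rows.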
>>>>>>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.clim...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift
>>>>>>> suggest new projects should use CQL3.
>>>>>>>
>>>>>>> I'm wondering, however, if there are certain use cases not well
>>>>>>> covered by CQL3.  Consider the standard timeseries example:
>>>>>>>
>>>>>>> CREATE TABLE timeseries (
>>>>>>>    event_type text,
>>>>>>>    insertion_time timestamp,
>>>>>>>    event blob,
>>>>>>>    PRIMARY KEY (event_type, insertion_time)
>>>>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
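For reference, typical writes and reads against that table; the literal values are made up:

```cql
-- Append an event; insertion_time is the clustering column within the partition.
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-22 22:50:00', 0xcafe);

-- The ten newest events come first because of CLUSTERING ORDER BY (insertion_time DESC).
SELECT insertion_time, event FROM timeseries
WHERE event_type = 'some-type' LIMIT 10;
```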
>>>>>>>
>>>>>>> What happens if I want to store additional information that is
>>>>>>> shared by all events in the given series (but that I don't want to
>>>>>>> include in the row ID), e.g. the event source, a cached count of the
>>>>>>> number of events logged to date, etc.? I might try updating the
>>>>>>> definition as follows:
>>>>>>>
>>>>>>> CREATE TABLE timeseries (
>>>>>>>    event_type text,
>>>>>>>    event_source text,
>>>>>>>    total_events int,
>>>>>>>    insertion_time timestamp,
>>>>>>>    event blob,
>>>>>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>>>>> ) WITH CLUSTERING ORDER BY (event_source ASC, total_events ASC, insertion_time DESC);
>>>>>>>
>>>>>>> Is this not inefficient? When inserting or querying via CQL3, say
>>>>>>> in batches of up to 1000 events, won't the type/source/count be repeated
>>>>>>> 1000 times? Please let me know if I'm misunderstanding something, or if
>>>>>>> I should be sticking to Thrift for situations like this involving mixed
>>>>>>> static/dynamic data.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Ryan Svihla
>>>>
>>>> Solution Architect
>>>>
>>>>
>>
>
