Just to add 2 more cents... :) The CQL3 protocol is asynchronous. This can provide a substantial throughput increase, according to my benchmarking, when one uses non-blocking techniques.
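To make the non-blocking point a bit more concrete, here is a rough Python sketch. It does not use a real Cassandra driver: `fake_query`, `LATENCY_S` and the 20 ms figure are made up for illustration, and `asyncio.sleep` merely stands in for a network round trip. It only shows why keeping many requests in flight tends to beat waiting for each response in turn.

```python
import asyncio
import time

LATENCY_S = 0.02  # assumed round-trip time per request (invented for this sketch)


async def fake_query(i):
    """Hypothetical stand-in for one request/response round trip (no real driver)."""
    await asyncio.sleep(LATENCY_S)
    return i * 2


async def run_sequential(n):
    # Blocking style: wait for each response before sending the next request.
    return [await fake_query(i) for i in range(n)]


async def run_pipelined(n):
    # Non-blocking style: put all requests in flight, then collect the responses.
    return list(await asyncio.gather(*(fake_query(i) for i in range(n))))


def timed(coro_fn, n):
    """Run coro_fn(n) to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = timed(run_sequential, 10)
    _, pipe_time = timed(run_pipelined, 10)
    print(f"sequential: {seq_time:.3f}s, pipelined: {pipe_time:.3f}s")
```

The sequential version pays the latency once per request, while the pipelined version pays roughly one latency for the whole batch. Real numbers depend on the driver, the cluster and the workload.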
It is also peer-to-peer. Hence the server can generate events to send to the client, e.g. schema changes - in general, 'triggers' become possible.

ml

On Fri, Jun 13, 2014 at 6:21 PM, graham sanderson <gra...@vast.com> wrote:

> My 2 cents…
>
> A motivation for CQL3 AFAIK was to make Cassandra more familiar to SQL users. This is a valid goal, and works well in many cases.
> Equally there are use cases (that some might find ugly) where Cassandra is chosen explicitly because of the sorts of things you can do at the thrift level, which aren't (currently) exposed via CQL3
>
> To Robert's point earlier - "Rational people should presume that Thrift support must eventually disappear"… he is probably right (though frankly I'd rather the non-blocking thrift version was added instead). However if we do get rid of the thrift interface, then it needs to be at a time that CQLn is capable of expressing all the things you could do via the thrift API. Note, I need to go look and see if the non-blocking thrift version also requires materializing the entire thrift object in memory.
>
> On Jun 13, 2014, at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> There are always pros and cons with a querying language.
>
> But as far as I can see, the advantages of Thrift over CQL3 are:
>
> 1) Thrift requires a little bit less decoding server-side (a difference around 10% in CPU usage).
>
> 2) Thrift uses more "compact" storage because CQL3 needs to add extra "marker" columns to guarantee the existence of the primary key. It gets worse when you use clustering columns because for each distinct clustering group you have a related "marker" column.
>
> That being said, point 1) is not really an issue since most of the time nodes are more I/O bound than CPU bound. Only in extreme cases where you have an incredible read rate with data that fits entirely in memory may you notice the difference.
>
> For point 2) this is a small trade-off to have access to a query language and being able to do slice queries using the WHERE clause. Some like it, others hate it, it's just a question of taste. Please note that the "waste" in disk space is somewhat mitigated by compression.
>
> Long story short I think Thrift may have appropriate usage but only in very few use cases. Recently a lot of improvements and features have been added to CQL3 so that it should be considered as the first choice for most users, and if they fall into those few use cases then switch back to Thrift
>
> My 2 cents
>
> On Fri, Jun 13, 2014 at 11:43 PM, Peter Lin <wool...@gmail.com> wrote:
>
>> With a text based query approach like CQL, you lose the type with dynamic columns. Yes, we're storing it as bytes, but it is simpler and easier with Thrift to do these types of things.
>>
>> I like CQL3 and what it does, but text based query languages make certain dynamic schema use cases painful. Having used and built ORMs, I find they are poorly suited to dynamic schemas. If you've never had to write an ORM to handle dynamic user defined schemas at runtime, it's tough to see where the problems arise and how that makes life painful.
>>
>> Just to be clear, I'm not saying "don't use CQL3" or "CQL3 is bad". I'm saying CQL3 is good for certain kinds of use cases and Thrift is good at certain use cases. People need to look at what and how they're storing data and do what makes the most sense to them. Slavishly following CQL3 doesn't make any sense to me.
>>
>> On Fri, Jun 13, 2014 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> "the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java." --> Correct.
>>> However, you are still bound by the column comparator type which should be fixed (unless again you set it to bytes, in this case you lose the ordering and sorting feature)
>>>
>>> Basically what you are doing is telling Cassandra to save data in the cells as raw bytes, the serialization is taken care of client side using the appropriate serializer. This is a perfectly valid strategy.
>>>
>>> But how is it different from using CQL3 and setting the value to "blob" (equivalent to bytes) and taking care of the serialization client-side also ? You can even imagine saving the value in JSON format and setting the type to "text".
>>>
>>> Really, I don't see why CQL3 cannot achieve the scenario you describe.
>>>
>>> For the record, when you create a table in CQL3 as follows:
>>>
>>> CREATE TABLE user (
>>>     id bigint PRIMARY KEY,
>>>     firstname text,
>>>     lastname text,
>>>     last_connection timestamp,
>>>     ....);
>>>
>>> C* will create a column family with validation type = bytes to accommodate the timestamp and text types for the firstname, lastname and last_connection columns. Basically the CQL3 engine is doing the serialization server-side for you
>>>
>>> On Fri, Jun 13, 2014 at 11:19 PM, Peter Lin <wool...@gmail.com> wrote:
>>>
>>>> the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java.
>>>>
>>>> Having said that, CQL3 does have a new custom type feature, but the documentation is basically non-existent on how that actually works. One could also modify CQL such that insert statements give Cassandra hints about what type it is, but I'm not aware of anyone enhancing CQL3 to do that.
>>>>
>>>> I realize my kind of use case is a bit unique, but I do know of others that are doing similar kinds of things.
>>>>
>>>> On Fri, Jun 13, 2014 at 5:11 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>
>>>>> In thrift, when creating a column family, you need to define
>>>>>
>>>>> 1) the row/partition key type
>>>>> 2) the column comparator type
>>>>> 3) the validation type for the actual value (cell in CQL3 terminology)
>>>>>
>>>>> Unless you use the "dynamic composites" feature, which does not exist (and probably won't) in CQL3, I don't see how you can have columns with "different types" on the same row/partition
>>>>>
>>>>> On Fri, Jun 13, 2014 at 11:06 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>
>>>>>> when I say dynamic column, I mean non-static columns of different types within the same row. Some could be an object or one of the defined datatypes.
>>>>>>
>>>>>> with thrift I use the appropriate serializer to handle these dynamic columns.
>>>>>>
>>>>>> On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>
>>>>>>> Well, before discussing "dynamic columns", we should first define the term clearly. What do people mean by "dynamic columns" exactly ? Is it the ability to add many columns "of same type" to an existing physical row? If yes then CQL3 does support it with clustering columns.
>>>>>>>
>>>>>>> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yeah I don't anticipate more than 1000 properties, well under in fact. I guess the trade off of using the clustered columns is that I'd have a table that would be tall and skinny which also has its challenges w/r/t memory.
>>>>>>>>
>>>>>>>> I'll look into your suggestion a bit more and consider some others around a hybrid of CQL and Thrift (where necessary).
>>>>>>>> But from a newb's perspective, I sense the community is unsettled around this concept of truly dynamic columns. Coming from an HBase background, it's a consideration I didn't anticipate having to evaluate.
>>>>>>>>
>>>>>>>> --
>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>
>>>>>>>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark
>>>>>>>>>
>>>>>>>>> I believe that in your table you want to have some "common" fields that will be there whatever customer is, and other fields that are entirely customer-dependent, isn't it ?
>>>>>>>>>
>>>>>>>>> In this case, creating a table with static columns for the common fields and a clustering column representing all custom fields defined by a customer could be a solution (see here for static column: https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>>>>>>>>
>>>>>>>>> CREATE TABLE user_data (
>>>>>>>>>     user_id bigint,
>>>>>>>>>     user_firstname text static,
>>>>>>>>>     user_lastname text static,
>>>>>>>>>     ...
>>>>>>>>>     custom_property_name text,
>>>>>>>>>     custom_property_value text,
>>>>>>>>>     PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>>>>>>>>>
>>>>>>>>> Please note that with this solution you need to have "at least one" custom property per customer to make it work
>>>>>>>>>
>>>>>>>>> The only thing to take care of is the type of custom_property_value. You need to define it once for all. To accommodate for dynamic types, you can either save the value as blob or text (as JSON) and take care of the serialization/deserialization yourself at the client side
>>>>>>>>>
>>>>>>>>> As an alternative you can save custom properties in a map, provided that their number is not too large.
>>>>>>>>> But considering the business case of CRM, I believe that it's quite rare that a user has more than 1000 custom properties, isn't it ?
>>>>>>>>>
>>>>>>>>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> My use case requires the support of arbitrary columns much like a CRM. My users can define 'custom' fields within the application. Ideally I wouldn't have to change the schema at all, which is why I like the old thrift approach rather than the CQL approach.
>>>>>>>>>>
>>>>>>>>>> Having said all that, I'd be willing to adapt my API to make explicit schema changes to Cassandra whenever my user makes a change to their custom fields if that's an accepted practice.
>>>>>>>>>>
>>>>>>>>>> Ultimately, I'm trying to figure out if the Cassandra community intends to support true schemaless use cases in the future.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the table is an accepted practice? --> Can you give some details about your use case ? How would you "alter" a table structure to adapt it to a new customer ?
>>>>>>>>>>>
>>>>>>>>>>> Wouldn't it be better to model your table so that it supports addition/removal of customers ?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks DuyHai,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a follow up question to #2. You mentioned ideally I would create a new table instead of mutating an existing one.
>>>>>>>>>>>>
>>>>>>>>>>>> This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the table is an accepted practice?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> about.me <http://about.me/markgreene>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Mark
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dynamic columns, as you said, are perfectly supported by CQL3 via clustering columns. And no, using collections for storing dynamic data is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Is using Thrift a valid approach in the era of CQL? --> Less and less. Unless you are looking for extreme performance, you'd be better off choosing CQL3. The ease of programming and querying with CQL3 is worth the small overhead in CPU
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at runtime when I detect I need to do a schema mutation? --> Ideally you should not alter schema but create a new table to adapt to your changing requirements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing into the heap? --> Of course.
>>>>>>>>>>>>> All collections and maps in Cassandra are eagerly loaded entirely in memory on server side. That's why it is recommended to limit their cardinality to ~ 1000 elements
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm looking for some best practices w/r/t supporting arbitrary columns. It seems from the docs I've read around CQL that they are supported in some capacity via collections but you can't exceed 64K in size. For my requirements that would cause problems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So my questions are:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Is using Thrift a valid approach in the era of CQL?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) If CQL is the best practice, should I alter the schema at runtime when I detect I need to do a schema mutation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing into the heap?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My data model is akin to a CRM, arbitrary column definitions per customer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Mark
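
A postscript on the "save the value as blob and serialize client-side" suggestion quoted in the thread above. The following is only a rough Python sketch of what that could look like, not anything Cassandra or a driver defines: the one-byte type tags, the byte layout and the function names `encode` and `decode` are all assumptions invented for illustration. The resulting bytes are what one would put in a blob column such as `custom_property_value`.

```python
import json
import struct
from datetime import datetime, timezone

# One-byte type tags. These values and the layout are assumptions for this sketch.
TAG_INT, TAG_TEXT, TAG_TIMESTAMP, TAG_JSON = 1, 2, 3, 4


def encode(value):
    """Turn a Python value into tagged bytes suitable for a blob column."""
    if isinstance(value, bool):
        # bool is a subclass of int, so route it to JSON before the int check.
        return bytes([TAG_JSON]) + json.dumps(value).encode("utf-8")
    if isinstance(value, int):
        # Signed 64-bit big-endian; larger integers raise struct.error.
        return bytes([TAG_INT]) + struct.pack(">q", value)
    if isinstance(value, str):
        return bytes([TAG_TEXT]) + value.encode("utf-8")
    if isinstance(value, datetime):
        # Stored as milliseconds since the epoch (sub-millisecond precision is dropped).
        millis = int(value.timestamp() * 1000)
        return bytes([TAG_TIMESTAMP]) + struct.pack(">q", millis)
    # Fallback: anything JSON-serializable (lists, dicts, floats, None).
    return bytes([TAG_JSON]) + json.dumps(value).encode("utf-8")


def decode(blob):
    """Inverse of encode: read the tag byte and rebuild the Python value."""
    tag, payload = blob[0], blob[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    if tag == TAG_TIMESTAMP:
        millis = struct.unpack(">q", payload)[0]
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    if tag == TAG_JSON:
        return json.loads(payload.decode("utf-8"))
    raise ValueError(f"unknown type tag {tag}")


if __name__ == "__main__":
    properties = {"age": 42, "nickname": "mark", "tags": ["a", "b"]}
    stored = {name: encode(value) for name, value in properties.items()}
    print({name: decode(blob) for name, blob in stored.items()})
```

The trade-off is the one discussed in the thread: the server sees only opaque bytes, so type knowledge lives entirely in the client code.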