Like you, I make extensive use of dynamic columns for similar reasons.

In our project, one of the goals is to give "end users" the ability to
design their own schema without having to alter a table. If people really
want a strong schema, they can just use a traditional SQL or NewSQL
database. An RDB gives you the full power of referential integrity, dynamic
views, bitmap indexes, joins, and subqueries. In the past, I've tried to
handle dynamic schema use cases with an RDB and it sucked.

For me, Cassandra's real power and sweet spot is this intermediate world
where some of the schema is static, but a significant percentage is dynamic
and arbitrary.
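To make that concrete, here's a rough sketch of the kind of model I mean
(table and column names are hypothetical): a couple of static columns for
the fixed part of the schema, plus a clustering column that carries
arbitrary user-defined fields.

```sql
-- Rough sketch (hypothetical names): static columns for the fixed schema,
-- plus a clustering column so each contact can carry arbitrary
-- user-defined fields as rows.
CREATE TABLE contacts (
    contact_id uuid,
    email      text STATIC,  -- static: one value shared by the whole partition
    attr_name  text,         -- user-defined field name
    attr_value text,         -- serialized value
    PRIMARY KEY (contact_id, attr_name)
);

-- Adding a brand-new "custom field" is just an INSERT, not an ALTER TABLE:
INSERT INTO contacts (contact_id, attr_name, attr_value)
VALUES (uuid(), 'favorite_color', 'teal');
```

The point is that the set of field names lives in the data, not in the
schema, so end users can invent fields without anyone touching the table
definition.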

To do similar kinds of things in an RDB, you end up in EAV (entity,
attribute, value) hell. These are old problems, and developers have long
had to wedge them into RDBs. Over time, these kinds of systems become a
total nightmare to manage, which means developers rebuild them every 6-10
years. I've done this in the past, so it's a pain point I'm intimately
familiar with.
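For anyone who hasn't lived through it, EAV boils down to something like
this (a sketch with made-up names): every value is stringly typed, and
every read becomes a pivot.

```sql
-- Classic EAV in an RDB (sketch, hypothetical names): one narrow table
-- holds every "custom field" for every entity, all stringly typed.
CREATE TABLE entity_attr_value (
    entity_id  BIGINT,
    attr_name  VARCHAR(255),
    attr_value VARCHAR(255),
    PRIMARY KEY (entity_id, attr_name)
);

-- Reconstructing one logical record means pivoting N rows back into
-- columns, with no type checking and no real constraints:
SELECT entity_id,
       MAX(CASE WHEN attr_name = 'favorite_color' THEN attr_value END) AS favorite_color,
       MAX(CASE WHEN attr_name = 'phone'          THEN attr_value END) AS phone
FROM entity_attr_value
GROUP BY entity_id;
```

Every new custom field means another branch in the pivot, and the query
planner can do very little with any of it. That's the hell I mean.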

With Cassandra + Thrift + CQL, I've been able to avoid those pain points,
but it does mean using Thrift appropriately. I really wish people wouldn't
dismiss Thrift and ignore all of the benefits it provides. Yes, you can
wedge this kind of dynamic schema use case into CQL, but does that make
sense?



On Fri, Jun 13, 2014 at 4:03 PM, Mark Greene <green...@gmail.com> wrote:

> My use case requires the support of arbitrary columns, much like a CRM. My
> users can define 'custom' fields within the application. Ideally I wouldn't
> have to change the schema at all, which is why I like the old Thrift
> approach rather than the CQL approach.
>
> Having said all that, I'd be willing to adapt my API to make explicit
> schema changes to Cassandra whenever my user makes a change to their custom
> fields if that's an accepted practice.
>
> Ultimately, I'm trying to figure out if the Cassandra community intends to
> support truly schemaless use cases in the future.
>
> --
> about.me <http://about.me/markgreene>
>
>
> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> This strikes me as bad practice in the world of multi-tenant systems. I
>> don't want to create a table per customer. So I'm wondering if dynamically
>> modifying the table is an accepted practice?  --> Can you give some details
>> about your use case? How would you "alter" a table structure to adapt it
>> to a new customer?
>>
>> Wouldn't it be better to model your table so that it supports the
>> addition/removal of customers?
>>
>>
>>
>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>>
>>> Thanks DuyHai,
>>>
>>> I have a follow up question to #2. You mentioned ideally I would create
>>> a new table instead of mutating an existing one.
>>>
>>> This strikes me as bad practice in the world of multi-tenant systems. I
>>> don't want to create a table per customer. So I'm wondering if dynamically
>>> modifying the table is an accepted practice?
>>>
>>> --
>>> about.me <http://about.me/markgreene>
>>>
>>>
>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> Hello Mark
>>>>
>>>>  Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>> clustering columns. And no, using collections to store dynamic data is a
>>>> very bad idea if the cardinality is very high (>> 1000 elements).
>>>>
>>>> 1) Is using Thrift a valid approach in the era of CQL?  --> Less and
>>>> less. Unless you are looking for extreme performance, you'd be better off
>>>> choosing CQL3. The ease of programming and querying with CQL3 is worth
>>>> the small overhead in CPU.
>>>>
>>>> 2) If CQL is the best practice, should I alter the schema at runtime
>>>> when I detect I need to do a schema mutation?  --> Ideally you should not
>>>> alter the schema but create a new table to adapt to your changing
>>>> requirements.
>>>>
>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing
>>>> into the heap?  --> Of course. All collections and maps in Cassandra are
>>>> eagerly loaded entirely into memory on the server side. That's why it is
>>>> recommended to limit their cardinality to ~1000 elements.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>> supported in some capacity via collections, but you can't exceed 64K in
>>>>> size. For my requirements, that would cause problems.
>>>>>
>>>>> So my questions are:
>>>>>
>>>>> 1)  Is using Thrift a valid approach in the era of CQL?
>>>>>
>>>>> 2) If CQL is the best practice, should I alter the schema at runtime
>>>>> when I detect I need to do a schema mutation?
>>>>>
>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>> thing into the heap?
>>>>>
>>>>> My data model is akin to a CRM, arbitrary column definitions per
>>>>> customer.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Mark
>>>>>
>>>>
>>>>
>>>
>>
>
