Re: CQL 3 and wide rows

Maciej Miklas Tue, 20 May 2014 07:08:15 -0700

Hi Aaron,

Thanks for the answer!



Let's consider CLI code like this:

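// Pseudocode: a Java-style loop around a cassandra-cli "set" statement
for (int i = 0; i < 1_000_000; i++) {
  // the column name is 'myCol::' followed by the current value of i
  set['rowKey1']['myCol::' + i] = UUID.randomUUID();
}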


The code above creates a single row that contains 10^6 columns sorted by 
‘i’. This works fine, and this is a wide row as I understand it: a row that 
holds many columns, AND I can read just a part of it with the right slice 
query. On the other hand, I can iterate over all columns without extra latency 
because the data is stored on a single node. I’ve been using similar structures 
as a replacement for secondary indexes - it’s a well-known pattern.
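
A minimal sketch of the access pattern described above, in plain Java. It is only an in-memory analogy and does not use any Cassandra or Hector API: the class WideRowSketch and its methods are hypothetical, and the columns are keyed by the integer i (instead of the string 'myCol::i') so that they really sort by i.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

public class WideRowSketch {
    // One "row": column index i -> value, kept sorted by i.
    // Assumption: the integer key stands in for the column name 'myCol::i'.
    static NavigableMap<Integer, UUID> buildRow(int columns) {
        NavigableMap<Integer, UUID> row = new TreeMap<>();
        for (int i = 0; i < columns; i++) {
            row.put(i, UUID.randomUUID());
        }
        return row;
    }

    // "Slice query": a view of the columns with from <= i < to.
    // subMap returns a view backed by the row, so nothing else is copied.
    static NavigableMap<Integer, UUID> slice(NavigableMap<Integer, UUID> row, int from, int to) {
        return row.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, UUID> row = buildRow(100_000);
        NavigableMap<Integer, UUID> part = slice(row, 500, 510);
        System.out.println(part.size() + " columns, first=" + part.firstKey()
                + ", last=" + part.lastKey());
        // prints "10 columns, first=500, last=509"
    }
}
```

Iterating over the whole row is then just a loop over row.entrySet(), in column order.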

How would I model it in CQL 3?

1) I could create a Map, but Maps are fully loaded into memory, and a Map 
containing 10^6 elements is definitely a problem. Plus it’s a big waste of RAM 
if you consider that I only need to read a small subset.

2) I could alter the table for each new column, which would create a structure 
similar to the one from my CLI example. But it looks to me as if all column 
names are loaded into RAM, which is still a big limitation. I hope that I am 
wrong here - I am not sure.

3) I could redesign my model and divide the data into many rows, but why would 
I do that if I can use wide rows?

My idea of a wide row is a row that can hold a large number of key-value pairs 
(in any form), where I can filter on those keys to efficiently load only the 
part that I currently need.
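
For comparison, here is a rough Java model of the storage layout that the quoted reply below describes for the widerow table: one partition per row_key, with cells named by a composite of the wide_row_column value and the literal column name. This is a sketch under assumptions, not Cassandra's actual storage code; PartitionSketch, CellName, insert and slice are hypothetical names, and the CQL statements in the comments are only approximate equivalents.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PartitionSketch {
    // Composite cell name: (value of wide_row_column, literal CQL column name).
    static final class CellName implements Comparable<CellName> {
        final String clustering;
        final String column;

        CellName(String clustering, String column) {
            this.clustering = clustering;
            this.column = column;
        }

        @Override
        public int compareTo(CellName o) {
            int c = clustering.compareTo(o.clustering);
            return c != 0 ? c : column.compareTo(o.column);
        }

        @Override
        public String toString() {
            return clustering + ":" + column;
        }
    }

    // row_key -> partition; each partition keeps its cells sorted by composite name.
    final Map<String, NavigableMap<CellName, String>> partitions = new TreeMap<>();

    // Roughly: INSERT INTO widerow (row_key, wide_row_column, data_column) VALUES (?, ?, ?)
    void insert(String rowKey, String wideRowColumn, String dataColumn) {
        partitions.computeIfAbsent(rowKey, k -> new TreeMap<>())
                  .put(new CellName(wideRowColumn, "data_column"), dataColumn);
    }

    // Roughly: SELECT ... WHERE row_key = ? AND wide_row_column >= ? AND wide_row_column < ?
    // The empty column name sorts before every real column name, so it works as a bound.
    NavigableMap<CellName, String> slice(String rowKey, String from, String to) {
        NavigableMap<CellName, String> p = partitions.getOrDefault(rowKey, new TreeMap<>());
        return p.subMap(new CellName(from, ""), true, new CellName(to, ""), false);
    }

    public static void main(String[] args) {
        PartitionSketch t = new PartitionSketch();
        for (int i = 0; i < 1000; i++) {
            // Zero-padded so that the text ordering matches the numeric ordering.
            t.insert("rowKey1", String.format("myCol::%06d", i), "value-" + i);
        }
        // Reads three cells out of the 1000 in the partition.
        System.out.println(t.slice("rowKey1", "myCol::000500", "myCol::000503").keySet());
    }
}
```

In this picture the schema has only three column names, while a single partition can still hold as many cells as there are CQL rows with the same row_key.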


Regards,
Maciej 


On 20 May 2014, at 09:06, Aaron Morton <aa...@thelastpickle.com> wrote:

> In a CQL 3 table the only **column** names are the ones defined in the table; 
> in the example below there are three column names. 
> 
> 
>>> CREATE TABLE keyspace.widerow (
>>> row_key text,
>>> wide_row_column text,
>>> data_column text,
>>> PRIMARY KEY (row_key, wide_row_column));
>>> 
>>> Check out, for example, 
>>> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.
> 
> Internally there may be more **cells** (as we now call the internal 
> columns). In the example above, each value of row_key will create a single 
> partition (as we now call internal storage engine rows). In each of those 
> partitions there will be cells for each CQL 3 row that has the same row_key; 
> those cells will use a Composite for the name. The first part of the 
> composite will be the value of wide_row_column and the second will be the 
> literal name of the non-primary-key column. 
> 
> IMHO wide partitions (storage engine rows) are more prevalent in CQL 3 than 
> in Thrift models. 
> 
>> But still - I do not see Iteration, so it looks to me that CQL 3 is limited 
>> when compared to CLI/Hector.
> Nowadays you can do pretty much everything you can in the CLI. Provide an 
> example and we may be able to help. 
> 
> Cheers
> Aaron
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 20/05/2014, at 8:18 am, Maciej Miklas <mac.mik...@gmail.com> wrote:
> 
>> Hi James,
>> 
>> Clustering is based on rows. I think that you meant not clustering columns, 
>> but compound columns. Still, all columns belong to a single table and are 
>> stored within a single folder on one computer. And it looks to me (but I’m 
>> not sure) that the CQL 3 driver loads all column names into memory, which is 
>> confusing to me. On the one hand we have a wide row, but on the other we 
>> load the whole thing into RAM...
>> 
>> My understanding of a wide row is a row that supports millions of columns, 
>> or similar things like a map or set. In the CLI you would generate column 
>> names (or use compound columns) to simulate a set or map; in CQL 3 you would 
>> use some static names plus Map or Set structures, or you could still alter 
>> the table and have a large number of columns. But still - I do not see 
>> Iteration, so it looks to me that CQL 3 is limited when compared to 
>> CLI/Hector.
>> 
>> 
>> Regards,
>> Maciej
>> 
>> On 19 May 2014, at 17:30, James Campbell <ja...@breachintelligence.com> 
>> wrote:
>> 
>>> Maciej,
>>> 
>>> In CQL3 "wide rows" are expected to be created using clustering columns.  
>>> So while the schema will have a relatively small number of named columns, 
>>> the effect is a wide row.  For example:
>>> 
>>> CREATE TABLE keyspace.widerow (
>>> row_key text,
>>> wide_row_column text,
>>> data_column text,
>>> PRIMARY KEY (row_key, wide_row_column));
>>> 
>>> Check out, for example, 
>>> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1.
>>> 
>>> James
>>> From: Maciej Miklas <mac.mik...@gmail.com>
>>> Sent: Monday, May 19, 2014 11:20 AM
>>> To: user@cassandra.apache.org
>>> Subject: CQL 3 and wide rows
>>>  
>>> Hi *,
>>> 
>>> I’ve checked the DataStax driver code for CQL 3, and it looks like the 
>>> column names for a particular table are fully loaded into memory. Is this 
>>> true?
>>> 
>>> Cassandra should support wide rows, meaning tables with millions of 
>>> columns. Knowing that, I would expect some kind of iterator for column 
>>> names. Am I missing something here? 
>>> 
>>> 
>>> Regards,
>>> Maciej Miklas
>> 
>
