[ 
https://issues.apache.org/jira/browse/CASSANDRA-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837864#comment-13837864
 ] 

Ondřej Černoš commented on CASSANDRA-6428:
------------------------------------------

Thanks a lot, Sylvain, for your time and answers. It is really appreciated.

I think the whole thing boils down to two issues:

* the size of a collection in the native protocol, which can be worked around 
for now by simply ignoring the count field in the protocol (the data are all 
fetched from storage; only the value of the field is incorrect when the 
collection holds more than 64k items) - see the field sketch below
* the use of collections for mixed CQL3 rows (mixing static and dynamic 
content, i.e. mixing narrow-row and wide-row in underlying storage terminology)
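
Concretely, the workaround trusts the overall byte length and ignores the 
16-bit item count; in terms of the set encoding quoted from the issue 
description below, that means:

{noformat}
[value size: int]     <- trusted: total byte length of the serialized set
[items count: short]  <- ignored: wraps around at 65535 (65536 decodes as 0)
[item len: short][item bytes] ...  <- read items until value size is consumed
{noformat}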

We will probably need to split the table described above (20 or so static 
columns plus a set with hundreds of thousands of elements) into two tables, 
one for the static columns and the other for the wide row. So instead of using:

{noformat}
CREATE TABLE test (
  id text PRIMARY KEY,
  val1 text,
  val2 int,
  val3 timestamp,
  valN text,
  some_set set<text>
)
{noformat}

we will need two tables:

{noformat}
CREATE TABLE test_narrow (
  id text PRIMARY KEY,
  val1 text,
  val2 int,
  val3 timestamp,
  valN text
)

CREATE TABLE test_wide (
  id text,
  val text,
  PRIMARY KEY (id, val)
)
{noformat}
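
The reads against the split tables would then look something like this (same 
column names as in the sketch above); the second select pages through the 
former set elements as clustering rows:

{noformat}
-- static columns only: a small storage row, fast primary key lookup
SELECT val1, val2 FROM test_narrow WHERE id = 'some_key';

-- the former set<text>, one clustering row per element
SELECT val FROM test_wide WHERE id = 'some_key';
{noformat}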

The reason is not a modelling one (the first approach is much more comfortable 
and more in line with the _denormalize everything_ approach), but a 
performance one. The problem is that Cassandra always performs a range query 
over all the columns of the underlying storage row if the table was not 
created with compact storage. So a query like {{select val1, val2 from test 
where id='some_key'}} performs poorly if the {{set}} in the table is big: we 
measured ~400 ms for a primary key lookup on a table with roughly 150k 
records, against a row whose set also holds roughly 150k elements, on a 2 CPU 
machine with enough memory and the DB fully mapped into RAM (no disk ops 
involved), even though the select does not fetch the set at all.
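
To illustrate why (a sketch of the storage layout as we understand it, not a 
dump from our cluster): without compact storage, all the set cells live in the 
same storage row as the static columns, and they even sort before {{val1}}:

{noformat}
RowKey: some_key
=> (name=, value=)                      <- CQL3 row marker
=> (name=some_set:'aaaa...', value=)    <- one cell per set element
   ... roughly 150k collection cells ...
=> (name=val1, value=...)
=> (name=val2, value=...)
{noformat}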

The question is: is this behaviour by design, and is it the reason behind the 
recommendation not to use big collections?

I know and agree this is not the best place for modelling questions, but 
maybe it is useful for you, as the designer of the feature, to see how it is 
perceived by users and what issues we run into (by the way, we are new 
Cassandra users and we started on CQL3 from scratch - we are not Thrift 
old-timers). I can take this whole topic to the user list if you wish.

> Use 4 bytes to encode collection size in next native protocol version
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-6428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6428
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jan Chochol
>
> We are trying to use Cassandra CQL3 collections (sets and maps) for 
> denormalizing data.
> The problem is when the size of these collections grows above a certain 
> limit: we found that the current limit is 64k - 1 (65535) items in a 
> collection.
> We found an inconsistency in the CQL binary protocol (all currently 
> available versions).
> In the protocol, a set is encoded with these fields:
> {noformat}
> [value size: int] [items count: short] [items] ...
> {noformat}
> One example from our case (a collection with 65536 elements):
> {noformat}
> 00 21 ff ee 00 00 00 20 30 30 30 30 35 63 38 69 65 33 67 37 73 61 ...
> {noformat}
> So the decoded {{value size}} is 0x0021FFEE = 2228206 bytes and the decoded 
> {{items count}} is 0.
> This is wrong - you can not have a collection with 0 items occupying more 
> than 2MB.
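> For reference, a byte-by-byte reading of the dump above:
> {noformat}
> 00 21 ff ee  -> value size  = 0x0021FFEE = 2228206 bytes
> 00 00        -> items count = 0 (65536 truncated to an unsigned short)
> 00 20        -> length of the first item = 32 bytes
> 30 30 30 30 35 63 ...  -> first item bytes: "00005c..."
> {noformat}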
> I understand that an unsigned short can not hold more than 65535, but I do 
> not understand why there is such a limitation in the protocol when all the 
> data are currently sent anyway.
> In this case we have several possibilities:
> * ignore the {{items count}} field and read all the bytes specified in 
> {{value size}}
> ** the problem is that we can not be sure this behaviour will be kept in 
> future versions of Cassandra, as it is quite strange
> * refactor our code to use only small collections (this seems quite odd, as 
> Cassandra has no problems with wide rows)
> * do not use collections and fall back to plain wide rows
> * wait for a protocol change that removes the unnecessary limitation



--
This message was sent by Atlassian JIRA
(v6.1#6144)
