[ https://issues.apache.org/jira/browse/CASSANDRA-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116170#comment-14116170 ]

Drew Kutcharian edited comment on CASSANDRA-7850 at 8/30/14 2:00 AM:
---------------------------------------------------------------------

Yes, but then I might end up with very wide _thrift_ rows.

Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}},
where records with the same block_id get stored on the same node *regardless*
of the value of breed_bucket. But I don't want to use {{PRIMARY KEY (block_id,
breed_bucket, breed)}}, since in that case all the records for a block_id would
end up in a single _thrift_ row.

So, ideally the layout would be:
block_id -> decides the node
(block_id, breed_bucket) -> decides the _thrift_ row (old-school "row key")
breed -> prefix of the _thrift_ columns (old-school "column name prefix")
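A minimal sketch of that layout, assuming a hypothetical partitioner that hashes only the first `hashPrefixLength` components of the partition key, so block_id alone picks the node while (block_id, breed_bucket) still identifies the row. `PrefixTokenSketch`, `token`, and the length-prefixed component encoding are illustrative assumptions (not Cassandra's composite format), and MD5 merely stands in for the real partitioner's hash.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch: a token computed from only a prefix of a composite partition key.
final class PrefixTokenSketch {

    // Hash the first hashPrefixLength components; the remaining components still
    // identify the partition (the thrift row) but do not influence node placement.
    static BigInteger token(List<String> components, int hashPrefixLength) {
        if (hashPrefixLength < 1 || hashPrefixLength > components.size()) {
            throw new IllegalArgumentException("invalid hash prefix length: " + hashPrefixLength);
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String component : components.subList(0, hashPrefixLength)) {
                byte[] bytes = component.getBytes(StandardCharsets.UTF_8);
                // Length-prefix each component so ("ab","c") and ("a","bc") hash differently.
                md5.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md5.update(bytes);
            }
            // Interpret the digest as a non-negative integer token.
            return new BigInteger(1, md5.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        List<String> rowA = List.of("block-1", "bucket-0");
        List<String> rowB = List.of("block-1", "bucket-7");
        // Hashing only block_id: both rows get the same token, hence the same node.
        System.out.println(token(rowA, 1).equals(token(rowB, 1))); // true
        // Hashing the full key: the two rows are placed independently.
        System.out.println(token(rowA, 2).equals(token(rowB, 2)));
    }
}
```

Setting `hashPrefixLength` to the full key size gives today's behaviour, where each breed_bucket of a block_id can land on a different node.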



was (Author: drew_kutchar):
Yes, but then I might end up with very wide rows.

Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}},
where records with the same block_id and breed_bucket get stored on the same
node, but in different _thrift_ rows, so I don't have very wide rows (millions
of _thrift_ columns per _thrift_ row).

> Composite Aware Partitioner
> ---------------------------
>
>                 Key: CASSANDRA-7850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7850
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Drew Kutcharian
>
> Since C* supports composites for partition keys, I think it'd be useful to
> have the ability to use only the first (or first few) components of the key
> to calculate the token hash.
> A naive use case would be multi-tenancy:
> Say we have accounts, and each account has users. We would then have the
> following tables:
> {code}
> CREATE TABLE account (
>   id      timeuuid PRIMARY KEY,
>   company text
> );
> {code}
> {code}
> CREATE TABLE user (
>   id        timeuuid PRIMARY KEY,
>   accountId timeuuid,
>   email     text,
>   password  text
> );
> {code}
> {code}
> // Get users by account
> CREATE TABLE user_account_index (
>   accountId timeuuid,
>   userId    timeuuid,
>   PRIMARY KEY (accountId, userId)
> );
> {code}
> Say we want to get all the users that belong to an account. We would first
> have to get the results from user_account_index and then use a multi-get
> (WHERE IN) to get the records from the user table. This multi-get could
> potentially query many different nodes in the cluster. It'd be great if
> there were a way to store all of an account's users on a single node, so
> that the multi-get would only need to query that one node.
> With this improvement we would be able to define the user table like so:
> {code}
> CREATE TABLE user (
>   id        timeuuid,
>   accountId timeuuid,
>   email     text,
>   password  text,
>   PRIMARY KEY(((accountId),id))  // extra parentheses (proposed syntax)
> );
> {code}
> I'm not too sure about the notation; it could be something like PRIMARY
> KEY(((accountId),id)), where "(accountId)" means "use this part to
> calculate the hash" and ((accountId),id) is the actual partition key.
> The main complication I see is that we would have to consult the table
> definition when calculating hashes, so that we know which components of the
> partition key to use for the hash calculation.
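A toy sketch of why the multi-get above fans out, and what hashing only accountId would change. It assumes a hypothetical ring that places a token on node `token mod nodeCount` (Cassandra actually assigns token ranges to nodes and replicates them), string ids instead of timeuuids, and MD5 purely for illustration; `FanOutSketch` and its methods are made-up names. The `hashAccountOnly` flag stands in for the table-definition lookup the issue mentions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the multi-get fan-out; not Cassandra's real replica placement.
final class FanOutSketch {

    // MD5 of a UTF-8 string as a non-negative integer (illustrative hash only).
    static BigInteger hash(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Toy placement: token mod nodeCount stands in for token-range ownership.
    static int nodeFor(BigInteger token, int nodeCount) {
        return token.mod(BigInteger.valueOf(nodeCount)).intValue();
    }

    // Distinct nodes a multi-get (SELECT ... WHERE id IN (...)) would contact.
    // false models PRIMARY KEY (id); true models the proposed PRIMARY KEY(((accountId),id)).
    static int nodesContacted(String accountId, List<String> userIds,
                              int nodeCount, boolean hashAccountOnly) {
        Set<Integer> nodes = new HashSet<>();
        for (String userId : userIds) {
            String hashedPart = hashAccountOnly ? accountId : userId;
            nodes.add(nodeFor(hash(hashedPart), nodeCount));
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        List<String> userIds = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            userIds.add("user-" + i); // hypothetical ids; the issue uses timeuuid
        }
        System.out.println("hash on id:        " + nodesContacted("account-42", userIds, 12, false));
        System.out.println("hash on accountId: " + nodesContacted("account-42", userIds, 12, true)); // always 1 node
    }
}
```

With `hashAccountOnly` set, every user of the account maps to one node, which is the single-node multi-get the issue asks for; the flip side, presumably, is that a very large account would concentrate its load on one replica set.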



--
This message was sent by Atlassian JIRA
(v6.2#6252)