The extra parentheses indicate that the three columns together constitute the
“partition key”. Without them, only the first column of the primary key would
be the partition key and the remaining columns would be clustering columns. The
partition key determines which rows are stored contiguously on a single node
of the cluster. As written, each of your rows has a distinct partition key, so
each row is its own partition and may or may not land on a different node.
With Jens’ approach, all rows with the same message_source_id would be part of
the same partition (with the same partition key) and stored contiguously on
the same node. Since you only have 30,000 rows, it probably doesn’t matter
which way you go. Organize your data based on how it is logically structured
and how you wish to access it.
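
To make the difference concrete, here is a minimal sketch of the two
declarations side by side. The table names with the _a and _b suffixes are
hypothetical, used only so both variants could coexist in one keyspace; the
columns are the ones from the thread.

```cql
-- Variant A (as posted): all three columns form a composite partition key,
-- so every row is its own partition.
CREATE TABLE integration_time_a (
  message_source_id uuid,
  traffic_data_type varchar,
  integration_period varchar,
  integration_time timestamp,
  PRIMARY KEY ((message_source_id, traffic_data_type, integration_period))
);

-- Variant B (Jens' suggestion): message_source_id alone is the partition key;
-- the other two columns are clustering columns inside that partition.
-- Equivalent to PRIMARY KEY ((message_source_id), traffic_data_type, integration_period).
CREATE TABLE integration_time_b (
  message_source_id uuid,
  traffic_data_type varchar,
  integration_period varchar,
  integration_time timestamp,
  PRIMARY KEY (message_source_id, traffic_data_type, integration_period)
);
```

In both variants the three columns together identify a row, so uniqueness is
the same; only data placement and the supported queries differ.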

-- Jack Krupansky

From: Wim Deblauwe 
Sent: Tuesday, July 1, 2014 8:24 AM
To: user@cassandra.apache.org 
Subject: Re: Primary key question

Hi, 

Thanks for the tip, but I never need to query the traffic_data_types and
integration_periods for a single message_source, so I will keep the
double-bracket notation for now.

Thanks,

Wim



2014-07-01 12:03 GMT+02:00 Jens Rantil <jens.ran...@tink.se>:

  Hi again, 

  As a follow-up: if you have many `message_source_id`s, you could also do:

  CREATE TABLE integration_time (
  message_source_id uuid,
  traffic_data_type varchar,
  integration_period varchar,
  integration_time timestamp,
  PRIMARY KEY (message_source_id,traffic_data_type,integration_period)
  );

  This might make it easier to query all traffic_data_types and
  integration_periods for a single message_source_id without having to do a
  heavy query across your whole cluster. You'll have the same uniqueness
  property, but this might, depending on your application, make things more
  debuggable. The flip side is that your cluster could be slightly more
  unbalanced if the number of `integration_time`s varies between
  message_source_ids.
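
  As a rough sketch of the queries this layout allows (the uuid and the
  'speed' value are made-up placeholders, not values from the application):

  ```cql
  -- All traffic_data_types and integration_periods for one source:
  -- a read that touches a single partition.
  SELECT traffic_data_type, integration_period, integration_time
  FROM integration_time
  WHERE message_source_id = 123e4567-e89b-12d3-a456-426655440000;

  -- Narrowing by the first clustering column also works.
  SELECT integration_period, integration_time
  FROM integration_time
  WHERE message_source_id = 123e4567-e89b-12d3-a456-426655440000
    AND traffic_data_type = 'speed';
  ```

  With the double-parentheses key, message_source_id alone does not identify
  a partition, so these lookups would not be available as simple
  single-partition reads.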

  Just an idea,
  Jens



  On Tue, Jul 1, 2014 at 8:37 AM, Wim Deblauwe <wim.debla...@gmail.com> wrote:

    Hi, 

    I have the following table:

    CREATE TABLE integration_time (
    message_source_id uuid,
    traffic_data_type varchar,
    integration_period varchar,
    integration_time timestamp,
    PRIMARY KEY ((message_source_id,traffic_data_type,integration_period))
    );

    I want the combination of (message_source_id, traffic_data_type,
    integration_period) to be unique. Is this the correct way to do it (with
    the double brackets)?

    This table will be relatively small; it just stores the last time
    something was done in the application for each unique combination of those
    3 parameters. Worst case there will be 30000 rows in that table, and they
    will always be fetched by querying on the 3 parameters at the same time.
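
    For illustration, the write and the lookup described above could look
    roughly like this. The uuid, 'speed', '5min', and the timestamp are
    hypothetical sample values.

    ```cql
    -- Writing again with the same three key values overwrites the existing
    -- row (an upsert), which is what keeps the combination unique.
    INSERT INTO integration_time
      (message_source_id, traffic_data_type, integration_period, integration_time)
    VALUES
      (123e4567-e89b-12d3-a456-426655440000, 'speed', '5min',
       '2014-07-01 08:37:00+0000');

    -- Fetch by all three parameters at once.
    SELECT integration_time
    FROM integration_time
    WHERE message_source_id = 123e4567-e89b-12d3-a456-426655440000
      AND traffic_data_type = 'speed'
      AND integration_period = '5min';
    ```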

    regards,

    Wim
