The extra parentheses indicate that the three columns together constitute the “partition key”; without them, only the first column of the primary key would be the partition key. The partition key determines which data rows are stored contiguously on a single node of the cluster. As written, each of your rows has a distinct partition key, so your rows may or may not end up on different nodes. With Jens’ approach, all rows with the same message_source_id would be part of the same partition (with the same partition key) and stored contiguously on the same node. Since you only have 30,000 rows, it probably doesn’t matter which way you go; organize your data based on how it is logically structured and how you wish to access it.
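To make the access-pattern difference concrete, here is a rough sketch of the lookups each key layout is meant to serve. The UUID and the string values are made-up placeholders, not data from the thread, and the statements assume the integration_time table as defined in the quoted messages.

```cql
-- Composite partition key: PRIMARY KEY ((message_source_id, traffic_data_type, integration_period))
-- All three partition key columns have to be supplied to locate the row.
SELECT integration_time
  FROM integration_time
 WHERE message_source_id = 123e4567-e89b-12d3-a456-426614174000  -- placeholder UUID
   AND traffic_data_type = 'speed'                               -- placeholder value
   AND integration_period = '15min';                             -- placeholder value

-- Single-column partition key: PRIMARY KEY (message_source_id, traffic_data_type, integration_period)
-- message_source_id alone selects the partition; the other two columns are
-- clustering columns, so every row for that source comes back from one partition.
SELECT traffic_data_type, integration_period, integration_time
  FROM integration_time
 WHERE message_source_id = 123e4567-e89b-12d3-a456-426614174000; -- placeholder UUID
```

The first query works with either layout. The second relies on message_source_id being the whole partition key, so with the double-parentheses form it would normally be rejected unless you resort to something like a secondary index or ALLOW FILTERING.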
-- Jack Krupansky

From: Wim Deblauwe
Sent: Tuesday, July 1, 2014 8:24 AM
To: user@cassandra.apache.org
Subject: Re: Primary key question

Hi,

thanks for the tip, but I never need to query the traffic_data_types and integration_periods for a single message_source, so I will keep the double bracket notation for now.

Thanks,

Wim

2014-07-01 12:03 GMT+02:00 Jens Rantil <jens.ran...@tink.se>:

Hi again,

As a follow-up: if you have many `message_source_id`s you could also do:

    CREATE TABLE integration_time (
        message_source_id uuid,
        traffic_data_type varchar,
        integration_period varchar,
        integration_time timestamp,
        PRIMARY KEY (message_source_id, traffic_data_type, integration_period)
    );

This might make it easier to query all traffic_data_types and integration_periods for a single message_source_id without having to do a heavy query across your whole cluster. You'll have the same uniqueness property, but this might, depending on your application, make things more debuggable. The flip side is that your cluster could be slightly more unbalanced if each message_source_id has a varied number of `integration_time`s.

Just an idea,

Jens

On Tue, Jul 1, 2014 at 8:37 AM, Wim Deblauwe <wim.debla...@gmail.com> wrote:

Hi,

I have the following table:

    CREATE TABLE integration_time (
        message_source_id uuid,
        traffic_data_type varchar,
        integration_period varchar,
        integration_time timestamp,
        PRIMARY KEY ((message_source_id, traffic_data_type, integration_period))
    );

I want the combination of (message_source_id, traffic_data_type, integration_period) to be unique. Is this the correct way to do it (with the double brackets)?

This table will be relatively small; it just stores the last time something was done in the application for that unique combination of those 3 parameters. Worst case there will be 30,000 rows in that table, and they will always be fetched by querying on the 3 parameters at the same time.

regards,

Wim