My concern is whether doing so will result in better memory/disk cache locality.

Suppose a frequent user request needs to fetch 2 different types of rows. I can store them in 2 tables or in 1 table:

2 tables:

  create table t1(
    partitionkey int primary key,
    col1 int, col2 int, ...
  )
  create table t2(
    partitionkey int primary key,
    col3 int, col4 int, ...
  )

query-2table:
  select col1,col2 from t1 where partitionkey = ?
  select col3,col4 from t2 where partitionkey = ?

1 table:

  create table t(
    partitionkey int,
    rowtype tinyint,
    col1 int, col2 int, ...
    col3 int, col4 int, ...
    primary key( partitionkey, rowtype )
  )
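To make the layout difference concrete, here is a toy Python model of the 1-table schema. It is not Cassandra's storage engine and says nothing about memtables, SSTables or caches. It only shows the logical grouping: `partitionkey` is the partition key and `rowtype` is a clustering column, so both row types for a given key live in the same partition, and one partition read returns them together, ordered by `rowtype`. The names `ToyTable`, `Row`, `read_partition` and `read_row` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Toy model only: a table maps partition key -> {clustering key -> row}.
# Real Cassandra storage (memtables, SSTables, caches) is far more involved.


@dataclass
class Row:
    rowtype: int  # clustering column
    col1: Optional[int] = None
    col2: Optional[int] = None
    col3: Optional[int] = None
    col4: Optional[int] = None


class ToyTable:
    def __init__(self) -> None:
        self.partitions: dict[int, dict[int, Row]] = {}

    def insert(self, partitionkey: int, row: Row) -> None:
        # Every row sharing a partition key goes into the same partition,
        # keyed by the clustering column `rowtype`.
        self.partitions.setdefault(partitionkey, {})[row.rowtype] = row

    def read_partition(self, partitionkey: int) -> list[Row]:
        # Like query-1table-b: one lookup returns all row types for the key,
        # sorted by clustering key (ascending).
        part = self.partitions.get(partitionkey, {})
        return [part[k] for k in sorted(part)]

    def read_row(self, partitionkey: int, rowtype: int) -> Optional[Row]:
        # Like query-1table-a: restrict both the partition and clustering key.
        return self.partitions.get(partitionkey, {}).get(rowtype)
```

In the 2-table design the same data would sit in two separate partitions, one per table, so the toy equivalent would be two `ToyTable` instances and two lookups.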

query-1table-a:
  select col1,col2 from t where partitionkey = ? and rowtype = 1
  select col3,col4 from t where partitionkey = ? and rowtype = 2

or alternatively, query-1table-b:
  select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
  // switch on `rowtype` in the app code
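A minimal sketch of the app-side switch for query-1table-b, in Python. It assumes each result row is a dict keyed by column name (for example, what the DataStax Python driver returns when `session.row_factory` is set to `cassandra.query.dict_factory`). `split_by_rowtype` is a hypothetical helper, and the rowtype values 1 and 2 follow query-1table-a.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def split_by_rowtype(
    rows: Iterable[Mapping[str, Any]],
) -> tuple[dict | None, dict | None]:
    """Dispatch rows of one partition by their `rowtype` clustering value.

    Returns (type1_columns, type2_columns); either is None if that row
    type was not present in the partition.
    """
    type1 = None
    type2 = None
    for row in rows:
        rt = row["rowtype"]
        if rt == 1:
            # Row type 1 only populates col1/col2.
            type1 = {"col1": row["col1"], "col2": row["col2"]}
        elif rt == 2:
            # Row type 2 only populates col3/col4.
            type2 = {"col3": row["col3"], "col4": row["col4"]}
        else:
            raise ValueError(f"unexpected rowtype: {rt}")
    return type1, type2
```

Unknown `rowtype` values raise instead of being silently dropped, so a row type added later surfaces as an error rather than as missing data.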

Is there a significant performance difference among query-2table, query-1table-a and query-1table-b? Is the Cassandra client/coordinator smart enough to direct subsequent queries for the same (table, partitionkey) to the same node, so that they can reuse a cached page?

Regards & Thanks
