My question is whether doing so will result in better
memory/disk cache locality.
Suppose I need to query 2 different types of rows for a frequent
user request. I can use either 2 tables or 1 table:
2 tables:
create table t1(
    partitionkey int primary key,
    col1 int, col2 int, ...
)
create table t2(
    partitionkey int primary key,
    col3 int, col4 int, ...
)
query-2table:
select col1,col2 from t1 where partitionkey = ?
select col3,col4 from t2 where partitionkey = ?
1 table:
create table t(
    partitionkey int,
    rowtype tinyint,
    col1 int, col2 int, ...
    col3 int, col4 int, ...
    primary key( partitionkey, rowtype )
)
query-1table-a:
select col1,col2 from t where partitionkey = ? and rowtype = 1
select col3,col4 from t where partitionkey = ? and rowtype = 2
or alternatively, query-1table-b:
select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
// switch on `rowtype` in the app code
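To make the app-side switch in query-1table-b concrete, here is a rough sketch of what I have in mind. It assumes Python and that each row arrives as a dict keyed by column name; both are assumptions for illustration only, and the helper name split_by_rowtype is hypothetical. It also assumes rowtype 1 carries col1/col2 and rowtype 2 carries col3/col4, as in query-1table-a.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[int]]
Pair = Tuple[Optional[int], Optional[int]]


def split_by_rowtype(rows: Iterable[Row]) -> Tuple[List[Pair], List[Pair]]:
    """Split the rows of one partition into (type-1 rows, type-2 rows).

    Hypothetical helper: assumes rowtype 1 uses col1/col2 and
    rowtype 2 uses col3/col4; any other rowtype is treated as an error.
    """
    type1: List[Pair] = []
    type2: List[Pair] = []
    for row in rows:
        if row["rowtype"] == 1:
            type1.append((row["col1"], row["col2"]))
        elif row["rowtype"] == 2:
            type2.append((row["col3"], row["col4"]))
        else:
            raise ValueError(f"unexpected rowtype: {row['rowtype']}")
    return type1, type2
```

The point is that one round trip returns both row types, and the client pays only for a cheap branch per row.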
Is there a significant performance difference among query-2table,
query-1table-a, and query-1table-b?
Is the Cassandra client/coordinator smart enough to direct subsequent
queries for the same (table, partitionkey) to the same node so that they
can reuse a cached page?
Regards & Thanks