We have a table with over 70M rows whose partition key is unique. Each record
carries a creation timestamp, and we need to select all rows created within a
given date range. A secondary index is not an option: the column is
high-cardinality, and querying it could degrade performance by effectively
scanning all 70M rows.


We are considering a new table with the date as a clustering column so we can
query by date range, but the mapping from partition key to clustering key
would be 1-1. Is this a good approach?

To do this, we need to copy a large volume of data from table1 to table2
within the same cluster while updates are still happening to table1, in real
time and without impacting our customers. COPY is not an option, as each row
in table1 carries a TTL that must be preserved in table2.
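One approach we are evaluating is a driver script that pages through table1,
reads each row's remaining TTL and write timestamp, and re-applies them on
insert into table2. A rough sketch with the same placeholder names as above
(a real script would also need token-range parallelism, and a reconciliation
pass for rows updated after they are copied):

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("ks")

    # TTL() and WRITETIME() are read per non-key column; fetch_size makes
    # the driver page through the 70M rows rather than pull them at once.
    read = SimpleStatement(
        "SELECT id, created, payload, TTL(payload) AS ttl, "
        "WRITETIME(payload) AS ts FROM table1",
        fetch_size=1000,
    )

    # Re-apply the remaining TTL and the original write timestamp so the
    # copied row expires at the same moment the source row would.
    write = session.prepare(
        "INSERT INTO table2 (id, created, payload) VALUES (?, ?, ?) "
        "USING TTL ? AND TIMESTAMP ?"
    )

    for row in session.execute(read):
        # TTL() returns None for rows written without a TTL; TTL 0 on
        # insert means "never expire", so substitute 0 in that case.
        session.execute(write, (row.id, row.created, row.payload,
                                row.ttl or 0, row.ts))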


So what would be the best approach:

  1.  To select data by a date range without impacting performance. This
operation will be needed only on an ad hoc basis and won't be frequent.
  2.  The best way to migrate a large volume of data, preserving TTLs, from
one table to another within the same cluster.


Any other suggestions would also be greatly appreciated.
