Hi Cass,
Just a quick hint up front. If I understood correctly, you have:
Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)
Assuming the events you write arrive in wall-clock order, the first
table design will create a hotspot: a single node (the replicas of one
partition) gets all writes for a given hour, because (event_day,event_hr)
is the partitioning key. The second table design moves this hotspot to
one node per day, because event_day alone is the partitioning key. So
please be careful if you have a write-intensive workload.
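To make the hotspot argument concrete, here is a small Python sketch (my own illustration, not Cassandra code). It assumes one event per second arriving in wall-clock order and simply counts how many distinct partition keys each design touches. Since each partition is owned by one replica set, a single partition key means a single set of nodes takes all the writes. The start date and event rate are made up.

```python
from datetime import datetime, timedelta

def partition_key_table1(ts: datetime) -> tuple:
    # Table 1: PRIMARY KEY ((event_day, event_hr), event_time)
    return (ts.date(), ts.hour)

def partition_key_table2(ts: datetime) -> tuple:
    # Table 2: PRIMARY KEY (event_day, event_time)
    return (ts.date(),)

def distinct_partitions(start: datetime, minutes: int, key_fn) -> set:
    """Partition keys hit by one event per second over `minutes` of wall-clock time."""
    keys = set()
    for s in range(minutes * 60):
        keys.add(key_fn(start + timedelta(seconds=s)))
    return keys

start = datetime(2014, 3, 1, 10, 0, 0)  # hypothetical start time
# During one hour, every write lands in exactly one partition for both designs.
print(len(distinct_partitions(start, 60, partition_key_table1)))  # 1
print(len(distinct_partitions(start, 60, partition_key_table2)))  # 1
# Over a full day, table 1 rotates through 24 partitions; table 2 still uses 1.
midnight = start.replace(hour=0)
print(len(distinct_partitions(midnight, 24 * 60, partition_key_table1)))  # 24
print(len(distinct_partitions(midnight, 24 * 60, partition_key_table2)))  # 1
```

Table 1 at least rotates the hotspot every hour, while table 2 keeps it on the same replicas for the whole day.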
I designed my logging tables with a non-datetime column in the
partitioning key, so that writes happening at the same point in time
are distributed across all nodes. For example, I have
PRIMARY KEY ((sensor_id,measure_date))
with the timestamp-value pairs in the rows. The rows are quite wide, as I
have about 10000 measurements per sensor and day, but the analytics and
cleanup jobs run daily.
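Along the same lines, a sketch of why adding sensor_id spreads the load: at a single instant, 100 hypothetical sensors write to 100 different partitions. The node placement below uses a toy md5-modulo hash purely for illustration. Cassandra actually places partitions on a token ring via its configured partitioner, so the per-node counts are only indicative.

```python
import hashlib
from collections import Counter
from datetime import datetime

def partition_key(sensor_id: int, ts: datetime) -> tuple:
    # PRIMARY KEY ((sensor_id, measure_date)): one partition per sensor per day
    return (sensor_id, ts.date())

def toy_node_for(key: tuple, num_nodes: int) -> int:
    """Toy placement: hash the key and take it modulo the node count.
    This is NOT how Cassandra assigns tokens; it only illustrates spreading."""
    digest = hashlib.md5(repr(key).encode()).hexdigest()
    return int(digest, 16) % num_nodes

def writes_per_node(num_sensors: int, ts: datetime, num_nodes: int) -> Counter:
    """Count how the writes of all sensors at the same instant spread over nodes."""
    return Counter(toy_node_for(partition_key(s, ts), num_nodes)
                   for s in range(num_sensors))

now = datetime(2014, 3, 1, 10, 15, 0)  # hypothetical instant
keys = {partition_key(s, now) for s in range(100)}
print(len(keys))  # 100 distinct partitions written at the same moment
print(writes_per_node(100, now, num_nodes=6))
```

The trade-off is that reading one day for many sensors touches many partitions, which is fine for daily batch jobs like analytics and cleanup.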
Of course, as a fairly new Cassandra user I could be wrong, so please
feel free to correct me.
Cheers,
Roland