Hi Cass,

Just a hint up front. If I got it right, you have:

Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)

Assuming your events arrive in wall clock order, the first table design will have a hotspot: the node owning the current partition gets all writes for a single hour, because (event_day,event_hr) is the partitioning key. The second table design puts this hotspot on one node for a whole day, because event_day is the partitioning key. So please be careful if you have a write-intensive workload.
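
To make the hotspot concrete, here is a small sketch in plain Python (no Cassandra driver involved). It only counts how many distinct partition keys one hour of events would hit under each design. The date and the one-event-per-second workload are made up for illustration.

```python
from datetime import datetime, timedelta

def table1_partition_key(ts):
    # Table 1: partition key is (event_day, event_hr)
    return (ts.strftime("%Y-%m-%d"), ts.hour)

def table2_partition_key(ts):
    # Table 2: partition key is event_day only
    return (ts.strftime("%Y-%m-%d"),)

# Hypothetical workload: one event per second for one hour, in wall clock order
start = datetime(2015, 3, 9, 10, 0, 0)
events = [start + timedelta(seconds=i) for i in range(3600)]

table1_partitions = {table1_partition_key(ts) for ts in events}
table2_partitions = {table2_partition_key(ts) for ts in events}

# Number of distinct partitions written during that hour, per design
print(len(table1_partitions), len(table2_partitions))
```

Under both designs all 3600 writes land in a single partition, so whichever replicas own that partition take the whole write load for that hour.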

I have designed my logging tables with a non-datetime column in the partitioning key, so that writes at any given point in time are distributed across all nodes. For example, I have

PRIMARY KEY ((sensor_id,measure_date))

and the timestamp-value pairs in the rows. They are quite wide as I have about 10000 measurements per sensor and id, but analytics and cleanup jobs run daily.
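
A rough sketch of why adding sensor_id to the partitioning key spreads the load: the toy placement function below hashes the key with MD5 and takes it modulo the node count. This is only a stand-in I am assuming for illustration; it is not how Cassandra's partitioner and token ring actually assign data. The node count, the date and the 1000 sensors are also made up.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size

def node_for(partition_key):
    # Toy placement (assumption): hash the key, then modulo the node count
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES

measure_date = "2015-03-09"  # made-up date

# Time-only partition key: every write in this hour carries the same key
time_only = Counter(node_for((measure_date, 10)) for _ in range(1000))

# (sensor_id, measure_date): 1000 hypothetical sensors writing at the same moment
with_sensor = Counter(
    node_for((sensor_id, measure_date)) for sensor_id in range(1000)
)

print("time-only key:   ", dict(time_only))
print("sensor_id in key:", dict(with_sensor))
```

With the time-only key all 1000 writes go to one node, while with sensor_id in the key they are shared among all six.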

Of course, as a "not so long time" Cassandra user I may be wrong, so please feel free to correct me.

Cheers,
Roland

