Re: Data tiered compaction and data model question

Roland Etzenhammer Thu, 19 Feb 2015 23:20:27 -0800

Hi Cass,

just a side note. If I understood correctly, you have:


Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)

Assuming your events are written in wall-clock order, the first table design will have a hotspot: a specific node (plus its replicas) gets all writes for a single hour, because (event_day,event_hr) is the partitioning key. The second table design puts this hotspot on a specific node for a whole day, because event_day is the partitioning key. So please be careful if you have a write-intensive workload.
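
To make the hotspot concrete, here is a minimal Python sketch that counts how many distinct partitions receive writes within any one hour for the two designs. The workload (one event per second for a day) and all function names are assumptions for illustration only; this does not talk to a real cluster.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def partition_key_table1(ts):
    # Table 1: partition key is (event_day, event_hr)
    return (ts.strftime("%Y-%m-%d"), ts.hour)


def partition_key_table2(ts):
    # Table 2: partition key is event_day only
    return (ts.strftime("%Y-%m-%d"),)


def active_partitions_per_hour(events, key_fn):
    """Map each wall-clock hour to the set of partitions written in that hour."""
    active = defaultdict(set)
    for ts in events:
        hour_bucket = ts.replace(minute=0, second=0, microsecond=0)
        active[hour_bucket].add(key_fn(ts))
    return active


# Hypothetical workload: one event per second for a full day.
start = datetime(2015, 2, 19)
events = [start + timedelta(seconds=s) for s in range(24 * 3600)]

t1 = active_partitions_per_hour(events, partition_key_table1)
t2 = active_partitions_per_hour(events, partition_key_table2)

# Largest number of partitions being written to within any single hour.
max_active_t1 = max(len(parts) for parts in t1.values())
max_active_t2 = max(len(parts) for parts in t2.values())

# Total number of partitions created over the whole day.
total_t1 = len(set().union(*t1.values()))
total_t2 = len(set().union(*t2.values()))

print("table 1: max active per hour =", max_active_t1, "| partitions per day =", total_t1)
print("table 2: max active per hour =", max_active_t2, "| partitions per day =", total_t2)
```

In both designs only one partition is being written at any moment; the first design merely moves that single hot partition to a new key every hour, the second once per day.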

I have designed my logging tables with a non-datetime column in the partitioning key, so that writes at any given point in time are distributed across all nodes. For example, I have


PRIMARY KEY ((sensor_id,measure_date))

and the timestamp-value pairs in the rows. The rows are quite wide, as I have about 10000 measurements per sensor and id, but analytics and cleanup jobs run daily.
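
The effect of putting sensor_id into the partitioning key can be sketched in Python as follows. This is a rough illustration under stated assumptions: the cluster size, the number of sensors, and the node_for helper are hypothetical, md5 merely stands in for Cassandra's partitioner, and replication is not modeled.

```python
import hashlib
from collections import Counter

NODES = 6        # hypothetical cluster size
SENSORS = 1000   # hypothetical number of sensors writing at the same moment


def node_for(partition_key):
    # md5 is only a stand-in for the real partitioner; actual token
    # ranges and node assignment in a cluster will differ.
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES


day = "2015-02-19"

# Time-only key: every sensor's write on this day shares one partition key.
time_only = Counter(node_for((day,)) for _ in range(SENSORS))

# (sensor_id, measure_date) key: each sensor gets its own partition per day.
sensor_keyed = Counter(node_for(("sensor-%d" % i, day)) for i in range(SENSORS))

print("time-only key, writes per node:   ", dict(time_only))
print("sensor+date key, writes per node: ", dict(sensor_keyed))
```

With the time-only key all simulated writes land on a single node, while the (sensor_id, measure_date) key spreads them over the nodes in roughly equal shares.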

Of course, as a "not so long time" Cassandra user I may be wrong, so please feel free to correct me.


Cheers,
Roland
