Hello everyone,

We are currently evaluating a new DB system (replacing MySQL) to store massive amounts of time-series data. The data are metrics from various network and IT devices and systems. Examples of metrics are the CPU usage of server "xy" in percent, the memory usage of server "xy" in MB, the ping response time of server "foo" in milliseconds, the network traffic of router "bar" in MB/s, and so on. Different metrics can be collected for different devices at different intervals.
The metrics are stored together with a timestamp. The queries we want to perform are:

* The last value of a specific metric of a device
* The values of a specific metric of a device between two timestamps t1 and t2

I stumbled across this blog post, which describes a very similar setup with Cassandra:

https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

This post gave me confidence that what we want is definitely doable with Cassandra. But since I'm just digging into columns, super columns and their families, I still have some problems understanding everything.

In JSON-ish notation, our data model could look like this:

{
  "my_server_1": {
    "cpu_usage": {
      {ts: 1271248215, value: 87 },
      {ts: 1271248220, value: 34 },
      {ts: 1271248225, value: 23 },
      {ts: 1271248230, value: 49 }
    }
    "ping_response": {
      {ts: 1271248201, value: 0.345 },
      {ts: 1271248211, value: 0.423 },
      {ts: 1271248221, value: 0.311 },
      {ts: 1271248232, value: 0.582 }
    }
  }
  "my_server_2": {
    "cpu_usage": {
      {ts: 1271248215, value: 23 },
      ...
    }
    "disk_usage": {
      {ts: 1271243451, value: 123445 },
      ...
    }
  }
  "my_router_1": {
    "bytes_in": {
      {ts: 1271243451, value: 2452346 },
      ...
    }
    "bytes_out": {
      {ts: 1271243451, value: 13468 },
      ...
    }
    "errors": {
      {ts: 1271243451, value: 24 },
      ...
    }
  }
}

What I don't get is how to create the two-level hierarchy [device][metric]. Am I right that the devices would be kept in a super column family? The ordering of those is not important. But the metrics per device would also be super columns, whose columns would be the metric values ({ts: 1271243451, value: 24 }), right? So I'd need a super column inside a super column... Hm.

My brain is definitely RDBMS-damaged and I don't see through columns and super columns yet. :-)

How could this be modeled in Cassandra?

Thank you very much,
James
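
P.S. To make the two queries concrete, here is a minimal Python sketch of the access pattern I have in mind. It is a plain in-memory toy, not Cassandra code. The names MetricStore, insert, last and between are made up for illustration, and keying everything on the (device, metric) pair is only my assumption about how the two levels might be flattened, not something I know to be the right Cassandra model.

```python
import bisect
from collections import defaultdict


class MetricStore:
    """Toy in-memory stand-in for the store (hypothetical, not a Cassandra client)."""

    def __init__(self):
        # (device, metric) -> parallel lists, both kept in timestamp order
        self._ts = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, device, metric, ts, value):
        key = (device, metric)
        # Find the slot that keeps the timestamp list sorted.
        i = bisect.bisect_right(self._ts[key], ts)
        self._ts[key].insert(i, ts)
        self._values[key].insert(i, value)

    def last(self, device, metric):
        """Query 1: the most recent (ts, value) of a metric, or None if empty."""
        key = (device, metric)
        if not self._ts.get(key):
            return None
        return self._ts[key][-1], self._values[key][-1]

    def between(self, device, metric, t1, t2):
        """Query 2: all (ts, value) pairs with t1 <= ts <= t2, oldest first."""
        key = (device, metric)
        ts = self._ts.get(key, [])
        values = self._values.get(key, [])
        lo = bisect.bisect_left(ts, t1)
        hi = bisect.bisect_right(ts, t2)
        return list(zip(ts[lo:hi], values[lo:hi]))


# Sample data taken from the "cpu_usage" example of "my_server_1" above.
store = MetricStore()
for ts, value in [(1271248215, 87), (1271248220, 34),
                  (1271248225, 23), (1271248230, 49)]:
    store.insert("my_server_1", "cpu_usage", ts, value)

latest = store.last("my_server_1", "cpu_usage")
window = store.between("my_server_1", "cpu_usage", 1271248220, 1271248225)
```

With the sample data, latest is the entry for ts 1271248230 and window holds the two entries at 1271248220 and 1271248225. What I am really asking is which Cassandra structure gives me this "sorted by timestamp per device and metric" behaviour.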