Time-series data model

Jean-Pierre Bergamin Wed, 14 Apr 2010 06:03:03 -0700

Hello everyone

We are currently evaluating a new DB system (to replace MySQL) for storing
massive amounts of time-series data. The data are various metrics from
various network and IT devices and systems. Metrics could be, for example, the
CPU usage of server "xy" in percent, the memory usage of server "xy" in MB, the
ping response time of server "foo" in milliseconds, the network traffic of
router "bar" in MB/s, and so on. Different metrics can be collected for
different devices at different intervals.


The metrics are stored together with a timestamp. The queries we want to
perform are:
 * The last value of a specific metric of a device
 * The values of a specific metric of a device between two timestamps t1 and
t2
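
Both queries operate on a single series ordered by timestamp. Here is a minimal Python sketch, independent of any particular database. It assumes each series is held as a list of (ts, value) pairs sorted by ts and that the t1..t2 range is inclusive at both ends. The helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp, metric value)


def last_value(series: List[Sample]) -> Optional[Sample]:
    """Most recent sample of a series sorted by timestamp, or None if empty."""
    return series[-1] if series else None


def values_between(series: List[Sample], t1: int, t2: int) -> List[Sample]:
    """All samples with t1 <= ts <= t2, found by binary search on the sorted series."""
    timestamps = [ts for ts, _ in series]  # O(n) copy, kept for clarity
    return series[bisect_left(timestamps, t1):bisect_right(timestamps, t2)]


cpu_usage = [(1271248215, 87), (1271248220, 34), (1271248225, 23), (1271248230, 49)]
print(last_value(cpu_usage))                              # (1271248230, 49)
print(values_between(cpu_usage, 1271248218, 1271248226))  # [(1271248220, 34), (1271248225, 23)]
```

Copying the timestamps costs O(n) per call. That keeps the sketch short. A store that keeps each series sorted by timestamp can instead seek straight to t1.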

I stumbled across this blog post, which describes a very similar setup with
Cassandra:
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
This post gave me confidence that what we want is definitely doable with
Cassandra.

But since I'm only just starting to dig into columns, super columns and column
families, I still have trouble understanding everything.

Our data model could look like this in JSON-ish notation:
{
"my_server_1": {
        "cpu_usage": [
                {ts: 1271248215, value: 87 },
                {ts: 1271248220, value: 34 },
                {ts: 1271248225, value: 23 },
                {ts: 1271248230, value: 49 }
        ],
        "ping_response": [
                {ts: 1271248201, value: 0.345 },
                {ts: 1271248211, value: 0.423 },
                {ts: 1271248221, value: 0.311 },
                {ts: 1271248232, value: 0.582 }
        ]
},

"my_server_2": {
        "cpu_usage": [
                {ts: 1271248215, value: 23 },
                ...
        ],
        "disk_usage": [
                {ts: 1271243451, value: 123445 },
                ...
        ]
},

"my_router_1": {
        "bytes_in": [
                {ts: 1271243451, value: 2452346 },
                ...
        ],
        "bytes_out": [
                {ts: 1271243451, value: 13468 },
                ...
        ],
        "errors": [
                {ts: 1271243451, value: 24 },
                ...
        ]
}
}

What I don't get is how to create the two-level hierarchy [device][metric].

Am I right that the devices would be kept in a super column family? The
ordering of those is not important.

But the metrics of each device would also be a super column, whose columns
would be the metric values ({ts: 1271243451, value: 24 }), wouldn't they?

So I'd need a super column inside a super column... Hm.
My brain is definitely RDBMS-damaged, and I don't see through columns and
super columns yet. :-)
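
The nesting problem may be smaller than it looks. In a super column family, each row already holds super columns, and each super column holds ordinary sub-columns. That gives this mapping:

* row key = device
* super column = metric
* sub-column = one sample, named by its timestamp

So [device][metric] fits onto row key and super column without needing a super column inside a super column. The Python below is only an in-memory model of that layout under those assumptions. `new_super_cf`, `insert` and `get_super_column` are hypothetical helpers, not Cassandra API calls.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Column = Tuple[int, float]  # (sub-column name = timestamp, value)

# Hypothetical in-memory model of one super column family:
#   row key (device) -> super column name (metric) -> sub-columns sorted by timestamp
SuperCF = DefaultDict[str, DefaultDict[str, List[Column]]]


def new_super_cf() -> SuperCF:
    return defaultdict(lambda: defaultdict(list))


def insert(cf: SuperCF, device: str, metric: str, ts: int, value: float) -> None:
    columns = cf[device][metric]
    i = bisect_left([name for name, _ in columns], ts)
    if i < len(columns) and columns[i][0] == ts:
        columns[i] = (ts, value)  # same sub-column name: the new value overwrites
    else:
        columns.insert(i, (ts, value))  # keep sub-columns in timestamp order


def get_super_column(cf: SuperCF, device: str, metric: str) -> List[Column]:
    # .get() avoids creating empty rows or super columns on a read miss.
    row = cf.get(device)
    return list(row.get(metric, [])) if row is not None else []


cf = new_super_cf()
insert(cf, "my_server_1", "cpu_usage", 1271248220, 34)
insert(cf, "my_server_1", "cpu_usage", 1271248215, 87)
insert(cf, "my_server_1", "ping_response", 1271248201, 0.345)
print(get_super_column(cf, "my_server_1", "cpu_usage"))  # [(1271248215, 87), (1271248220, 34)]
```

One limitation often mentioned for this layout is that sub-columns are not indexed. Reading part of a very large super column may therefore load all of it. Whether that matters here depends on how many samples each metric accumulates and on the Cassandra version being evaluated.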

How could this be modeled in Cassandra?
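
Another commonly suggested layout avoids super columns altogether. It flattens [device][metric] into one row key per series in a standard column family, for example "my_server_1:cpu_usage", with one column per sample named by its timestamp. Suppose the column family's comparator sorts those names as longs (e.g. LongType). Each row then stays in time order, with these consequences:

* The last value is the newest column of one row, i.e. a reversed slice with count 1.
* The t1..t2 query is a column slice over that one row.

The Python below only models the flattening in memory. The key format and the `row_key`/`flatten` helpers are hypothetical, not part of any Cassandra client.

```python
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]  # (timestamp, value) pairs

# Part of the nested [device][metric] model from above, as Python data.
# cpu_usage is deliberately listed out of order to show the sorting.
nested: Dict[str, Dict[str, Series]] = {
    "my_server_1": {
        "cpu_usage": [(1271248220, 34), (1271248215, 87)],
        "ping_response": [(1271248201, 0.345), (1271248211, 0.423)],
    },
    "my_router_1": {
        "errors": [(1271243451, 24)],
    },
}


def row_key(device: str, metric: str) -> str:
    # Hypothetical composite key format; any unambiguous separator would do.
    return f"{device}:{metric}"


def flatten(model: Dict[str, Dict[str, Series]]) -> Dict[str, Series]:
    """One row per (device, metric), with columns ordered by timestamp, mimicking
    a standard column family whose comparator sorts column names as longs."""
    rows: Dict[str, Series] = {}
    for device, metrics in model.items():
        for metric, samples in metrics.items():
            # dict() keeps the last value for a repeated timestamp (a column overwrite).
            rows[row_key(device, metric)] = sorted(dict(samples).items())
    return rows


rows = flatten(nested)
print(sorted(rows))                       # ['my_router_1:errors', 'my_server_1:cpu_usage', 'my_server_1:ping_response']
print(rows["my_server_1:cpu_usage"][-1])  # (1271248220, 34)
```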


Thank you very much
James
