I like to model this kind of data as columns, where the timestamps are the column names (longs, TimeUUIDs, or strings, depending on your usage). If you have too much data for a single row, you'll need to split it across multiple rows. For time-series data, it makes sense to use one row per minute/hour/day/year, depending on the volume of your data.
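If it helps to see the bucketing spelled out, here's a minimal sketch in Python of how one timestamp maps to a day-row key and a column name. The helper names are just for illustration, and it assumes millisecond timestamps with one-row-per-day buckets:

import time

def row_key(ts_ms):
    # One row per day: the key is yyyymmdd derived from the timestamp (UTC).
    return time.strftime('%Y%m%d', time.gmtime(ts_ms / 1000.0))

def column_name(ts_ms):
    # The column name is the raw milliseconds-since-epoch value (a long).
    return ts_ms

now_ms = int(time.time() * 1000)
print(row_key(now_ms), column_name(now_ms))  # e.g. 20100602 1275468720000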
Something like the following:

SomeTimeData: {              // column family
    "20100601": {            // key, yyyymmdd
        123456789: "value1", // column name is milliseconds since epoch
        123456799: "value2"
    },
    "20100602": {
        123456889: "value3"
    }
}

Now you can use column slices to retrieve all values between two time periods on a given day. If you need to support larger ranges, you'll either have to slice columns from multiple keys or coarsen the keys from yyyymmdd to yyyymm, yyyy, etc. There's a tradeoff here between row width and read speed: reading 1000 columns as a contiguous slice from a single row will be very fast, but reading 1000 columns as slices from 10 keys won't be as fast. (A rough sketch of a multi-row range read is below the quoted message.)

Ben

On Wed, Jun 2, 2010 at 11:32 AM, David Boxenhorn <da...@lookin2.com> wrote:
> How do I handle giant sets of ordered data, e.g. by timestamps, which I want
> to access by range?
>
> I can't put all the data into a supercolumn, because it's loaded into memory
> at once, and it's too much data.
>
> Am I forced to use an order-preserving partitioner? I don't want the
> headache. Is there any other way?
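For what it's worth, here's a rough sketch of that multi-row range read in Python using the pycassa client. The keyspace and column family names are made up, and it assumes the column family's comparator is LongType so column names can be compared as milliseconds-since-epoch:

import time
import pycassa

# Assumed keyspace/CF names; adjust to your schema.
pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'SomeTimeData')

DAY_MS = 86400 * 1000

def day_keys(start_ms, end_ms):
    # Yield the yyyymmdd row keys covering [start_ms, end_ms].
    t = start_ms - (start_ms % DAY_MS)
    while t <= end_ms:
        yield time.strftime('%Y%m%d', time.gmtime(t / 1000.0))
        t += DAY_MS

def range_read(start_ms, end_ms):
    # Slice each day-row independently and merge the results.
    results = {}
    for key in day_keys(start_ms, end_ms):
        try:
            results.update(cf.get(key,
                                  column_start=start_ms,
                                  column_finish=end_ms,
                                  column_count=10000))  # page in real code
        except pycassa.NotFoundException:
            pass  # no data written for that day
    return results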