I know Cassandra is very flexible.
a. Because a super column cannot hold a large number of columns, you
should not use Design 1.
b. You may need a separate ColumnFamily for each query.
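
To make the layout concrete, here is a rough sketch of Design 2 and the two
queries using plain Python dictionaries. It is only an in-memory model, not
Cassandra client code: the function names are made up for illustration, and
the sample values are the ones from the quoted message. It relies on dates
being stored as ISO 'YYYY-MM-DD' strings, so that string order is date order.

```python
from bisect import bisect_right

# Design 2 as nested dicts: ticker (row key) -> date (super column)
# -> attribute (column) -> value. Values are taken from the quoted example.
daily_stock_data = {
    "AAPL": {
        "2010-04-13": {"closingPrice": 242, "volume": "10.9m"},
        "2010-04-14": {"closingPrice": 245, "volume": "14.4m"},
    },
}


def slice_dates(row, date1, date2):
    """Query 1 for one ticker: every date in [date1, date2], in date order."""
    return {d: row[d] for d in sorted(row) if date1 <= d <= date2}


def latest_in_range(row, date1, date2):
    """Query 2 for one ticker: the most recent date in [date1, date2].

    Returns (date, attributes), or None if no date falls in the range.
    """
    dates = sorted(row)
    i = bisect_right(dates, date2)  # index just past the last date <= date2
    if i == 0 or dates[i - 1] < date1:
        return None
    latest = dates[i - 1]
    return latest, row[latest]


def latest_for_tickers(data, tickers, date1, date2):
    """Query 2 for a set of tickers (hypothetical helper)."""
    return {t: latest_in_range(data.get(t, {}), date1, date2) for t in tickers}


if __name__ == "__main__":
    print(slice_dates(daily_stock_data["AAPL"], "2010-04-01", "2010-04-13"))
    print(latest_for_tickers(daily_stock_data, ["AAPL"], "2010-04-01", "2010-04-30"))
```

With Design 2 each date carries only a handful of attribute columns, so the
super columns stay small, which is the point of (a) above.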

On Wed, Apr 21, 2010 at 1:17 PM, Steve Lihn <stevel...@gmail.com> wrote:

> Hi,
> I am new to Cassandra. I would like to use Cassandra to store financial
> data (time series). I have a question about the data model design.
>
> The example here is daily stock data. This would be a column family
> called dailyStockData. The row key is the stock ticker.
> Every day there are attributes like closingPrice, volume, sharesOutstanding,
> etc. that need to be stored. There seem to be two ways to model it:
>
> Design 1: Each attribute is a super column. Therefore each date is a
> column. So we have:
>
> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
> etc.
>
> Design 2: Each date is a super column. Therefore each attribute is a
> column. So we have:
>
> AAPL -> '2010-04-13' -> { closingPrice -> 242, volume -> 10.9m }
> AAPL -> '2010-04-14' -> { closingPrice -> 245, volume -> 14.4m }
> etc.
>
> The date column / superColumn will need the Order Preserving Partitioner
> since we are going to do a lot of range queries. Examples are:
> Query 1: Give me the data between date1 and date2 for a set of tickers
> (say, the 100 tickers in QQQ).
> Query 2: More often than not, the query is: Give me the data for the max
> available dates (for each ticker) between date1 and date2 in a set of
> tickers.
> (Since not every day is traded, and we only want the most recent data,
> given a range of dates.)
>
> My questions are:
> a. Is there any technical reason to prefer (or require) one of Design 1
> and Design 2 over the other?
> b. Are both queries possible (and comparable in speed) for the chosen
> design?
>
> Thanks,
> Steve


-- 
Best regards,
JKnight
