Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

Aditya Narayan Wed, 02 Feb 2011 13:28:15 -0800

Can I have some more feedback about my schema, perhaps somewhat more
critical/harsh?



Thanks again,
Aditya Narayan

On Wed, Feb 2, 2011 at 10:27 PM, Aditya Narayan <ady...@gmail.com> wrote:
> @Bill
> Thank you Bill!
>
> @Cassandra users
> Can others also leave their suggestions and comments about my schema, please?
> I would also like opinions on my question of whether to use a supercolumn
> or, alternatively, to store the data (that would otherwise go in subcolumns)
> serialized into a single column in a standard column family.
>
> Thanks
>
> -Aditya Narayan
>
>
>
> On Wed, Feb 2, 2011 at 10:11 PM, William R Speirs <bill.spe...@gmail.com> 
> wrote:
>> I did not understand before... sorry.
>>
>> Again, depending upon how many reminders you have for a single user, this
>> could be a long/wide row. It really comes down to how many reminders we
>> are talking about and how often they will be read/written. While a single
>> row can contain millions of (maybe more) columns, that doesn't mean it's a
>> good idea.
>>
>> I'm working on a logging system with Cassandra and ran into this same type
>> of problem. Do I put all of the messages for a single system into a single
>> row keyed off that system's name? I quickly came to the answer of "no" and
>> now I break my row keys into POSIX_timestamp:system where my timestamps are
>> buckets for every 5 minutes. This nicely distributes the load across the
>> nodes in my system.
>>
>> Bill-
>>
>> On 02/02/2011 11:18 AM, Aditya Narayan wrote:
>>>
>>> You got me wrong, perhaps.
>>>
>>> I am already splitting the rows on a per-user basis, of course; otherwise
>>> the schema won't make sense for my usage. The row contains only
>>> *reminders of a single user* sorted in chronological order. The
>>> reminder Ids are stored as supercolumn names and the subcolumns contain
>>> the tags for that reminder.
>>>
>>>
>>>
>>> On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs<bill.spe...@gmail.com>
>>>  wrote:
>>>>
>>>> Any time I see/hear "a single row containing all ..." I get nervous. That
>>>> single row is going to reside on a single node. That is potentially a lot
>>>> of load (don't know the system) for that single node. Why wouldn't you
>>>> split it by at least user? If it won't be a lot of load, then why are you
>>>> using Cassandra? This seems like something that could easily fit into an
>>>> SQL/relational style DB. If it's too much data (millions of users, 100s of
>>>> millions of reminders) for a standard SQL/relational model, then it's
>>>> probably too much for a single row.
>>>>
>>>> I'm not familiar with the TTL functionality of Cassandra... sorry cannot
>>>> help/comment there, still learning :-)
>>>>
>>>> Yea, my $0.02 is that this is an effective way to leverage super columns.
>>>>
>>>> Bill-
>>>>
>>>> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>>>>
>>>>> I think you got exactly what I wanted to convey, except for a few
>>>>> things I want to clarify:
>>>>>
>>>>> I was thinking of a single row containing all reminders (& not split
>>>>> by day). The history of the reminders needs to be maintained for some
>>>>> time. After a certain time (say 3 or 6 months) they may be deleted by
>>>>> the TTL facility.
>>>>>
>>>>> "While presenting the reminders timeline to the user, latest
>>>>> supercolumns like around 50 from the start_end will be picked up and
>>>>> their subcolumns values will be compared to the Tags user has chosen
>>>>> to see and, corresponding to the filtered subcolumn values(tags), the
>>>>> rows of the reminder details would be picked up.."
>>>>>
>>>>> Is a supercolumn a preferable choice for this? Can there be a better
>>>>> schema than this?
>>>>>
>>>>>
>>>>> -Aditya Narayan
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs<bill.spe...@gmail.com>
>>>>>  wrote:
>>>>>>
>>>>>> To reiterate, so I know we're both on the same page, your schema would
>>>>>> be something like this:
>>>>>>
>>>>>> - A column family (as you describe) to store the details of a reminder.
>>>>>> One reminder per row. The row key would be a TimeUUID.
>>>>>>
>>>>>> - A super column family to store the reminders for each user, for each
>>>>>> day. The row key would be something like: YYYYMMDD:user_id. The column
>>>>>> names would simply be the TimeUUID of the messages. The sub column names
>>>>>> would be the tag names of the various reminders.
>>>>>>
>>>>>> The idea is that you would then get a slice of each row for a user, for
>>>>>> a day, that would only contain sub column names with the tags you're
>>>>>> looking for? Then based upon the column names returned, you'd look up
>>>>>> the reminders.
>>>>>>
>>>>>> That seems like a solid schema to me.
>>>>>>
>>>>>> Bill-
>>>>>>
>>>>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>>>>
>>>>>>> Actually, I am trying to use Cassandra to display to users of my
>>>>>>> application the list of all Reminders they have set for themselves
>>>>>>> in the application.
>>>>>>>
>>>>>>> I need to store rows containing the timeline of daily Reminders that
>>>>>>> users set for themselves in the application. The reminders need to be
>>>>>>> presented to the user in chronological order, like a news feed.
>>>>>>> Each reminder has certain tags associated with it (so that, at
>>>>>>> times, the user may also choose to see the reminders filtered by tags
>>>>>>> in chronological order).
>>>>>>>
>>>>>>> So I thought of a schema something like this:-
>>>>>>>
>>>>>>> - The details of each reminder may be stored as a separate row in a
>>>>>>> column family.
>>>>>>> - To present the timeline of reminders to the user, the timeline row
>>>>>>> of each user would contain the Id/Key(s) (of the Reminder rows) as the
>>>>>>> supercolumn names, and the subcolumns inside those supercolumns would
>>>>>>> contain the list of tags associated with that particular reminder. All
>>>>>>> tags are set at once during the first write. The number of tags
>>>>>>> (subcolumns) will be around 8 at most.
>>>>>>>
>>>>>>> Any comments, suggestions and feedback on the schema design are
>>>>>>> requested..
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aditya Narayan
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan<ady...@gmail.com>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> I need to store supercolumns each with around 8 subcolumns;
>>>>>>>> All the data for a supercolumn is written at once and all subcolumns
>>>>>>>> need to be retrieved together. The data in each subcolumn is not big,
>>>>>>>> it just contains keys to other rows.
>>>>>>>>
>>>>>>>> Would it be preferred to have a supercolumn family or just a standard
>>>>>>>> column family containing "all the subcolumns data serialized in
>>>>>>>> single column(s)"?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Aditya Narayan
>>>>>>>>
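The "serialized into a single column" alternative asked about above could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain Python dict stands in for a standard column family (no Cassandra client is used), JSON is an arbitrary choice of encoding, reminder ids are assumed to sort chronologically the way TimeUUID column names would, and every function name here is hypothetical.

```python
import json

def pack_tags(tags):
    """Serialize a reminder's tags into one column value (bytes)."""
    return json.dumps(sorted(tags), separators=(",", ":")).encode("utf-8")

def unpack_tags(value):
    """Inverse of pack_tags: decode a column value back into a list of tags."""
    return json.loads(value.decode("utf-8"))

# In-memory stand-in for a standard column family (an assumption, not an API):
# row key (user id) -> {column name (reminder id) -> serialized tags}
timeline = {}

def add_reminder(user_id, reminder_id, tags):
    """Write all tags of a reminder at once, as a single column."""
    timeline.setdefault(user_id, {})[reminder_id] = pack_tags(tags)

def latest_matching(user_id, wanted_tags, limit=50):
    """Take the newest `limit` columns and keep ids carrying any wanted tag."""
    row = timeline.get(user_id, {})
    newest = sorted(row, reverse=True)[:limit]
    wanted = set(wanted_tags)
    return [rid for rid in newest if wanted & set(unpack_tags(row[rid]))]
```

With this layout the tag filtering happens on the client after the slice is read, which matches the "pick up around 50 and compare the tags" flow described in the thread; the trade-off is that individual tags can no longer be read or updated on their own.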
>>>>>>
>>>>
>>
>
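The time-bucketed row keys Bill describes (POSIX_timestamp:system, one bucket per 5 minutes) can be sketched as follows. This is a minimal illustration, assuming the timestamp in the key is the start of the 5-minute bucket; the thread does not say how the bucket is rounded, and the function names are hypothetical.

```python
BUCKET_SECONDS = 5 * 60  # one row per system per 5-minute window

def bucket_start(ts, width=BUCKET_SECONDS):
    """Round a POSIX timestamp down to the start of its bucket."""
    ts = int(ts)
    return ts - ts % width

def row_key(ts, system):
    """Build a row key of the form POSIX_timestamp:system."""
    return f"{bucket_start(ts)}:{system}"

def row_keys_between(start_ts, end_ts, system):
    """List every row key a reader must fetch to cover [start_ts, end_ts]."""
    first = bucket_start(start_ts)
    last = bucket_start(end_ts)
    return [f"{b}:{system}" for b in range(first, last + 1, BUCKET_SECONDS)]
```

For example, `row_key(899, "web01")` and `row_key(601, "web01")` both land in the row `600:web01`, while a read spanning several buckets has to fetch one row per bucket.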
