Can I have some more feedback about my schema, perhaps somewhat more critical/harsh?
Thanks again,
Aditya Narayan

On Wed, Feb 2, 2011 at 10:27 PM, Aditya Narayan <ady...@gmail.com> wrote:
> @Bill
> Thank you Bill!
>
> @Cassandra users
> Can others also leave their suggestions and comments about my schema, please?
> Also, my question remains whether to use a supercolumn or, alternatively,
> just store the data (that would otherwise be stored in subcolumns)
> serialized into a single column in a standard column family.
>
> Thanks
>
> -Aditya Narayan
>
> On Wed, Feb 2, 2011 at 10:11 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>> I did not understand before... sorry.
>>
>> Again, depending upon how many reminders you have for a single user, this
>> could be a long/wide row. It really comes down to how many reminders
>> we are talking about and how often they will be read/written. While a single
>> row can contain millions (maybe more) of columns, that doesn't mean it's a
>> good idea.
>>
>> I'm working on a logging system with Cassandra and ran into this same type
>> of problem. Do I put all of the messages for a single system into a single
>> row keyed off that system's name? I quickly came to the answer of "no" and
>> now I break my row keys into POSIX_timestamp:system where my timestamps are
>> buckets for every 5 minutes. This nicely distributes the load across the
>> nodes in my system.
>>
>> Bill-
>>
>> On 02/02/2011 11:18 AM, Aditya Narayan wrote:
>>>
>>> You got me wrong perhaps..
>>>
>>> I am already splitting the row on a per-user basis, of course; otherwise
>>> the schema won't make sense for my usage. The row contains only
>>> *reminders of a single user* sorted in chronological order. The
>>> reminder IDs are stored as supercolumn names and the subcolumns contain
>>> the tags for that reminder.
>>>
>>> On Wed, Feb 2, 2011 at 9:19 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>>>>
>>>> Any time I see/hear "a single row containing all ..." I get nervous. That
>>>> single row is going to reside on a single node.
>>>> That is potentially a lot of load (don't know the system) for that single
>>>> node. Why wouldn't you split it by at least user? If it won't be a lot of
>>>> load, then why are you using Cassandra? This seems like something that
>>>> could easily fit into an SQL/relational style DB. If it's too much data
>>>> (millions of users, 100s of millions of reminders) for a standard
>>>> SQL/relational model, then it's probably too much for a single row.
>>>>
>>>> I'm not familiar with the TTL functionality of Cassandra... sorry cannot
>>>> help/comment there, still learning :-)
>>>>
>>>> Yea, my $0.02 is that this is an effective way to leverage super columns.
>>>>
>>>> Bill-
>>>>
>>>> On 02/02/2011 10:43 AM, Aditya Narayan wrote:
>>>>>
>>>>> I think you got exactly what I wanted to convey, except for a few
>>>>> things I want to clarify:
>>>>>
>>>>> I was thinking of a single row containing all reminders (& not split
>>>>> by day). History of the reminders needs to be maintained for some time.
>>>>> After a certain time (say 3 or 6 months) they may be deleted by the TTL
>>>>> facility.
>>>>>
>>>>> "While presenting the reminders timeline to the user, latest
>>>>> supercolumns like around 50 from the start_end will be picked up and
>>>>> their subcolumns values will be compared to the Tags user has chosen
>>>>> to see and, corresponding to the filtered subcolumn values(tags), the
>>>>> rows of the reminder details would be picked up.."
>>>>>
>>>>> Is a supercolumn a preferable choice for this? Can there be a better
>>>>> schema than this?
>>>>>
>>>>> -Aditya Narayan
>>>>>
>>>>> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs <bill.spe...@gmail.com> wrote:
>>>>>>
>>>>>> To reiterate, so I know we're both on the same page, your schema would
>>>>>> be something like this:
>>>>>>
>>>>>> - A column family (as you describe) to store the details of a reminder.
>>>>>> One reminder per row. The row key would be a TimeUUID.
>>>>>>
>>>>>> - A super column family to store the reminders for each user, for each
>>>>>> day. The row key would be something like: YYYYMMDD:user_id. The column
>>>>>> names would simply be the TimeUUID of the messages. The sub column names
>>>>>> would be the tag names of the various reminders.
>>>>>>
>>>>>> The idea is that you would then get a slice of each row for a user, for
>>>>>> a day, that would only contain sub column names with the tags you're
>>>>>> looking for? Then based upon the column names returned, you'd look up
>>>>>> the reminders.
>>>>>>
>>>>>> That seems like a solid schema to me.
>>>>>>
>>>>>> Bill-
>>>>>>
>>>>>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>>>>>
>>>>>>> Actually, I am trying to use Cassandra to display to users of my
>>>>>>> application the list of all reminders they have set for themselves.
>>>>>>>
>>>>>>> I need to store rows containing the timeline of daily reminders set by
>>>>>>> the users, for themselves, in the application. The reminders need to be
>>>>>>> presented to the user in chronological order, like a news feed.
>>>>>>> Each reminder has certain tags associated with it (so that, at
>>>>>>> times, the user may also choose to see the reminders filtered by tags
>>>>>>> in chronological order).
>>>>>>>
>>>>>>> So I thought of a schema something like this:
>>>>>>>
>>>>>>> - Each reminder's details may be stored as a separate row in a column
>>>>>>> family.
>>>>>>> - For presenting the timeline of reminders to the user, the timeline
>>>>>>> row of each user would contain the IDs/keys (of the reminder rows) as
>>>>>>> the supercolumn names, and the subcolumns inside each supercolumn
>>>>>>> would contain the list of tags associated with that particular
>>>>>>> reminder. All tags are set at once during the first write. The
>>>>>>> number of tags (subcolumns) will be around 8 at most.
>>>>>>>
>>>>>>> Any comments, suggestions and feedback on the schema design are
>>>>>>> requested.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aditya Narayan
>>>>>>>
>>>>>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan <ady...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> I need to store supercolumns, each with around 8 subcolumns.
>>>>>>>> All the data for a supercolumn is written at once and all subcolumns
>>>>>>>> need to be retrieved together. The data in each subcolumn is not big;
>>>>>>>> it just contains keys to other rows.
>>>>>>>>
>>>>>>>> Would it be preferred to have a supercolumn family or just a standard
>>>>>>>> column family containing "all the subcolumns data serialized in
>>>>>>>> single column(s)"?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Aditya Narayan
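
For readers following the thread, here is a rough sketch of the two ideas discussed above: bucketed row keys (Bill's POSIX_timestamp:system with 5-minute buckets, and the YYYYMMDD:user_id key), and the "serialize the tags into a single column" alternative to a supercolumn. It is not code from either poster and does not touch any Cassandra client API. The function names, the use of UTC for the day boundary, and JSON as the serialization format are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# 5-minute buckets, as in the logging example from the thread.
BUCKET_SECONDS = 5 * 60


def log_row_key(system: str, posix_ts: int) -> str:
    """Build a POSIX_timestamp:system row key (hypothetical helper).

    The timestamp is floored to the start of its 5-minute bucket, so all
    messages from one system in the same bucket share a row.
    """
    bucket_start = posix_ts - (posix_ts % BUCKET_SECONDS)
    return f"{bucket_start}:{system}"


def daily_timeline_row_key(user_id: str, posix_ts: int) -> str:
    """Build a YYYYMMDD:user_id row key (hypothetical helper).

    Assumption: the day boundary is taken in UTC; the thread does not say.
    """
    day = datetime.fromtimestamp(posix_ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"{day}:{user_id}"


def pack_tags(tags) -> bytes:
    """Serialize a reminder's tags into one column value.

    Assumption: JSON is used here only as an example encoding.
    """
    return json.dumps(sorted(tags)).encode("utf-8")


def unpack_tags(value: bytes) -> list:
    """Inverse of pack_tags: decode the column value back into a tag list."""
    return json.loads(value.decode("utf-8"))
```

With this layout, a standard column family row for a user and day would hold one column per reminder (column name: the reminder's TimeUUID, column value: the packed tags), and tag filtering would happen on the client after reading a slice. Whether that beats a supercolumn depends on the read/write pattern, which is the open question in the thread.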