On Fri, Jan 7, 2011 at 11:38 PM, Rajkumar Gupta <rajkumar....@gmail.com> wrote:
> In the twissandra example,
> http://www.riptano.com/docs/0.6/data_model/twissandra#adding-friends ,
> I find that they have split the materialized view of a user's homepage
> (his followers list, tweets from friends, etc.) into several
> columnfamilies instead of putting supercolumns inside a single
> SupercolumnFamily, thereby making the rows skinnier. I was wondering
> which one will give better performance in terms of reads.
> I think skinnier rows will definitely have the advantage of fewer row
> mutations, and thus good read performance when only those rows need to
> be retrieved. Supercolumns for the followers list, etc. are also
> avoided (this sounds good, as the supercolumn indexing limitations will
> not hurt). But I am still not sure whether it would be beneficial in
> terms of performance numbers to split the materialized view of a single
> user into several columnfamilies instead of a single row in a single
> SupercolumnFamily.
>
>
>
>
>
> On Sat, Jan 8, 2011 at 2:05 AM, Rajkumar Gupta <rajkumar....@gmail.com> wrote:
>> The fact that subcolumns inside supercolumns aren't currently indexed
>> may hurt here whenever a small number (10-20) of subcolumns needs to
>> be retrieved from a large list of subcolumns of a supercolumn
>> like MyPostsIdKeysList.
>>
>> On Fri, Jan 7, 2011 at 9:58 PM, Raj <rajkumar....@gmail.com> wrote:
>>> My question is in context of a social network schema design
>>>
>>> I am thinking of following schema for storing a user's data that is
>>> required as he logs in & is led to his homepage:-
>>> (I aimed at a schema design such that a single row read query
>>> retrieves all the data required to put up the homepage of that
>>> user.)
>>>
>>> UserSuperColumnFamily: {    // Column Family
>>>
>>> UserIDKey:
>>> {columns:            MyName, MyEmail, MyCity,...etc
>>>  supercolumns:    MyFollowersList, MyFollowiesList, MyPostsIdKeysList,
>>> MyInterestsList, MyAlbumsIdKeysList, MyVideoIdKeysList,
>>> RecentNotificationsForUserList,  MessagesReceivedList,
>>> MessagesSentList, AccountSettingsList, RecentSelfActivityList,
>>> UpdatesFromFollowiesList
>>> }
>>> }
>>>
>>> Thus the user's news feed would be generated using the superColumn
>>> UpdatesFromFollowiesList. But UpdatesFromFollowiesList would
>>> obviously contain only the Ids of the posts and not the entire post data.
>>>
>>> Questions:
>>>
>>> 1.) What could be the problems with this design, any improvements ?
>>>
>>> 2.) Would frequent & heavy overwrite operations / row mutations (for
>>> example, when propagating post updates for the news feed from some
>>> user to all his followies), which lead to rows ultimately being
>>> spread across several SSTables, degrade read performance? Is it
>>> suitable to use the row cache (the row is big, but all of its data is
>>> required while the user is logged in)? If I do not use the cache, it
>>> may be very expensive to pull the row each time data is required for
>>> the given user, since the row would be in several SSTables. How can I
>>> improve the read performance here?
>>>
>>> The actual data of the posts from network would be retrieved using
>>> PostIdKey through subsequent read queries from columnFamily
>>> PostsSuperColumnFamily which would be like follows:
>>>
>>> PostsSuperColumnFamily:{
>>>
>>> PostIdKey:
>>> {
>>> columns:            PostOwnerId, PostBody
>>> supercolumns:   TagsForPost {list of columns of all tags for the
>>> post}, PeopleWhoLikedThisPost {list of columns of UserIdKey of all the
>>> likers}
>>> }
>>> }
>>>
>>> Is this the best design to go with or are there any issues to consider
>>> here? Thanks in anticipation of your valuable comments!
>>>
>>
>

From your description, UserSuperColumnFamily seems to mix standard
columns and super columns in one column family. You cannot do that.
However, you can encode things such as MyName, MyCity and MyState as
subcolumns of a 'UserInfo' super column. UserInfo:MyState...

(As you mentioned) the subcolumns of a Super Column are not indexed, so
the whole Super Column has to be de-serialized for each access. Because
of this, Super Columns are not widely used for anything but small keys
with a few columns. This applies to mutations as well: the row can
exist in multiple SSTables until it finally gets compacted. That can
result in much more storage used for an object that changes often.
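
To make the SSTable point concrete, here is a rough toy model in Python.
It is only an illustration, not Cassandra's storage format or API: each
"SSTable" is assumed to be a plain dict of column name -> (timestamp,
value), and the names merge_fragments, read_row and compact, as well as
the sample values, are made up for this sketch.

```python
# Toy model only: each "SSTable" fragment maps column name -> (timestamp, value).
# This is NOT Cassandra's real on-disk format or client API.

def merge_fragments(fragments):
    """Merge a row's fragments; the newest timestamp wins per column."""
    merged = {}
    for fragment in fragments:  # every fragment has to be consulted
        for name, (ts, value) in fragment.items():
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    return merged

def read_row(fragments):
    """Return the current view of the row as column name -> value."""
    return {name: value for name, (ts, value) in merge_fragments(fragments).items()}

def compact(fragments):
    """Collapse all fragments into one, dropping overwritten versions."""
    return [merge_fragments(fragments)]

# One row, overwritten twice, so it lives in three fragments (made-up data).
fragments = [
    {"MyName": (1, "Raj"), "MyCity": (1, "Pune")},
    {"MyCity": (2, "Delhi")},
    {"MyCity": (3, "Mumbai")},
]

stored_before = sum(len(f) for f in fragments)          # versions kept before compaction
stored_after = sum(len(f) for f in compact(fragments))  # versions kept after compaction
current = read_row(fragments)
```

In this toy model the read has to touch all three fragments to find the
latest MyCity, and the overwritten versions keep taking space until
compact() runs.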

Most designs use composite keys, or something like JSON-encoded values,
with Standard Column Families to achieve something like a Super
Column.
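
A minimal sketch of those two approaches, again in plain Python rather
than against a real client library. The row is assumed to be a dict of
column name -> value, the ":" separator and the helper names composite
and columns_in_group are hypothetical, and the sample values are made
up.

```python
import json

SEP = ":"  # hypothetical separator for composite column names

def composite(group, name):
    """Build a composite column name such as 'UserInfo:MyState'."""
    return f"{group}{SEP}{name}"

def columns_in_group(row, group):
    """Emulate reading one 'super column' by selecting on the name prefix."""
    prefix = group + SEP
    return {
        name[len(prefix):]: value
        for name, value in sorted(row.items())
        if name.startswith(prefix)
    }

# Approach 1: composite column names in a standard row (made-up data).
row = {
    composite("UserInfo", "MyName"): "Raj",
    composite("UserInfo", "MyCity"): "Pune",
    composite("MyFollowersList", "user42"): "",
}
user_info_columns = columns_in_group(row, "UserInfo")

# Approach 2: one standard column whose value is a JSON-encoded object.
row_json = {"UserInfo": json.dumps({"MyName": "Raj", "MyCity": "Pune"})}
user_info = json.loads(row_json["UserInfo"])
```

With composite names, each piece stays an ordinary column that can be
read or overwritten on its own. The JSON variant keeps everything in one
column, so any change rewrites the whole value.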

(SuperColumns are not always as Super as they seem :)
