Re: Design for 'Most viewed Discussions' in a forum

openvictor Open Wed, 18 May 2011 11:59:30 -0700

I guess you can use the same system with Cassandra alone. You need two CFs
for that, and I think it's better to use 0.8 because it supports counters:


One CF with UTF8Type called active-topics and one CF with UUIDType called
topics-seen, then using the same principle:

For each period timestampN, and for each visit in that period (say Topic1,
then Topic2, then Topic1 again), you create a TimeUUID and you insert:
active-topics[topics:timestampN] = {Topic1:whateveryouwant}
and :
topics-seen[topic:Topic1]={TimeUUID1:whatever}


active-topics[topics:timestampN] = {Topic2:whateveryouwant}
and :
topics-seen[topic:Topic2]={TimeUUID2:whatever}


active-topics[topics:timestampN] = {Topic1:whateveryouwant}
and :
topics-seen[topic:Topic1]={TimeUUID3:whatever}


Then when you want to query, you first slice the row topics:timestampN in
active-topics to get all the topics seen in that period, and then you get
the column count of each of those topics' rows in the topics-seen CF.
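
To make the write and read paths concrete, here is a minimal in-memory
sketch of the scheme above. It is not Cassandra client code: plain Python
dicts stand in for the two column families, and the names BUCKET_SECONDS,
bucket_start, record_view and most_viewed are made up for the illustration.
The 15-minute bucket is taken from the Redis example quoted below.

```python
import uuid
from collections import defaultdict

BUCKET_SECONDS = 15 * 60  # assumed bucket width (the 15-minute example)

# Stand-ins for the two column families: row key -> {column name: value}
active_topics = defaultdict(dict)  # row "topics:<bucket>", columns = topic names
topics_seen = defaultdict(dict)    # row "topic:<name>", columns = TimeUUIDs


def bucket_start(ts):
    """Truncate a Unix timestamp to the start of its bucket (timestampN)."""
    return int(ts) - int(ts) % BUCKET_SECONDS


def record_view(topic, ts):
    """One visit: mark the topic active in its bucket, add one TimeUUID column."""
    active_topics["topics:%d" % bucket_start(ts)][topic] = ""
    topics_seen["topic:%s" % topic][uuid.uuid1()] = ""


def most_viewed(ts):
    """Slice the bucket row, then count the columns of each active topic.

    Note: this counts every column in the topic's row. Limiting the count
    to one period would need a TimeUUID range slice, which is not shown.
    """
    active = active_topics.get("topics:%d" % bucket_start(ts), {})
    counts = {t: len(topics_seen["topic:%s" % t]) for t in active}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


now = 1305745170  # arbitrary example timestamp
for topic in ["Topic1", "Topic2", "Topic1"]:
    record_view(topic, now)

ranking = most_viewed(now)
print(ranking)  # [('Topic1', 2), ('Topic2', 1)]
```

In a real cluster the two dict writes would be two inserts, and the
len(...) call would be a column count on the topics-seen row.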

Not so simple... By the way, it adds overhead compared to a simple counter
solution, but I think it is far more elegant. This is just my opinion.

Victor


2011/5/18 Aditya Narayan <ady...@gmail.com>

> Thanks victor!
>
> Aren't there any good ways by using Cassandra alone ?
>
>
> On Wed, May 18, 2011 at 11:41 PM, openvictor Open <openvic...@gmail.com> wrote:
>
>> Have you thought about using another kind of database, one which supports
>> volatile content for example?
>>
>> I am currently thinking about doing something similar. The best and
>> simplest option that I can think of at the moment is Redis. In Redis you
>> have the option of querying keys with wildcards. Your problem can be solved
>> by just inserting a UUID into Redis for a certain amount of time (the best
>> is to tailor this amount of time as an inverse function of the number of
>> keys existing in Redis).
>>
>> *With Redis*
>> What I would do: I cut time into slices of X minutes (15 minutes, for
>> example) by truncating a timestamp. Let timestampN be the timestamp for the
>> period of time [N, N+15], and let Topic1 and Topic2 be two topics. Then:
>>
>> One or more people will view Topic1, then Topic2, then again Topic1 in this
>> period of 15 minutes
>> (HINCRBY <http://redis.io/commands/hincrby> is the increment):
>>
>> HINCRBY topics:Topic1:timestampN viewcount 1
>> HINCRBY topics:Topic2:timestampN viewcount 1
>> HINCRBY topics:Topic1:timestampN viewcount 1
>>
>> Then you just query in the following way:
>>
>> KEYS <http://redis.io/commands/keys> topics:*:timestampN
>>
>> * is the wildcard. You read viewcount from each matching key with HGET,
>> order by viewcount, and you have what you are asking for!
>> This is a simplified version of what you should do, but personally I
>> really like the combination of Cassandra and Redis.
>>
>>
>> Victor
>>
>> 2011/5/18 Aditya Narayan <ady...@gmail.com>
>>
>>> I would set the memtable flush period to equal the time period for which
>>> these most viewed discussions are generated, so that the entire row of
>>> most viewed discussions on a topic is in one or at most two
>>> memtables/SSTables.
>>> This would also help minimize the number of versions of the same column
>>> spread across the parts of the row in different SSTables.
>>>
>>>
>>>
>>> On Wed, May 18, 2011 at 11:04 PM, Aditya Narayan <ady...@gmail.com> wrote:
>>>
>>>> *************
>>>> For a discussions forum, I need to show a page of most viewed
>>>> discussions.
>>>>
>>>> To implement this, I maintain a count of views for each discussion, and
>>>> when the view count of a discussion passes a certain threshold, the
>>>> discussion Id is added to a row of most viewed discussions.
>>>>
>>>> This row of most viewed discussions contains columns with integer names,
>>>> whose values are serialized lists of the Ids of all discussions whose
>>>> view count equals the integer name of the column.
>>>>
>>>> Thus if the view count of a discussion increases, I'll need to move its
>>>> Id from the serialized list in one column to the serialized list in the
>>>> column whose name represents the updated view count of that discussion.
>>>>
>>>> Thus I can get the most viewed discussions by getting the appropriate
>>>> number of columns from one end of this integer-sorted row.
>>>>
>>>> ************
>>>>
>>>> I wanted to get feedback from you all, to know if this is a good design.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
