(Privately mailed, since I'm nervous about edging off-topic)

I'm working on some related areas, capturing conversation data from Twitter
at http://twitter.mailana.com/ . My approach has been the classic disk-space
trade-off: building massive indices to pre-cache queries. You're right,
though; even with that approach, the overhead of updating the denormalized
friends-of-friends data for every user each time a link changes would be
enormous.
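
To make that write amplification concrete, here's a rough sketch of the kind
of denormalized two-hop index I mean (the names and structure are just
illustrative, not Mailana's actual schema):

from collections import defaultdict

follows = defaultdict(set)     # user -> users they follow
followers = defaultdict(set)   # user -> users who follow them (reverse edges)
fof = defaultdict(set)         # user -> precomputed friends-of-friends set

def add_follow(a, b):
    """a starts following b; patch the denormalized two-hop index."""
    follows[a].add(b)
    followers[b].add(a)

    # a's own two-hop set gains everyone b already follows...
    fof[a] |= follows[b]

    # ...and every user who follows a now has b in their two-hop set, so one
    # edge change touches O(|followers of a|) cached entries. Do that for a
    # popular account and the update cost explodes.
    for x in followers[a]:
        fof[x].add(b)

That inner loop runs once per existing follower of a on every new edge, which
is why keeping it current for all users is so expensive.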

Pete

On Thu, Feb 26, 2009 at 4:39 PM, Nick Arnett <nick.arn...@gmail.com> wrote:

>
>
> On Thu, Feb 26, 2009 at 4:19 PM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
>>
>> A relational database falls down very fast on this kind of analysis.  For
>> example, I have more than 300 followers, which is a simple query... but it
>> returns 300 users, and now the query needs to ask who the followers of those
>> 300 are, to answer question No. 1.  That's a big, slow query, since it has
>> to specify the ids of those 300 followers... or it is 300 smaller queries.
>>  Either way, ugh.  That query is going to return a very large number of
>> items, many of which need to be compared with one another.
>>
>
> FYI, there are 345,000 nodes and 1.4 million edges in the graph of me, my
> followers and their followers.  I'm sure this could be pared down
> considerably by eliminating a handful of extremely popular people, but it's
> still a hard problem to scale.
>
> Nick
>
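
Just to make the shape of that query concrete, the two-hop expansion Nick
describes above looks roughly like this against a single
follows(follower_id, followee_id) edge table (the table and column names are
my guess, nothing from his actual setup):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER)")

def two_hop_followers(conn, user_id):
    # Hop 1: the ~300 users who follow user_id.
    direct = [row[0] for row in conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ?", (user_id,))]
    if not direct:
        return set()

    # Hop 2, one way: a single query that has to name every one of those ids.
    placeholders = ",".join("?" * len(direct))
    rows = conn.execute(
        "SELECT DISTINCT follower_id FROM follows WHERE followee_id IN (%s)"
        % placeholders, direct)

    # The alternative is ~300 separate queries, one per direct follower.
    # Either way the result set is very large (345k nodes / 1.4M edges here).
    return {row[0] for row in rows}

The IN list itself isn't the killer; it's the size of the second-hop result
set and the pairwise comparisons that follow.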
