(Privately mailed, since I'm nervous about edging off-topic)

I'm working on some related areas, capturing conversation data from Twitter at http://twitter.mailana.com/ . My approach has been the classic trade-off of disk space for query speed: creating massive indices to pre-cache queries. You're right, though. Even with that approach, the overhead of updating a denormalized, complete friends-of-friends list for every affected user each time a link changes would be enormous.
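
To make that update cost concrete, here is a minimal in-memory sketch in Python. It is only an illustration under my own assumptions, not the Mailana code: the names `followers`, `following`, `add_follow` and `two_hop` are made up for this example. It shows that a single new "a follows b" link invalidates the cached two-hop follower set of b and of everyone b follows.

```python
from collections import defaultdict

# Hypothetical in-memory adjacency sets, for illustration only.
followers = defaultdict(set)  # user -> users who follow them
following = defaultdict(set)  # user -> users they follow


def add_follow(a, b):
    """Record that `a` follows `b`.

    Returns the users whose cached two-hop follower set is now stale:
    `b` itself, plus everyone `b` follows (they reach `a` through `b`).
    """
    followers[b].add(a)
    following[a].add(b)
    return {b} | following[b]


def two_hop(user):
    """Followers of `user` plus the followers of those followers."""
    direct = followers[user]
    result = set(direct)
    for f in direct:
        result |= followers[f]
    result.discard(user)  # a user is not part of their own audience
    return result
```

With a denormalized cache, each call to `add_follow` means recomputing or patching `two_hop` for every user it returns, so a new follower of someone who follows thousands of accounts touches thousands of cached lists.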
Pete

On Thu, Feb 26, 2009 at 4:39 PM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
>
> On Thu, Feb 26, 2009 at 4:19 PM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
>>
>> A relational database falls down very fast on this kind of analysis. For
>> example, I have more than 300 followers, which is a simple query... but it
>> returns 300 users and now the query needs to ask who the followers of those
>> 300 are, to answer question No. 1. That's a big, slow query, since it has
>> to specify the ids of the 300 that I follow... or it is 300 smaller queries.
>> Either way, ugh. That query is going to return a very large number of
>> items, many of which need to be compared with one another.
>>
>
> FYI, there are 345,000 nodes and 1.4 million edges in the graph of me, my
> followers and their followers. I'm sure this could be pared down
> considerably by eliminating a handful of extremely popular people, but it's
> still a hard problem to scale.
>
> Nick
>