On 27 March 2010 13:08, Carlo von Loesch <[email protected]> wrote:

> I'm catching up with social-discuss and still see the four
> projects described by elijah, although (2) the protocol and
> (4) the local daemon are closely tied to each other. I have
> some questions and comments on many topics that have been
> discussed... PHP, P2P, DHT, XMPP etc. I'm presuming that
> our grand objective would be to do something like Facebook
> in a decentralized high scalability free software way -
> and maybe even better, so it provides incentive for people
> to enter the next dimension of social networking.
Thanks for this summary, Carlo. I agree with all of your points, with one exception:

> A social network in a Facebook style generates events a go-go.
> Each time a user adds a comment somewhere, each time a user likes
> something, writes an update, joins a group or adds a friend.
> Every time a notice needs to be distributed to all peers.
> This is a one-to-many operation that hasn't got a ghost of a chance
> of scaling if implemented as a round-robin series of HTTP calls.

and in the same vein:

> Several of these interfaces are HTTP-based and not fast enough
> for inner federation real-time interactions. This will not scale.
> But we can use them to gateway to external applications.

I absolutely disagree that HTTP can't scale as a messaging protocol. I'll qualify that, because I've been on both sides of this fence [1]. When distributing messages to 10,000 or 100,000 recipients, you want to be able to batch those deliveries (PSYC's notion of multicast), and more importantly you want low-latency, highly reliable response characteristics. If each of 10,000 deliveries takes even 1ms, the total delivery time is going to be 10 *seconds*.

A more realistic estimate starts from the number of users a site has, the number of relationships each of those users has, and the number of actions each user takes. With some plausible numbers, that could easily look like: 1,000 users * 30 "friends" each * 30 actions per day = 900,000 requests per day. At 1ms per request, that's 900 seconds of CPU time - entirely doable with 86,400 seconds of CPU time available per day on a single machine.
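The back-of-the-envelope arithmetic above can be sketched in a few lines; all of the numbers are the illustrative assumptions from this thread, not measurements:

```python
# Back-of-the-envelope delivery cost for a small federated installation.
# Every number here is an illustrative assumption from the discussion above.

users = 1000             # users on this installation
friends_per_user = 30    # average relationships per user
actions_per_day = 30     # average actions (posts, likes, joins) per user per day
ms_per_request = 1       # assumed cost of one HTTP delivery, in milliseconds

requests_per_day = users * friends_per_user * actions_per_day
cpu_seconds_per_day = requests_per_day * ms_per_request / 1000

print(requests_per_day)      # 900000 deliveries per day
print(cpu_seconds_per_day)   # 900.0 seconds, out of 86400 available per day
```

Even if the 1ms figure is off by an order of magnitude, a single machine still has headroom at this scale.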
For what it's worth, Facebook's numbers (i.e., a reasonable upper limit, especially given that the whole point of this is to do something decentralized) probably look something like: 300,000,000 users * 30 friends each * 30 actions per day = 270,000,000,000 requests per day, or 3,125 days of CPU time worth of requests per day at 1ms per request. That's roughly 780 quad-core machines - honestly, not an unreasonable number for something as big as Facebook.

*MOST* HTTP servers are terribly ill-suited to this sort of messaging - the sort that social networks promote. HTTP response times are normally more like 30ms if you're lucky, 1-2s if you're not. So in that regard, I agree: most HTTP deployments are ill-suited to this sort of work.

*However*, it's entirely possible to build HTTP servers that are much faster than that. node.js currently serves as many as 12,000 requests per second on a single core; at that rate, our hypothetical Facebook would only need 260 days of CPU time worth of requests per day, bringing us down to ~65 quad-core machines. To run Facebook. Using current, off-the-shelf open source technology, over HTTP.

This is without taking into account the effects of batching/"multicast" (which would be extremely beneficial considering the scale of this hypothetical use case), HTTP keep-alives, and so on. HTTP's stateless nature means that massive parallelism works out of the box - not something that can be said for, e.g., XMPP or IRC. Most XMPP servers will fail to successfully negotiate an s2s link unless both servers are clustered at the back end (thanks to dialback), an approach which is far from scalable in most XMPP servers. I love XMPP, but don't ignore the fact that it's not a panacea, either - no protocol is.

Neither does the above analysis take into account that in our ideal world, Facebook-like networks don't exist. Scale, therefore, isn't really an issue. As far as I can tell, Twitter ca.
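The same arithmetic, scaled to the Facebook-sized hypothetical and converted into machine counts, looks like this (again, all inputs are the thread's assumed numbers - 1ms per request for a typical fast server, 12,000 requests/second for a node.js-class server, four cores per machine):

```python
# Scaling the delivery arithmetic to a Facebook-sized network and
# converting per-core throughput into machine counts.
# All inputs are illustrative assumptions from the discussion above.

users = 300_000_000
friends_per_user = 30
actions_per_day = 30
seconds_per_day = 86_400

requests_per_day = users * friends_per_user * actions_per_day  # 270 billion

def quad_core_machines(requests_per_second_per_core):
    cpu_seconds = requests_per_day / requests_per_second_per_core
    cores = cpu_seconds / seconds_per_day   # cores needed to keep up in real time
    return cores / 4                        # assume four cores per machine

print(round(quad_core_machines(1000)))    # ~781 machines at 1ms per request
print(round(quad_core_machines(12000)))   # ~65 machines at node.js-like rates
```

The striking part is that the two scenarios differ only in per-core throughput; the protocol overhead, not HTTP itself, is what sets the machine count.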
2008 was a larger messaging system than any "enterprise" corporation runs, outside of stock-trading systems. There's a good chance that Twitter and Facebook have now grown larger than even the largest stock-trading systems, and they're doing it using HTTP.

All this is not to say "go out and build a giant PubSub network with Rails servers and PHP nodes" - that would be utter folly. On the other hand, this *is* to say that if you were to build a large-ish PubSub network comprised of PHP servers and Rails nodes, or vice versa, each installation of which was relatively small, it would work. Not only would it work, but technologies and approaches exist that would allow it to scale to whatever size, without any fundamental problems at all.

b.

[1] http://blog.webhooks.org/2009/01/24/blaine-cook-does-like-web-hooks/
