On 27 March 2010 13:08, Carlo von Loesch <[email protected]> wrote:

> I'm catching up with social-discuss and still see the four
> projects described by elijah, although (2) the protocol and
> (4) the local daemon are closely tied to each other. I have
> some questions and comments on many topics that have been
> discussed... PHP, P2P, DHT, XMPP etc. I'm presuming that
> our grand objective would be to do something like Facebook
> in a decentralized high scalability free software way -
> and maybe even better, so it provides incentive for people
> to enter the next dimension of social networking.
Thanks for this summary, Carlo. I agree with all of your points, with one exception:

> A social network in a Facebook style generates events a go-go.
> Each time a user adds a comment somewhere, each time a user likes
> something, writes an update, joins a group or adds a friend.
> Every time a notice needs to be distributed to all peers.
> This is a one-to-many operation that hasn't got a ghost of a chance
> of scaling if implemented as a round-robin series of HTTP calls.

and in the same vein:

> Several of these interfaces are HTTP-based and not fast enough
> for inner federation real-time interactions. This will not scale.
> But we can use them to gateway to external applications.

I absolutely disagree that HTTP can't scale as a messaging protocol. I'll qualify that, because I've been on both sides of this fence [1]. When distributing messages to 10,000 or 100,000 recipients, you want to be able to batch those deliveries (PSYC's notion of multicast), and more importantly you want low-latency, highly reliable response characteristics. If each of 10,000 deliveries takes even 1ms, the total delivery time is going to be 10 *seconds*.

A more realistic estimate starts from the number of users a site has, the number of relationships each of those users has, and the number of actions each user takes. With some plausible numbers, that could easily look like: 1,000 users * 30 "friends" each * 30 actions per day = 900,000 requests per day. At 1ms per request, that's 900 seconds of CPU time - entirely doable with 86,400 seconds of CPU time available per day on a single machine.
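The back-of-the-envelope arithmetic above can be sketched in a few lines; all of the numbers are the illustrative assumptions from this thread, not measurements:

```python
# Back-of-the-envelope delivery cost for a small federated installation.
# Every number here is an illustrative assumption from the discussion above.

users = 1000             # users on this installation
friends_per_user = 30    # average relationships per user
actions_per_day = 30     # average actions (posts, likes, joins) per user per day
ms_per_request = 1       # assumed cost of one HTTP delivery, in milliseconds

requests_per_day = users * friends_per_user * actions_per_day
cpu_seconds_per_day = requests_per_day * ms_per_request / 1000

print(requests_per_day)      # 900000 deliveries per day
print(cpu_seconds_per_day)   # 900.0 seconds, out of 86400 available per day
```

Even if the 1ms figure is off by an order of magnitude, a single machine still has headroom at this scale.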
For what it's worth, Facebook's numbers (i.e., a reasonable upper limit, especially given that the whole point of this is to do something decentralized) probably look something like: 300,000,000 users * 30 friends each * 30 actions per day = 270,000,000,000 requests per day, or 3,125 days of CPU time worth of requests per day at 1ms per request. That's roughly 780 quad-core machines - honestly, not an unreasonable number for something as big as Facebook.

*MOST* HTTP servers are terribly ill-suited to this sort of messaging - the sort that social networks promote. HTTP response times are normally more like 30ms if you're lucky, 1-2s if you're not. So in that regard, I agree: most HTTP deployments are ill-suited to this sort of work.

*However*, it's entirely possible to build HTTP servers that are much faster than that. node.js currently serves as many as 12,000 requests per second on a single core; at that rate, our hypothetical Facebook would only need 260 days of CPU time worth of requests per day, bringing us down to ~65 quad-core machines. To run Facebook. Using current, off-the-shelf open source technology, over HTTP.

This is without taking into account the effects of batching/"multicast" (which would be extremely beneficial considering the scale of this hypothetical use case), HTTP keep-alives, and so on. HTTP's stateless nature means that massive parallelism works out of the box - not something that can be said for, e.g., XMPP or IRC. Most XMPP servers will fail to successfully negotiate an s2s link unless both servers are clustered at the back end (thanks to dialback), an approach which is far from scalable in most XMPP servers. I love XMPP, but don't ignore the fact that it's not a panacea, either - no protocol is.

Neither does the above analysis take into account that in our ideal world, Facebook-like networks don't exist. Scale, therefore, isn't really an issue. As far as I can tell, Twitter ca.
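The same arithmetic, scaled to the Facebook-sized hypothetical and converted into machine counts, looks like this (again, all inputs are the thread's assumed numbers - 1ms per request for a typical fast server, 12,000 requests/second for a node.js-class server, four cores per machine):

```python
# Scaling the delivery arithmetic to a Facebook-sized network and
# converting per-core throughput into machine counts.
# All inputs are illustrative assumptions from the discussion above.

users = 300_000_000
friends_per_user = 30
actions_per_day = 30
seconds_per_day = 86_400

requests_per_day = users * friends_per_user * actions_per_day  # 270 billion

def quad_core_machines(requests_per_second_per_core):
    cpu_seconds = requests_per_day / requests_per_second_per_core
    cores = cpu_seconds / seconds_per_day   # cores needed to keep up in real time
    return cores / 4                        # assume four cores per machine

print(round(quad_core_machines(1000)))    # ~781 machines at 1ms per request
print(round(quad_core_machines(12000)))   # ~65 machines at node.js-like rates
```

The striking part is that the two scenarios differ only in per-core throughput; the protocol overhead, not HTTP itself, is what sets the machine count.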
2008 was a larger messaging system than any "enterprise" corporation runs, outside of stock-trading systems. There's a good chance that Twitter and Facebook have now grown larger than even the largest stock-trading systems, and they're doing it using HTTP.

All this is not to say "go out and build a giant PubSub network with Rails servers and PHP nodes" - that would be utter folly. On the other hand, this *is* to say that if you were to build a large-ish PubSub network comprised of PHP servers and Rails nodes, or vice versa, each installation of which was relatively small, it would work. Not only would it work, but technologies and approaches exist that would allow it to scale to whatever size, without any fundamental problems at all.

b.

[1] http://blog.webhooks.org/2009/01/24/blaine-cook-does-like-web-hooks/
