Hi everyone,
I just read this very thought-provoking thread and just subscribed (sorry
for breaking/reviving the thread).
For a quick background, I'm the author of the atompub-pubsub module
for ejabberd,
and wrote an atom validator for the ejabberd xml format and a
node_atom for ejabberd pubsub.
Among my interests: leveraging the respective strengths of HTTP and
XMPP ... and interconnecting one with the other.
My 2 centimes :
The main difference between pull and push is that some state has
to be kept on the push server.
This means we need a datastore that scales as traffic and volume grow.
Scaling datastores is by now a well-documented, largely solved problem:
- With memcached/rdbms combinations
- Or those new DBs like CouchDB and my current playground, AWS SimpleDB.
As a sidenote I plan to release a few modules for ejabberd
(authentication/roster/mod_last) using SimpleDB,
and will probably port pubsub/pep to it (with the payload on Amazon
S3). The objective is to have an EC2 AMI
with ejabberd running without any data (except the transient state)
stored on the instance itself.
Another problem that should be taken care of is traffic to/from the
servers.
There should be a way of multicasting a single notification/presence
to all users of another domain.
XEP-0033 (Extended Stanza Addressing) may or may not help. I haven't
really looked into it.
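
To illustrate the per-domain multicast idea above, here is a minimal sketch in
Python. It only shows the grouping step: subscribers are bucketed by domain so
that one stanza per remote domain could be sent instead of one per user. The
function name and the sample JIDs are hypothetical, and it assumes bare JIDs
(no resource part). How the remote server would fan the stanza out is not
covered here.

```python
from collections import defaultdict


def group_by_domain(bare_jids):
    """Group bare JIDs (local@domain) by their domain part."""
    groups = defaultdict(list)
    for jid in bare_jids:
        # Assumption: bare JID, so everything after the "@" is the domain.
        domain = jid.rsplit("@", 1)[-1]
        groups[domain].append(jid)
    return dict(groups)


# Hypothetical subscriber list for one pubsub node.
subscribers = ["alice@example.org", "bob@example.org", "carol@example.net"]
for domain, recipients in group_by_domain(subscribers).items():
    # One outgoing stanza per domain, carrying the full recipient list.
    print(domain, recipients)
```

With this grouping, the number of server-to-server stanzas grows with the
number of remote domains rather than with the number of subscribers.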
Also taken from the http book, caching should be implemented.
We have everything specified here http://xmpp.org/extensions/xep-0131.html
and of course in the http RFC.
Implementing HTTP-style caching (ETag/If-Modified-Since etc.) for roster, disco
and pubsub get-items will help reduce traffic.
(There's a bit of "polling" in XMPP, though it's usually user
initiated ;)
There's also XEP-0230 (Service Discovery Notifications) and XEP-0237
(Roster Versioning) that could help.
But from a server-side perspective they are much harder to implement
than an all-or-nothing approach like HTTP's.
And they tend to add more state ...
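
As a rough illustration of the all-or-nothing HTTP-style approach, here is a
minimal Python sketch of an ETag-like conditional get over a list of items
(for example a roster or the result of a pubsub get-items). The function names
and status strings are hypothetical, and deriving the tag from a hash of the
content is an assumption, not something taken from XEP-0131 or the HTTP RFC.

```python
import hashlib
import json


def etag_for(items):
    """Derive a tag from the full item list (assumption: a content hash)."""
    serialized = json.dumps(items).encode("utf-8")
    return hashlib.sha1(serialized).hexdigest()


def conditional_get(items, client_etag=None):
    """Return (status, etag, body); body is None if the client copy is current."""
    current = etag_for(items)
    if client_etag == current:
        # Nothing changed: answer with the tag only, no payload.
        return ("not-modified", current, None)
    # Anything changed: resend everything (all-or-nothing).
    return ("ok", current, list(items))


roster = ["alice@example.org", "bob@example.org"]
status, tag, body = conditional_get(roster)            # first fetch, full body
status2, tag2, body2 = conditional_get(roster, tag)    # unchanged, no body
```

The server keeps no per-client state here: the tag is recomputed from the
current data and compared with whatever the client sends back.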
There's also the problem of long-lasting connections and their
management.
HTTP experience can't help much in this regard ;)
But XEP-0198 (Stream Management) helps for fast reconnect.
What is still lacking, IMO, is a way to rebalance connections.
If I want to dynamically add or remove servers (EC2-style) I'd like a
way to inform clients that they should reconnect.
The load balancer would then direct the reconnection to another server.
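
To make the rebalancing idea concrete, here is a small Python sketch of the
arithmetic only: given the current connection count per server and some newly
added servers, it computes how many clients each existing server would ask to
reconnect so that the load evens out. The function name and server names are
hypothetical, and the actual "please reconnect" signal to clients is exactly
the missing piece discussed above, so it is not modelled.

```python
def connections_to_move(load, new_servers):
    """Return how many clients each existing server should ask to reconnect.

    load: dict mapping server name -> current connection count
    new_servers: names of freshly added servers (assumed to start empty)
    """
    server_count = len(load) + len(new_servers)
    if server_count == 0:
        return {}
    total = sum(load.values())
    # Target per server, rounded up so no server is asked to go below average.
    target = -(-total // server_count)
    return {name: max(0, count - target) for name, count in load.items()}


# Hypothetical cluster: two loaded servers, one new EC2 instance.
plan = connections_to_move({"xmpp1": 100, "xmpp2": 80}, ["xmpp3"])
print(plan)
```

The load balancer is assumed to send the reconnecting clients to the least
loaded server, which in this example is the new instance.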
Cheers,
c*
http://www.cestari.info/
http://twitter.com/cstar
JID : cstar-at-ohmforce.com