The results of last week's XMPP Summit are beginning to bleed out as Ralphm blogs the first of a promised series of notes on the event. See: http://ralphm.net/blog/2008/07/26/xmpp_summit_5 and http://ralphm.net/blog/2008/07/26/xmpp_social_networks_1
Not surprisingly, it seems that those at the Summit agreed that the most sensible way to federate XMPP PubSub servers is to have the various servers subscribe to each other. Thus, if I were running a microblogging service that provided open access to "public" posts on my service, I might set up a node to which I published all such "public" posts. Other microblogging services, search engines, etc. would then subscribe to that node and, by doing so, could mix messages published to my service with those published to their own service.

This approach of "Federation via Subscription" has some distinct advantages over the alternative, "Federation via Publishing", particularly in that it eases spam control and management of server resources. However, it has a distinct disadvantage in that it makes it somewhat harder to form networks of cooperating servers.

In a system which relies on Federation via Subscription, every server that receives messages must know of potential publishers before any data flows between them. Given two servers, A and B, no data will flow from A to B unless B first becomes aware of A and subsequently subscribes to at least one node on A. The interesting question becomes: "How does B become aware of A?" Since no data can flow between the two servers until a subscription is established, if no other mechanism is provided, one must assume that B discovers A via "out-of-band" communications such as email messages, phone calls, directory lookups, etc. These are, of course, rather crude discovery methods, and each discovery must be followed by manual configuration to establish the federating subscriptions.

An alternative means of facilitating discovery would be to extend the XEP-0060 PubSub specification so that servers can publish "Advertisements" which announce the availability of nodes for federation. Advertisements would specify which nodes are available for federation and what data will be published over those nodes.
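To make the A-and-B dependency described above concrete, here is a toy in-memory model of Federation via Subscription. It is only a sketch, not an XMPP implementation: the class PubSubService, the host names a.example and b.example, and the node name "public-posts" are all made up for illustration.

```python
from collections import defaultdict


class PubSubService:
    """Toy stand-in for a pubsub service; no XMPP involved."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(list)  # node name -> subscribed services
        self.inbox = []  # (origin service, node, payload) tuples received

    def subscribe(self, remote, node):
        # The receiving side initiates, so it must already know `remote` exists.
        remote.subscribers[node].append(self)

    def publish(self, node, payload):
        # Deliver only to services that subscribed beforehand.
        for subscriber in self.subscribers[node]:
            subscriber.inbox.append((self.name, node, payload))


a = PubSubService("a.example")
b = PubSubService("b.example")

a.publish("public-posts", "first post")   # B has not subscribed: nothing flows
b.subscribe(a, "public-posts")            # B learned about A out of band
a.publish("public-posts", "second post")  # now the post reaches B

print(b.inbox)  # [('a.example', 'public-posts', 'second post')]
```

The first post never reaches B, which is exactly the gap that some discovery mechanism has to close.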
In order to reuse as much existing framework as possible, Advertisements would be published just like normal events, but they would be published to a "well known node" that is commonly available on all services that support advertisements. This node might be named "http://jabber.org/protocol/pubsub#advertisements" and would be like any other pubsub node in that it could be subscribed to, read, etc. However, it would only support publishing <advertisement/>s, not <event/>s.

The basic assumption behind federation is that two services will be publishing data which is similar. For instance, two micro-blogging services will both be publishing micro-blogging entries that are formatted as Atom entries. Agreement on the payload formats is essential to enable federation. On the other hand, it is unreasonable to insist that all servers use common node names. Thus, a mechanism is needed to map between some commonly agreed name for a stream of data and the node name that is used on any particular server.

This can be accomplished by having the Advertisement provide a mapping from commonly understood logical node names to local concrete names. Thus, those creating micro-blogging standards might say that the logical node name for publishing public posts is: http://example.com/PublicMicroBloggingPosts. Then, a server that published public posts on a node named "987ye879799wwww00" would simply provide both the local and logical name for the node in the advertisement.

Given this introduction, an advertisement might look like the following (though use of an xdata form might be more appropriate and more flexible):
<iq type='set'
    from='[EMAIL PROTECTED]'
    to='old_service.shakespeare.lit'
    id='ad1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='http://jabber.org/protocol/pubsub#advertisements'>
      <advertisement xmlns='http://jabber.org/protocol/pubsub#advertisements'
                     id='tag:[EMAIL PROTECTED],2008-07-24:1234'>
        <local node='987ye879799wwww00' format='http://example.com/post_format'/>
        <common node='http://example.com/PublicMicroBloggingPosts'/>
        <description>All public posts on this server.</description>
      </advertisement>
    </publish>
  </pubsub>
</iq>

If the Advertisement node is supported as a normal node, then it should be possible for others to subscribe to the node and thus monitor advertisements as they are published. Using filters, subscribers would either subscribe to all advertisements published to the remote node or only to those advertisements that are specific to that node. This permits advertisements to flow to nodes not known to the advertiser, and it lets servers ensure that they are rapidly made aware of changes on servers in which they have an interest. Additional metadata such as keywords could be added to make filtering easier and more effective.

Of course, "Advertisers" shouldn't expect that the mere act of advertising will always result in a federating subscription. Server managers will still often want to moderate the lists of nodes they subscribe to. Nonetheless, the mechanism provides a foundation on which automatic subscription will sometimes reasonably be built. For instance, I might wish to build a microblogging aggregator that automatically subscribes to all remote services that claim support for microblogging. Or, I might have a strong trust relationship with some other service and decide that I would like to have my service subscribe to anything advertised by that service, while manually reviewing advertisements from other services. Many patterns are possible and reasonable.
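As a rough illustration of the subscription policies just described, the sketch below parses an advertisement payload and decides whether to subscribe automatically or queue it for manual review. Everything here is an assumption made for the example: the advertisement format is only the proposal in this post (it is not part of XEP-0060), the id 'ad-1234' and the host names are placeholders, and parse_advertisement, decide, trusted_services and auto_subscribe_nodes are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Namespace proposed in this post; not defined by any published XEP.
AD_NS = "http://jabber.org/protocol/pubsub#advertisements"

# Advertisement payload in the proposed format; the id is a placeholder.
AD_XML = """
<advertisement xmlns='http://jabber.org/protocol/pubsub#advertisements' id='ad-1234'>
  <local node='987ye879799wwww00' format='http://example.com/post_format'/>
  <common node='http://example.com/PublicMicroBloggingPosts'/>
  <description>All public posts on this server.</description>
</advertisement>
"""


def parse_advertisement(xml_text):
    """Extract the local/logical node mapping from an advertisement."""
    root = ET.fromstring(xml_text.strip())
    local = root.find(f"{{{AD_NS}}}local")
    common = root.find(f"{{{AD_NS}}}common")
    return {
        "local_node": local.get("node"),
        "format": local.get("format"),
        "common_node": common.get("node"),
        "description": root.findtext(f"{{{AD_NS}}}description"),
    }


def decide(ad, advertiser, trusted_services, auto_subscribe_nodes):
    """Return 'subscribe' or 'review' for an advertisement."""
    if advertiser in trusted_services:
        return "subscribe"  # strong trust relationship: take everything
    if ad["common_node"] in auto_subscribe_nodes:
        return "subscribe"  # aggregator: take any service offering this stream
    return "review"         # everything else waits for a server manager


ad = parse_advertisement(AD_XML)
microblogs = {"http://example.com/PublicMicroBloggingPosts"}

print(ad["local_node"])
print(decide(ad, "unknown.example", set(), microblogs))
print(decide(ad, "unknown.example", set(), set()))
```

A subscriber that chose "subscribe" would then send an ordinary pubsub subscription request for ad["local_node"] to the advertising service; the logical name is used only to recognize what kind of stream is on offer.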
Those familiar with blogging infrastructure will recognize a great deal of similarity between the idea of Advertisements and that of "pinging." Within the blogging world, pinging is probably the most common and useful means available to blog aggregators to discover new blogs. Indeed, it can be argued that the introduction of pinging and its use by blog aggregators was one of the most essential steps in building the blogging infrastructure as we know it today. Before pinging, the process of discovering new blogs was horribly difficult, inaccurate and expensive for service providers.

Comments? Does this sound reasonable?

bob wyman
