On Sat, Oct 23, 2010 at 6:44 PM, Martin Sustrik <[email protected]> wrote:
> Hi Gerard,
>
>> I've read the mails about publisher side filtering here:
>>
>> http://thread.gmane.org/gmane.network.zeromq.devel/3560
>>
>> - Is there now a current ongoing effort to put publisher-side filtering in
>> 0MQ that I may possibly contribute to, which also allows
>> API users to specify their own methods of filtering as they see fit?
>
> These are two distinct issues:
>
> 1. Publisher-side filtering a.k.a. subscription forwarding.
> 2. Custom filtering algorithms.
>
> The former is a pretty clear functionality that has to be implemented
> sooner or later. If you want to contribute to that, you are welcome.
>
> The latter is something that pops up every now and then but nobody have
> proposed any clear semantics for it yet (especially w.r.t. how it interacts
> with the subscription forwarding). Thus, if you want to contribute this kind
> of functionality, you have to define the intended semantics first.

There's a document on the 0MQ site which mentions how routing is done
through an inverted bitmap. The matrix can be compiled because the number
of possible queries is finite. However, even for finite domains, one still
has to consider the dimensionality of the matrix for practicality.

1. "Standard" subscription forwarding, in my interpretation, means
forwarding messages selectively based on a topic.

2. A topic can be considered a single piece of metadata (a metadata key)
attached to a message, rather than a 'channel'.

3. Custom filtering always involves filtering messages on multiple metadata
keys instead of just one. These keys are generally derived from values in
the message contents. The bad thing here is that, to do this efficiently
from a network perspective, 0MQ would need to know about the message
format. So either some complicated functionality for message inspection has
to exist, or messages must have a predetermined format.

4. Adding more metadata keys to messages is not really an option.
Because it is assumed that producers have no knowledge of which particular
messages a subscriber is interested in, the only reasonable option here is
to add each searchable value into the metadata as a key. Taken to the
extreme, this means duplicating the message, once as metadata and then as
application-formatted data.

So, yes, it sounds like custom filtering *is* a very bad idea and that it's
a compensation for other things that are incomplete in the design or were
chosen poorly.

The power of subscription forwarding, however, is determined by the
expressivity of the single metadata key and the different ways in which it
can be matched to more specific queries, from the perspective of a
consumer/subscriber. From the perspective of a router/broker, it is more
important how fast these comparisons can be made, because it is more
concerned with message throughput. Those seem to be competing issues for an
implementation.

A couple of things seem necessary:

1. Come up with a suitable specification for how topics are expressed,
e.g. a.b.c. Does it allow wildcards, such as a.*.c? Wildcards significantly
increase the complexity.

2. Together with 1, come up with a strategy for topic matching. Inverted
bitmaps were named in the 0MQ docs. I've been looking into bloom filters
and how these could be used for achieving something similar. The advantage
of bloom filters is that less absolute knowledge is required. Absolute
knowledge is knowing that currency=USD is placed in column 15 of the
inverted bitmap (which has to be consistent across the cluster). A bloom
filter just needs to use the same hashing functions everywhere and it needs
to be properly dimensioned. The dimension depends on the total number of
topics that can exist in the domain and the probability that you allow for
having a false positive.

3. Can a single pub/sub channel have many forwarding subscriptions?
Maybe a 'client device' is handy here, which uses the basic functions and
groups them together through a zmq_poll()-based device, as the mechanism
through which messages are retrieved will be very similar. The idea is that
for each incoming message on a particular channel (basically a
subscription), a different callback function may be called. (This has some
complexities regarding threading, 100% CPU consumption, etc.)

4. Filters need to be communicated from sub to pub in some kind of
handshake. If multiple subscriptions are allowed, how does a client notify
a filtering publisher that some of its subscription interests have changed
over the course of its lifetime?

5. If multiple subscriptions are not allowed, then a new physical socket is
opened for each topic of interest. The broker handling a number of clients
may then quickly run out of resources, similar to a broker connecting to a
router?

6. When some forwarding device loses a connection to a client, its own set
of subscriptions changes. This cannot be handled if a new channel must be
set up each time a subscription list changes.

7. Custom filtering sounds like a useful addition to the client device: a
different, configurable callback function that determines whether a message
is passed to the actual message handling function or not.

Feedback welcome,

--
Gerard
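
A rough sketch of the bloom filter idea from point 2 above, to make the
dimensioning concrete. This is illustrative only and not part of 0MQ: the
class name TopicBloomFilter, its methods, and the topic names are
hypothetical, and SHA-256 with double hashing is just one assumed way of
getting identical hash functions on every node. The sizing uses the
standard bloom filter formulas for n topics and false positive rate p.

```python
import hashlib
import math


class TopicBloomFilter:
    """Bloom filter over topic strings (hypothetical, not a 0MQ API)."""

    def __init__(self, expected_topics, false_positive_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits,
        # k = (m / n) * ln 2 hash functions.
        n, p = expected_topics, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, topic):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        # Any node using the same parameters computes the same positions,
        # so no shared column assignment is needed.
        digest = hashlib.sha256(topic.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, topic):
        # Record a subscription by setting its k bits.
        for pos in self._positions(topic):
            self.bits |= 1 << pos

    def might_match(self, topic):
        # False means "definitely not subscribed"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(topic))


subscriptions = TopicBloomFilter(expected_topics=1000, false_positive_rate=0.01)
subscriptions.add("currency.USD")
subscriptions.add("currency.EUR")

print(subscriptions.num_bits, subscriptions.num_hashes)
print(subscriptions.might_match("currency.USD"))  # True: no false negatives
```

Note that this only answers exact-match questions. A wildcard topic such as
a.*.c cannot be tested against the filter directly, which ties back to the
question in point 1.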
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
