Here's an interesting simplification:
- Subscribe requests always go the full HTL. They don't "collide".

Results:
- Rerouting becomes much easier. We don't have to broadcast the fact
  that we lost the connection just to get other nodes to allow our
  connection through.
- Slight increase in message traffic caused by subscribes and
  resubscribes; probably a net increase overall.
- Much simpler code.

So:
- When we receive a SubscribeRequest, we route it. Even if we already
  have the stream.
- When we get a SubscribeData which we haven't seen before, we send it
  to all our subscription-peers - all the nodes which have subscribed
  through us, plus all the nodes we have subscribed through as part of
  our subscription request.
- When we get a PublishData, we route it individually, but also, we pass
  it as above as a SubscribeData.
- ERROR HANDLING:

If we lose a connection, gain a connection, or see major topology
changes, we reroute from the beginning. Those nodes which we are no
longer subscribed through get sent a SubscribeTerminated message. This
is the same as the previous proposed handling for PublishData.

Race conditions are possible but can be dealt with by rejections.
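Roughly, in pseudo-Python (all names here - StreamState, on_subscribe_data,
and so on - are illustrative, not the actual Fred code):

```python
# Illustrative sketch of the per-node handling rules above; the routed
# and outbox lists stand in for onward routing and the transport layer.

class StreamState:
    def __init__(self):
        self.seen_data_ids = set()       # SubscribeData IDs already relayed
        self.subscribers = set()         # peers subscribed through us
        self.subscribed_through = set()  # peers we subscribed through

class Node:
    def __init__(self):
        self.streams = {}  # key -> StreamState
        self.routed = []   # stands in for onward key-based routing
        self.outbox = []   # stands in for the wire

    def on_subscribe_request(self, key, src, htl):
        # Route onward even if we already have the stream: subscribe
        # requests always go the full HTL and never "collide".
        state = self.streams.setdefault(key, StreamState())
        state.subscribers.add(src)
        self.routed.append((key, htl))

    def on_subscribe_data(self, key, data_id, payload, src):
        state = self.streams.get(key)
        if state is None or data_id in state.seen_data_ids:
            return  # duplicate: the graph is not a strict tree
        state.seen_data_ids.add(data_id)
        # Relay to all subscription-peers except the one we got it from.
        for peer in (state.subscribers | state.subscribed_through) - {src}:
            self.outbox.append((peer, key, data_id, payload))

    def on_publish_data(self, key, data_id, payload, src, htl):
        # Route it individually towards the key, and also relay it
        # across the subscribe graph exactly as a SubscribeData.
        self.routed.append((key, htl))
        self.on_subscribe_data(key, data_id, payload, src)
```

The seen-ID set is what keeps the broadcast from looping on the graph:
a duplicate arriving from a second neighbour is silently dropped.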

So:
- If we receive a SubscribeRequest:
We don't want to end up subscribed at a lower HTL than previously...
This is a PROBLEM! What we have to do is use the maximum HTL of all our
current subscribers, rather than passing it through as-is.

But we do pass it through, admittedly with a new HTL.

We do not allow more than one subscribe request to be running at once.
So we effectively coalesce them. But if we are not running one, and we
get a request, then we will start one. If we then get another request at
a higher HTL, we note this fact, and when the first one has finished we
start another one.
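The "at most one running, remember the best pending HTL" rule might look
something like this (a hypothetical per-stream handler, not the real
implementation):

```python
# Sketch of coalescing subscribe requests: at most one request runs at
# a time; a later request at a higher HTL is noted and rerun afterwards.

class SubscribeCoalescer:
    def __init__(self, start_request):
        self.start_request = start_request  # callback: route at a given HTL
        self.running = False
        self.pending_htl = None  # highest HTL seen while one was running

    def on_request(self, htl):
        if self.running:
            # Note the fact; we rerun when the current one finishes.
            if self.pending_htl is None or htl > self.pending_htl:
                self.pending_htl = htl
        else:
            self.running = True
            self.start_request(htl)

    def on_finished(self):
        if self.pending_htl is not None:
            htl, self.pending_htl = self.pending_htl, None
            self.start_request(htl)
        else:
            self.running = False
```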

When each subscribe request handler completes, we update the nodes
concerned: those we have routed to are told whether or not we are
subscribed through them, and those we were previously subscribed
through, but no longer are, are sent a SubscribeTerminated.
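The update at completion is just a set difference, computed in one step
at the end of the handler (illustrative names):

```python
# Given the subscribed-through peer sets before and after rerouting,
# work out who gets a SubscribeTerminated and who is newly confirmed.

def update_subscription_peers(old_through, new_through):
    to_terminate = old_through - new_through  # send SubscribeTerminated
    to_confirm = new_through - old_through    # newly subscribed through
    return to_terminate, to_confirm
```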

Okay, what about ID collisions? We want to prevent collisions with
nodes which are already dependent on us, right? Does it matter? We can
collide on one ID, but not on the others... but it doesn't matter that
much. Collision detection isn't vital, because we are moving in the
right direction - towards the key. Or is it? We could well loop back on
ourselves... we would run out of HTL, but it wouldn't be very
profitable. There is a similar situation with request coalescing. I
decided this wasn't a big deal a while ago, I'm not sure I remember
why... The obvious solution is not to coalesce subscribe requests. We
can achieve this while still using max HTL as above. The problem is two
such requests could be running at the same time. The answer to that is
to do the determination of which nodes need to be told that we are no
longer subscribed through them atomically at the end of the request.

Ideally we would coalesce them though, especially as they're all at the
same HTL. We may need to make coalescing (of requests and inserts as
well as subscribe requests) slightly less transparent... we
can hit our tail if we're not careful... we could send a Coalesced
message back, indicating we were coalesced with a given ID, and to
regard that ID as equal to the original ID, so to reject it if it loops
back... this could then be sent to the other nodes on our request
chain...
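The Coalesced idea amounts to an ID-equivalence table: regard the
coalesced ID as equal to the original, and reject either if it loops
back. A rough sketch (entirely hypothetical):

```python
# When request B is coalesced with request A, downstream treats B's ID
# as an alias of A's, so a looped-back request under either ID is
# rejected while A is still live.

class RequestIdTable:
    def __init__(self):
        self.alias = {}    # request ID -> ID it was coalesced with
        self.active = set()  # canonical IDs of live requests

    def canonical(self, rid):
        # Follow the alias chain to the original ID.
        while rid in self.alias:
            rid = self.alias[rid]
        return rid

    def start(self, rid):
        self.active.add(self.canonical(rid))

    def on_coalesced(self, rid, original):
        # Regard rid as equal to original from now on.
        root = self.canonical(original)
        if root != rid:  # guard against alias cycles
            self.alias[rid] = root

    def should_reject(self, rid):
        return self.canonical(rid) in self.active
```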

On Fri, Sep 02, 2005 at 09:09:53PM +0100, Matthew Toseland wrote:
> Actually, the mechanism proposed on routing through a node we're already
> connected indirectly to WON'T work, because there's no clear definition
> of dependency. There is no clear definition of "the node from which I
> expect to receive packets". Thus if a node has to reroute, it will end
> up sending a SubscribeRestarted to the entire network.
> 
> Now, can we make it into a tree?
> 
> The data probably won't be inserted at the root node. We could route it
> up to the root node though.
> 
> The other problem is backtracking. The simple solution to this is to
> make the RNF loops we get dependent on us, and dependent on our final
> node. This is roughly analogous to how we deal with requests. Hmmm...
> 
> On Fri, Sep 02, 2005 at 09:00:42PM +0100, Matthew Toseland wrote:
> > Here's another attempt at a publish/subscribe architecture, after some
> > discussion with Ian, this should be simpler, persist even when the
> > source node goes down, and make a start on resource limiting:
> > 
> > TOPOLOGY
> > --------
> > 
> > Firstly, all the below will probably work much better with the new
> > backtracking mechanism.
> > 
> > There is no fixed path from the data source to the subscribe tree.
> > Packets are routed individually, and we make no attempt to track how
> > they were routed last time, or to keep nodes on the right path.
> > 
> > There is a single subscribe tree. Actually, it's a subscribe graph. The
> > reason it is a graph is backtracking. If I route a request with HTL 20
> > to a clump of 10 nodes, it will return with an RNF with HTL 10. I will
> > then route it elsewhere. Thus I will have two successful routes - the
> > data will have to be sent to both of them if I get a packet. Now, the
> > subscribe graph is the result of very many subscriptions. Each is routed
> > towards the key, and together they make a graph, which is somewhat
> > treeish, but is not a tree. It therefore has no fully defined root. But
> > data will certainly spread very quickly through it.
> > 
> > Then, when we route a publish packet, it will also be routed towards the
> > tree. In all likelihood it will eventually meet a node which is on the
> > graph. The data will then be broadcast across the graph. Each node
> > verifies it, checks that it hasn't already got it, and relays it to all
> > directly connected nodes. Because it is a graph and not a strict tree,
> > it is possible that the same packet would be sent to the same node twice
> > by two different nodes. This will waste some bandwidth. However, it
> > probably won't waste a LOT of bandwidth, because mostly the node will be
> > told directly before it is told indirectly.
> > 
> > So we have a persistent subscribe graph, and we have transient
> > PublishDatas - equivalent to inserts. Actually this looks a lot like
> > passive requests when you think about it. The difference being that we
> > know the keys are connected, so you don't have to constantly send out
> > requests to subscribe to the next key in the sequence. And completely
> > different caching policy.
> > 
> > Each node will have a maximum of 32 streams. If we run out of streams,
> > we reject new SubscribeRequests. So it is important that we not lose
> > track of the streams. When a node is restarted, it should tell its
> > neighbours which streams it is subscribed to, and to disconnect from any
> > others. This will require a special message, BulkSubscribeStreams, or
> > something...
> > 
> > If a node no longer has any subscribers for a given stream, it drops the
> > stream. This might cause a cascade effect. But once one node has
> > subscribed to a stream, there will always be at least a chain of nodes,
> > moving outward from the start, to which the data can be sent.
> > 
> > These arbitrary limits suggest a DoS may be possible; we will have to
> > review this part of the architecture when we have some more experience.
> > LRU is an option, but would mean that streams which have infrequent
> > packets die, regardless of demand. That would be bad.
> > 
> > Now, here's the hard part: Error handling.
> > Error handling on publish is easy. The packets just get routed
> > differently. They end up in the same place, and they still rendezvous
> > with the subscribe graph.
> > 
> > BUT: Error handling on subscribe is, as always, complicated.
> > 
> > I adapt the previous proposal:
> > 
> > Subscriptions will normally be routed similarly to requests, and will
> > terminate on a collision i.e. they will terminate successfully when they
> > reach a node which has already got the stream. They can RNF and will be
> > rerouted in that case. Or they can run out of HTL, in which case they
> > are regarded as successful, although it is possible that they have run
> > into a network dead-end.
> > 
> > Once we are subscribed, we can have various errors:
> > - We disconnect from a node we were subscribed through.
> > We will have to reroute.
> > - The locations of our neighbour nodes change so much that the route
> >   would have changed.
> > We will have to reroute.
> > - A node downstream reroutes for one of the above reasons.
> > We will have to reroute.
> > 
> > Further complication: If an upstream has to reroute, it may have to
> > reroute through us. Therefore somehow it must tell us that it is about
> to do so, so that we pass the request on rather than accepting it.
> > 
> > Simplest option first:
> > We disconnect from a node we were subscribed through.
> > 
> > If this was the last or only node we were subscribed through (meaning we
> > successfully sent a subscribe request through), it is relatively simple:
> > We check what HTL we had left at that point, and rerun the request from
> > that point.
> > 
> > If the node which has broken was NOT the last node, things are more
> > complex. We know what HTL we had left at that point. We will probably
> > have to reroute from that point. Assuming the locations have not
> > changed:
> > - If the next node returned an RNF, we can use that as the next block -
> >   just starting at a different HTL; taking the same number of hops. Then
> >   we can route again, to a further node; this is merely adding to the
> >   path so is not a problem.
> > - If the next node returned success due to running out of hops, (this
> >   needs to be a distinct mode), we will definitely need to rerequest
> >   through it.
> > 
> > If a node downstream loses a connection, it will have to reroute. It may
> > then be that its RNF is no longer valid, and it has to give us one with
> > a different HTL. We can then act on this...
> > 
> > If the locations HAVE changed... things are even worse. If we discover
> > that we should have routed to B then C, not C then B, and they both
> > RNFed, this is not a problem. But if the latter succeeded due to running
> > out of hops, this is a problem.
> > 
> > What about HTLs? If we get a subscription with HTL 20, when we had
> > routed at HTL 17...? If it is less than our HTL, no problem, but if it
> > is more:
> > - If each node on the chain RNFed, we don't need to do anything, because
> >   the HTL was not the limiting factor; the size of the network was.
> > - Otherwise we may need to do some rerouting...
> > 
> > 
> > Also, if a node upstream has to reroute, it may end up rerouting through
> > us, and we would accept it because we have a stream already... the
> > mechanism described in the previous mail is adequate for dealing with
> > this though.
> > 
> > 
> > Possible general principles:
> > - Live requests: When we get the first subscribe request, we create a
> >   thread to route it. We then route it. The request succeeds, or at
> >   least, we run out of nodes. We then wait for:
> >   a) A connection to a node to go down
> >   b) A connection to a (possibly new) node to come up
> >   c) Any connected node we routed through to change location
> >   d) Any node we routed through to return a message such as a new RNF,
> >   etc.
> >   If this happens, we may have to reroute.
> > 
> > This is exactly the same challenge we faced with the other side of the
> > equation (the data publish blocks) in the first version of the draft.
> > The solution was to route them individually and keep nodes updated as to
> > whether they were involved if we routed through them on one packet but
> > not on the next. We can't do this exactly for subscriptions however.
> > What is the answer?
> > -- 
> > Matthew J Toseland - toad at amphibian.dyndns.org
> > Freenet Project Official Codemonkey - http://freenetproject.org/
> > ICTHUS - Nothing is impossible. Our Boss says so.
> 
> 
> 
> > _______________________________________________
> > Tech mailing list
> > Tech at freenetproject.org
> > http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/tech
> 
> -- 
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.



> _______________________________________________
> Tech mailing list
> Tech at freenetproject.org
> http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/tech

-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.