Here's another attempt at a publish/subscribe architecture. After some
discussion with Ian, this should be simpler, persist even when the
source node goes down, and make a start on resource limiting:

TOPOLOGY
--------

Firstly, all the below will probably work much better with the new
backtracking mechanism.

There is no fixed path from the data source to the subscribe tree.
Packets are routed individually, and we make no attempt to track how
they were routed last time, or to keep nodes on the right path.

There is a single subscribe tree. Actually, it's a subscribe graph. The
reason it is a graph is backtracking. If I route a request with HTL
(hops to live) 20 into a clump of 10 nodes, it will return with an RNF
(Route Not Found) with HTL 10 left. I will then route it elsewhere. Thus
I will have two successful routes - the data will have to be sent to
both of them if I get a packet. Now, the subscribe graph is the result
of very many subscriptions. Each is routed towards the key, and together
they make a graph, which is somewhat treeish, but is not a tree. It
therefore has no fully defined root. But data will certainly spread very
quickly through it.

Then, when we route a publish packet, it will also be routed towards
the subscribe graph. In all likelihood it will eventually meet a node
which is on the graph. The data will then be broadcast across the graph.
Each node verifies it, checks that it hasn't already got it, and relays
it to all directly connected nodes which are on the graph. Because it is
a graph and not a strict tree, it is possible that the same packet would
be sent to the same node twice by two different nodes. This will waste
some bandwidth. However, it probably won't waste a LOT of bandwidth,
because mostly the node will be told directly before it is told
indirectly.
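
A quick sketch of the broadcast step (hypothetical names; the per-stream
set of already-seen packet ids is just an assumption about how we
remember what we've already got):

import java.util.*;

// Sketch: broadcasting a verified publish packet across the subscribe
// graph - verify, drop duplicates, relay to every directly connected
// node on the graph except the one that sent it to us.
class StreamBroadcaster {
    interface Peer { void send(byte[] packet); }

    private final Map<String, Set<String>> seen = new HashMap<>();       // streamKey -> packet ids seen
    private final Map<String, List<Peer>> graphPeers = new HashMap<>();  // streamKey -> peers on the graph

    void handlePublish(String streamKey, String packetId, byte[] packet, Peer from) {
        if (!verify(streamKey, packet)) return;    // fails verification: drop it
        Set<String> ids = seen.computeIfAbsent(streamKey, k -> new HashSet<>());
        if (!ids.add(packetId)) return;            // already got it: don't re-broadcast
        for (Peer p : graphPeers.getOrDefault(streamKey, Collections.emptyList())) {
            if (p != from) p.send(packet);         // relay to the rest of the graph
        }
    }

    private boolean verify(String streamKey, byte[] packet) {
        return true; // placeholder: the real check would verify the stream's signature
    }
}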

So we have a persistent subscribe graph, and we have transient
PublishData messages - equivalent to inserts. Actually this looks a lot
like passive requests when you think about it. The difference is that we
know the keys are connected, so you don't have to constantly send out
requests to subscribe to the next key in the sequence - and the caching
policy is completely different.

Each node will have a maximum of 32 streams. If we run out of streams,
we reject new SubscribeRequests. So it is important that we not lose
track of the streams. When a node is restarted, it should tell its
neighbours which streams it is subscribed to, and that they should drop
it from any others. This will require a special message,
BulkSubscribeStreams, or something...

If a node no longer has any subscribers for a given stream, it drops
the stream. This might cause a cascade effect. But as long as at least
one node is subscribed to a stream, there will always be at least a
chain of nodes, moving outward from the start, to which the data can be
sent.
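
Here's roughly what the per-node stream table could look like - a sketch
with hypothetical names covering the 32-stream cap, rejection when full,
the BulkSubscribeStreams-style list for after a restart, and dropping a
stream once the last subscriber is gone:

import java.util.*;

// Sketch of the per-node stream table.
class StreamTable {
    static final int MAX_STREAMS = 32;

    // streamKey -> subscriber ids (downstream peers and/or local clients)
    private final Map<String, Set<String>> streams = new HashMap<>();

    /** Returns false if the SubscribeRequest must be rejected (table full). */
    boolean subscribe(String streamKey, String subscriber) {
        Set<String> subs = streams.get(streamKey);
        if (subs == null) {
            if (streams.size() >= MAX_STREAMS) return false;  // out of streams: reject
            subs = new HashSet<>();
            streams.put(streamKey, subs);
        }
        subs.add(subscriber);
        return true;
    }

    /** Drop the stream entirely once nobody is left. Returns true if the
     *  caller should now send its own unsubscribe upstream - repeated
     *  node by node, this is the cascade effect. */
    boolean unsubscribe(String streamKey, String subscriber) {
        Set<String> subs = streams.get(streamKey);
        if (subs == null) return false;
        subs.remove(subscriber);
        if (subs.isEmpty()) {
            streams.remove(streamKey);
            return true;
        }
        return false;
    }

    /** Payload for a BulkSubscribeStreams-style message sent to each
     *  neighbour after a restart: "these are the streams I still want;
     *  drop me from anything else". */
    Set<String> bulkSubscribeList() {
        return new HashSet<>(streams.keySet());
    }
}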

These arbitrary limits suggest a DoS may be possible; we will have to
review this part of the architecture when we have some more experience.
LRU is an option, but would mean that streams which have infrequent
packets die, regardless of demand. That would be bad.

Now, here's the hard part: Error handling.
Error handling on publish is easy. The packets just get routed
differently. They end up in the same place, and they still rendezvous
with the subscribe graph.

BUT: Error handling on subscribe is, as always, complicated.

I adapt the previous proposal:

Subscriptions will normally be routed similarly to requests, and will
terminate on a collision, i.e. they will terminate successfully when
they reach a node which has already got the stream. They can RNF and
will be rerouted in that case. Or they can run out of HTL, in which case
they are regarded as successful, although it is possible that they have
run into a network dead-end.

Once we are subscribed, we can have various errors (summed up in the
sketch after this list):
- We disconnect from a node we were subscribed through.
  We will have to reroute.
- The locations of our neighbour nodes change so much that the route
  would have changed.
  We will have to reroute.
- A node downstream reroutes for one of the above reasons.
  We will have to reroute.
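
For reference, those triggers boil down to something like this (sketch
only):

// Sketch: the events that force a reroute of an established subscription.
enum RerouteReason {
    PEER_DISCONNECTED,    // we lost a node we were subscribed through
    LOCATIONS_CHANGED,    // neighbour locations moved enough to change the route
    DOWNSTREAM_REROUTED   // a node downstream rerouted for one of the above reasons
}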

Further complication: If an upstream node has to reroute, it may have
to reroute through us. Therefore it must somehow tell us that it is
about to do so, so that we pass the request on rather than accepting it.

Simplest option first:
We disconnect from a node we were subscribed through.

If this was the last or only node we were subscribed through (i.e. the
last node we successfully sent a subscribe request through), it is
relatively simple: we check what HTL we had left at that point, and
rerun the request from that point.

If the node which has broken was NOT the last node, things are more
complex. We know what HTL we had left at that point. We will probably
have to reroute from that point. Assuming the locations have not
changed:
- If the next node returned an RNF, we can use that as the next block -
  just starting at a different HTL; taking the same number of hops. Then
  we can route again, to a further node; this is merely adding to the
  path so is not a problem.
- If the next node returned success due to running out of hops (this
  needs to be a distinct mode), we will definitely need to rerequest
  through it.

If a node downstream loses a connection, it will have to reroute. It may
then be that its RNF is no longer valid, and it has to give us one with
a different HTL. We can then act on this...
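
All of the above assumes we keep, per stream, a record of the route we
built: which nodes we routed through, the HTL we had left at each point,
and whether that leg ended in an RNF or by running out of hops. A sketch
of that record (hypothetical names), including picking the restart point
after a disconnect:

import java.util.*;

// Sketch of the per-stream route record needed for rerouting.
class RouteRecord {
    enum SegmentEnd { RNF, HTL_EXHAUSTED, COLLISION }

    static class Segment {
        final String nodeId;
        final int htlLeft;   // HTL we had left when we routed to this node
        SegmentEnd end;      // may be updated later, e.g. by a new RNF from downstream
        Segment(String nodeId, int htlLeft, SegmentEnd end) {
            this.nodeId = nodeId;
            this.htlLeft = htlLeft;
            this.end = end;
        }
    }

    private final List<Segment> segments = new ArrayList<>(); // in routing order

    void add(String nodeId, int htlLeft, SegmentEnd end) {
        segments.add(new Segment(nodeId, htlLeft, end));
    }

    /** A node we were subscribed through has gone: return (and remove) its
     *  segment, so the caller can rerun the request from the HTL we had
     *  left at that point. */
    Segment restartPointFor(String brokenNodeId) {
        for (int i = 0; i < segments.size(); i++) {
            if (segments.get(i).nodeId.equals(brokenNodeId))
                return segments.remove(i);
        }
        return null; // we never routed through that node for this stream
    }

    /** True if any remaining segment ended by running out of hops; those
     *  routes may need to be rerequested after the reroute. */
    boolean needsRerequest() {
        for (Segment s : segments) {
            if (s.end == SegmentEnd.HTL_EXHAUSTED) return true;
        }
        return false;
    }
}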

If the locations HAVE changed... things are even worse. If we discover
that we should have routed to B then C, not C then B, and they both
RNFed, this is not a problem. But if the latter succeeded due to running
out of hops, this is a problem.

What about HTLs? If we get a subscription with HTL 20, when we had
routed at HTL 17...? If it is less than our HTL, no problem, but if it
is more (see the sketch after this list):
- If each node on the chain RNFed, we don't need to do anything, because
  the HTL was not the limiting factor; the size of the network was.
- Otherwise we may need to do some rerouting...
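
A small sketch of that decision, reusing the idea of recording how each
leg of our existing route ended (hypothetical names):

// Sketch: do we need to reroute when a subscription arrives with a
// higher HTL than the one we originally routed with?
class HtlComparison {
    enum SegmentEnd { RNF, HTL_EXHAUSTED }

    static boolean needsReroute(int ourHtl, int incomingHtl, SegmentEnd[] segmentEnds) {
        if (incomingHtl <= ourHtl) return false;  // less or equal: no problem
        for (SegmentEnd end : segmentEnds) {
            if (end == SegmentEnd.HTL_EXHAUSTED)
                return true;   // HTL was the limiting factor somewhere: the extra hops matter
        }
        return false;          // every leg RNFed: the network size was the limit, not HTL
    }
}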


Also, if a node upstream has to reroute, it may end up rerouting
through us, and we would accept it because we have a stream already...
The mechanism described in the previous mail is adequate for dealing
with this, though.


Possible general principles:
- Live requests: When we get the first subscribe request, we create a
  thread to route it. We then route it. The request succeeds, or at
  least, we run out of nodes. We then wait for:
  a) A connection to a node to go down
  b) A connection to a (possibly new) node to come up
  c) Any connected node we routed through to change location
  d) Any node we routed through to return a message such as a new RNF,
     etc.
  If any of these happens, we may have to reroute (see the sketch
  below).
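
A sketch of what such a live request could look like as a long-lived
object (hypothetical names; one blocking queue of events per request is
just one possible shape for the waiting):

import java.util.concurrent.*;

// Sketch of a "live" subscribe request: route once, then keep a thread
// waiting on routing-relevant events and reroute when one arrives.
class LiveSubscribeRequest implements Runnable {
    enum Event { PEER_DOWN, PEER_UP, PEER_LOCATION_CHANGED, NEW_RNF_FROM_DOWNSTREAM, SHUTDOWN }

    private final String streamKey;
    private final BlockingQueue<Event> events = new LinkedBlockingQueue<>();

    LiveSubscribeRequest(String streamKey) { this.streamKey = streamKey; }

    /** Called by the connection/location layer when something changes. */
    void notifyEvent(Event e) { events.add(e); }

    @Override public void run() {
        routeInitially();
        while (true) {
            try {
                Event e = events.take();  // block until something changes
                if (e == Event.SHUTDOWN) return;
                maybeReroute(e);          // check the stored route and reroute if needed
            } catch (InterruptedException ie) {
                return;
            }
        }
    }

    private void routeInitially() { /* route the subscription until success or out of nodes */ }
    private void maybeReroute(Event e) { /* compare the event against the stored route */ }
}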

This is exactly the same challenge we faced with the other side of the
equation (the data publish blocks) in the first version of the draft.
The solution was to route them individually and keep nodes updated as to
whether they were involved, if we routed through them on one packet but
not on the next. We can't do exactly this for subscriptions, however.
What is the answer?
-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.