On Thu, Aug 10, 2006 at 03:07:07PM -0700, Ian Clarke wrote:
> OK, let's try it - but unless there is an obvious marked improvement,
> we must *remove* it (rather than just continuing to add complexity).
0. Well, the *simplest* thing would be to remove load balancing
entirely, and rely solely on routing and load limiting. Nodes will not
accept more requests than they can handle due to pre-emptive rejection,
but if we have slow nodes nearby, the sender will be significantly slowed
down.
PRO: Simple!
CON: Whole network may be dragged down by slow nodes.
1. Use backoff, but as an advisory mechanism; if we run out of
non-backed-off nodes, then use backed-off ones.
PRO: Most similar to current situation.
CON: Still fairly arbitrary.
2. The mechanism in the last email: calculate the proportion of requests
rejected (or some other performance metric) for each node, and if a node
is worse than the median, shrink its specialization through bias values
(see the sketch after this list). Specifically:
Metric M, lower is better. M could be the proportion of requests
rejected, or the time taken for a successful request.
Distance d = routing distance from target to peer.location.
If peer.M <= median.M:
    return d
Else:
    return d * peer.M / median.M
PRO: We don't have to use the proportion of requests rejected; we could
use any performance metric. Less arbitrary than backoff.
3. I believe edt had another algorithm.
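For clarity, here is a rough Java sketch of option 2, in the same style
as the distance() function quoted below. It is a sketch only:
biasedDistance, getMetric and rawDistance are illustrative names, not
identifiers that actually exist in the tree.

public double biasedDistance(PeerNode peer, double target, double medianM) {
    // M is a performance metric where lower is better, e.g. the
    // proportion of requests this peer has rejected recently.
    double d = rawDistance(peer.getLocation(), target);
    double peerM = peer.getMetric();
    if(medianM <= 0 || peerM <= medianM) {
        // The peer performs at least as well as the median (or we have
        // no usable median yet): route on raw distance, leaving its
        // specialization untouched.
        return d;
    }
    // The peer is worse than the median: inflate its distance in
    // proportion to how much worse it is, shrinking the region of
    // keyspace for which it is the closest, and hence chosen, peer.
    return d * peerM / medianM;
}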
>
> Ian.
>
> On 10 Aug 2006, at 14:50, Matthew Toseland wrote:
>
> >On Thu, Aug 10, 2006 at 11:43:32AM -0700, Ian Clarke wrote:
> >>Why use a median, rather than a mean?
> >>
> >>I'm concerned about this: it increases (yet again) the complexity of
> >>our load balancing, it hasn't been simulated, and I think there will
> >>inevitably be unpredictable results. For example, what happens if
> >>*all* nodes have an AcceptedFrac below the median? This mechanism
> >>will just spiral down until all nodes are hardly accepting anything.
> >
> >The current mechanism is spiraling down, and nodes are accepting
> >hardly anything. The median is defined as the middle value if you
> >order the values, so half the nodes will be above it and half below
> >it. So we'll always be able to send as many requests as we want to
> >half the nodes. The reason for the current lack of performance is
> >overcompensation for load, as evidenced by the low bandwidth usage.
> >To be precise, it's because backoff is used as a load limiting
> >mechanism and not just a load balancing mechanism.
> >>
> >>We can *not* afford to fall back into the trap of adding more and
> >>more complexity to the point that nobody understands what is going
> >>on, especially in the area of load balancing which is one of the
> >>areas that is most difficult to get our heads around.
> >
> >The theory is simple: We have propagation of overload back to the
> >original senders. They will slow down if they get lots of timeouts and
> >forwarded RejectedOverload's. That's load *limiting*. The job of load
> >*balancing* is to prevent all our traffic being routed to slow nodes
> >which then generate lots of RejectedOverload's and slow down the whole
> >network. We could take out load balancing completely, but that might
> >result in a significant slowdown due to slow nodes. What's even better
> >is that this scheme has fewer arbitrary constants than backoff.
> >>
> >>I really feel that we lack a "theory" of load balancing, a simple
> >>conceptual model through which we can think about the problem and
> >>find a solution. Rather, we come up with idea after idea based on
> >>vague and invariably flawed notions and assumptions, layering one on
> >>top of the other until nobody understands what is happening.
> >
> >Load limiting: Sender side, determining how fast to send requests into
> >the network. If there are too many RejectedOverload's, the sender slows
> >down. This will ensure the network is not overloaded, full stop. It
> >works for TCP/IP; it should work for us.
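To illustrate the load limiting side only: something AIMD-like in the
spirit of TCP congestion control would do. This is purely a sketch with
invented names and constants, not the actual rate limiting code:

class RequestRateLimiter {
    // Hypothetical starting rate; additive increase, multiplicative
    // decrease, as in TCP.
    private double requestsPerSecond = 1.0;

    // A request completed without a timeout or (forwarded)
    // RejectedOverload.
    void onSuccess() {
        requestsPerSecond += 0.1;
    }

    // A timeout or a (forwarded) RejectedOverload came back.
    void onOverload() {
        requestsPerSecond = Math.max(0.1, requestsPerSecond / 2);
    }

    double currentRequestsPerSecond() {
        return requestsPerSecond;
    }
}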
> >
> >Load limiting without load balancing: If there are slow nodes near the
> >sender, and we send these an equal proportion of our incoming requests
> >(according to their location), then most of those requests will be
> >rejected, and this results in an overall slowdown.
> >
> >Load balancing: Routing side. The purpose is to determine which nodes
> >to send requests to, in order that a) our requests are reasonably fast,
> >and b) we don't send too many requests to really slow nodes, causing an
> >overall slowdown via load limiting. Load balancing can legitimately be
> >"selfish", i.e. it can send a request even if all available nodes are
> >slow, because load LIMITING will reduce the overall send rate if the
> >overall network is slow; that's not load balancing's problem. Not being
> >"selfish" in this sense is a threat to anonymity (all nodes backed off
> >except one), usability (all nodes backed off), and performance, and is
> >not justified. TCP/IP runs over more than ethernet, and I don't think
> >ethernet collision detection is a viable model for backoff.
> >
> >Conceptually this isn't a difficult model. And I don't think it could
> >possibly be any worse than current performance; right now we are
> >plainly overcompensating for load. The reason for this is backoff,
> >which is a means for both limiting and balancing load when it should
> >be solely a balancing mechanism; we have a better limiting mechanism
> >already.
> >>
> >>We must place a moratorium on all but the most trivial "improvements"
> >>to load balancing unless they have been simulated.
> >
> >That could be another month. Michael has rightly insisted on doing a
> >fairly low-level simulation, which has resulted in us rearchitecting a
> >lot of the lower layers of Freenet; this is a good thing, but it takes
> >time. Token passing is a substantial change, which we should not deploy
> >without simulation. It should solve our current problems, but if it
> >takes another month for it to be ready, given the current situation of
> >massive overcompensation for load resulting in universally low
> >bandwidth usage, lots of backoff, and really slow inserts, it seems
> >reasonable to me to try the above. Actually I don't think that load
> >limiting with no backoff would do too badly, but it would be rather
> >sensitive to modem nodes etc.
> >>
> >>Ian.
> >>
> >>On 9 Aug 2006, at 07:30, Matthew Toseland wrote:
> >>
> >>>What is backoff for? It's not to prevent overloading nodes; nodes
> >>>reject requests pre-emptively. What it is for is this:
> >>>
> >>>If we don't back off slow nodes, then these slow nodes will have to
> >>>reject (or occasionally time out) the requests that we do forward to
> >>>them. Those failures will be propagated back to the request senders,
> >>>who will have to reduce their send rate. In other words, if we don't
> >>>avoid sending requests to the slowest nodes, the requests we do send
> >>>to them will be rejected, and those rejections will cause a universal
> >>>slowdown in request rates.
> >>>
> >>>Thus, I don't think that the current mechanism is sensible. Nodes can
> >>>be expected to take care of themselves, now that we have pre-emptive
> >>>rejection. I don't think the ethernet analogy works 100%; we don't
> >>>really have a shared medium.
> >>>
> >>>What we are looking at here is simply a way to ensure that requests
> >>>are not forwarded to nodes who will inevitably reject them and thus
> >>>cause an overall slowdown at the requester end. This is completely
> >>>different to the case in 0.5, where requesters never slowed down.
> >>>
> >>>So what am I proposing?
> >>>
> >>>For each node, we track the proportion of requests which are
> >>>accepted. We calculate a median. For any node above the median, all
> >>>requests are forwarded to it: if most of our nodes are slow, that's
> >>>not our problem; it's our peers who want to forward requests through
> >>>us who have to deal with it (by not sending requests to us). For
> >>>nodes below the median, we artificially narrow their specializations:
> >>>
> >>>While routing:
> >>>
> >>>public double distance(PeerNode node, double target) {
> >>>    double distance;
> >>>    if(node.acceptedFrac >= medianAcceptedFrac) {
> >>>        distance = rawDistance(node.location, target);
> >>>    } else {
> >>>        distance = rawDistance(node.location, target) *
> >>>            medianAcceptedFrac / node.acceptedFrac;
> >>>    }
> >>>    return distance;
> >>>}
> >>>
> >>>
> >>>Now, to some degree, this is alchemy, in the sense that it hasn't
> >>>been simulated. However, it will be some time before the new rate
> >>>limiting mechanism is available, and it appears a solid argument to
> >>>me: If all our peers are slow, that DOES NOT mean we shouldn't send
> >>>requests to them. Because we have load propagation, it is entirely
> >>>legitimate to send requests ANYWAY, and let upstream nodes deal with
> >>>the slowness - either the entire network is slow, in which case we
> >>>reduce all the send rates, or this particular node is slow, in which
> >>>case we reduce the number of requests sent to it! Note also that
> >>>slavishly following backoffs reduces our anonymity quite noticeably.
> >>>
> >>>Comments?
>
--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.