----- Anonymous at o9_0DTuZniSf_+oDmRsonByWxsI ----- 2007.06.17 - 13:53:13GMT -----
Sounds like the proposal should be expressed in a more formal way, to ensure that we are not all talking about different things. Maybe comparing it with the current implementation is a good starting point, so the proposed differences are:

- Instead of using a single system UDP socket, each connection should be given a distinct UDP socket bound to a specific remote address. The UDP socket not bound to a remote is still used to notice new connections (and connections where the peer changed its IP address); once a newly seen address is authenticated, a new bound UDP socket is created to handle the connection. Potentially, a specific peer might be sending UDP packets from different IP addresses, where the address changes not on a daily/hourly basis (which is quite common) but where a remote NAT assigns a random address from a pool of allotted addresses for [virtually] every packet. I haven't seen such badly broken internet connections yet; most likely such a node would prefer TCP or HTTP over any UDP-based transport anyway. The unbound UDP socket is given a low weight as far as the bandwidth sharing algorithm goes.

- Aside from the unbound UDP socket, the per-connection UDP sockets are given equal bandwidth sharing weight (with unused bandwidth somehow, not necessarily evenly, distributed among peers that could handle more traffic). Each peer/socket is handled by a separate thread, so different peers never block each other on network I/O. In other words, every peer is guaranteed at least a 1/(P+1) share of incoming bandwidth - never long delays with no reads, which might lead to oscillations (which, as mroger pointed out, could be fought with very inertial averages, but there is no need to provoke them either). But as all the momentarily unused bandwidth is made available to other peers/sockets, the token bucket is very small, and most likely should never rise above zero (using the following semantics: upon request processing the bucket size becomes negative, and the next read from the socket may occur only when the bucket has been refilled back to the zero level, at which point the continuous background refilling stops). A sketch of such a read loop is given after the routing proposal below. Using NIO the extra threads could be avoided, but that is not really needed, seeing that a node is discouraged from having more than a dozen connected peers anyway. The threads will of course be synchronizing on local token/bucket handling - which is expected to take well under a microsecond - and on tasks like db access - which is very good, as it keeps the node from spiralling down due to CPU/disk performance starvation. A simple implementation would also limit each peer to at most one simultaneous db access - and that is good too, as it throttles peers that abuse our disk resources (by flooding requests with htl=1, for example).

- The current I/O bandwidth liability limiting mechanism is not deprecated at all; as toad pointed out multiple times, it is essential for handling tiny requests, which often enough result in large responses. Once a packet is read from a bound UDP socket, it is one of:

  a) An insert or request: it is verified against the bandwidth liability limiters; if the checks pass, the insert/request is accepted as it currently is. Otherwise the insert/request is explicitly rejected (FNPRejectOverload) as it currently is. Nothing really changes here. At the moment of accepting the insert/request, the peer's token bucket might be charged, in order to discourage floods; the charged tokens are later credited to the peer we receive the requested data from.

  b) Data we are waiting for, either for our own node or for a request we forwarded.
  c) Auxiliary traffic like acks, location swaps, or keepalives. These undergo no liability limiting, as the transfer has already happened - exactly the same as now.

Notice that if the received traffic is 'valuable' (explicitly requested earlier), we credit the peer with the tokens charged from the original requestor's bucket (the only mechanism that can possibly raise a bucket size above zero), so this peer is not penalized (at the very least not as heavily as now) - and so this peer's requests do not get postponed for long. As the bonus tokens are not just thrown in, but taken from other peers who are interested in service, there should be no bursts happening. Charging the buckets might also be done at the moment the data response is sent rather than at the moment of the request (or the charge can somehow be split between those two points in time).

Now, a question should be raised: if all the liability limiter stuff is expected to work as it already does, why bother? The trick is that delayed socket reads allow the remote side to notice in advance that the peer has become slower. The remote will not know whether that happened due to exceeding the 1/(P+1) share of bandwidth, or due to a CPU/disk/swap/whatever performance shortage: all these situations require slowing down to avoid rejects and/or timeouts. So the slowing down happens at two levels: first, the delayed socket reads smooth the load over a longer time period, and second, the requestor will start sending/queueing fewer requests here, making more use of alternative paths and less busy peers. And that's the whole point. Additional note: this algorithm requires no modification if UDP sockets are mixed with TCP ones (provided that the TCP window size is comparable to the MTU, not the default 32/64/128KB).

Until last week I was under the impression that a timing-based mechanism for routing around overloaded nodes was already implemented. After taking the time to study the code closer it turns out to be false: AFAIS the routing around happens only upon an explicit FNPRejectOverload - the freenet.node.PeerManager._closerPeer() method does not take the peer's roundtrip average into account at all. Here is the [rather simple] improvement I propose: instead of making the routing decision based solely on location distance while skipping 'unsuitable' locations, search for a peer by choosing the best (numerically highest) request advance speed, computed specifically for each node:

- a disconnected peer is skipped (as it is now);
- the peers we already tried to search the key at are skipped (as it is now);
- the peer the original request came from is skipped (as it is now);
- once passive requests are implemented, other requestors of the key are to be skipped too (if they gain the key, the subscription will time out soon and then we will be able to try there);
- [the maxDiff and the old version logic is not too clear to me].

Now the real salt:

- the speed is calculated as signedLocationDiff(our_location, peer_location, key_location)/RTT(peer), where signedLocationDiff is positive if locationDiff(our_location, key_location) > locationDiff(peer_location, key_location), and negative otherwise (our_location == peer_location would give a zero difference, but that should never happen). That can be multiplied by nodeAveragePingTime if values in a predictable range (say -3..3) are preferred (but there is no need to provide guarantees);
- a backed off node's speed is reduced by 3 (or thereabouts; maybe a smaller value, like 1 or even 0.5).
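To make the metric concrete, here is a minimal Java sketch of the speed calculation described above (the class and method names are mine, not from the freenet.node code; the RTT, average ping and backoff status are assumed to be supplied by the caller):

    // Sketch of the proposed routing metric: positive when the peer is closer
    // to the key than we are, negative otherwise, divided by how long the peer
    // takes to answer. Locations are points on the [0,1) circle.
    final class RouteSpeed {

        // Circular distance between two locations on the [0,1) keyspace.
        static double locationDiff(double a, double b) {
            double d = Math.abs(a - b);
            return Math.min(d, 1.0 - d);
        }

        // Positive if the peer is closer to the key than we are, negative otherwise.
        static double signedLocationDiff(double ourLoc, double peerLoc, double keyLoc) {
            return locationDiff(ourLoc, keyLoc) - locationDiff(peerLoc, keyLoc);
        }

        // speed = signedLocationDiff / RTT, optionally scaled by the node's average
        // ping to keep the values in a small range, with a flat penalty for peers
        // that are currently backed off.
        static double speed(double ourLoc, double peerLoc, double keyLoc,
                            double rttMillis, double avgPingMillis, boolean backedOff) {
            double s = signedLocationDiff(ourLoc, peerLoc, keyLoc) / rttMillis;
            s *= avgPingMillis;          // optional normalization mentioned above
            if (backedOff) s -= 3.0;     // proposed penalty for backed off peers
            return s;
        }
    }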
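And, going back to the per-peer sockets and the zero-capped token bucket from the first half of this proposal, a minimal sketch of how a per-peer read loop could look (again the names are mine and error handling is omitted; this illustrates the semantics, it is not a drop-in implementation):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetSocketAddress;

    // One thread per authenticated peer: a UDP socket connect()ed to that peer's
    // address, plus a token bucket that is only ever refilled up to zero.
    class PeerReadLoop implements Runnable {
        private final DatagramSocket socket;      // sees only this peer's packets
        private final double tokensPerMs;         // this peer's bandwidth share (bytes/ms)
        private double tokens = 0;                // never rises above zero
        private long lastRefill = System.currentTimeMillis();

        PeerReadLoop(InetSocketAddress remote, double tokensPerMs) throws Exception {
            this.socket = new DatagramSocket();
            this.socket.connect(remote);
            this.tokensPerMs = tokensPerMs;
        }

        private void refill() {
            long now = System.currentTimeMillis();
            tokens = Math.min(0.0, tokens + (now - lastRefill) * tokensPerMs);
            lastRefill = now;
        }

        public void run() {
            byte[] buf = new byte[1500];          // roughly one MTU
            while (!socket.isClosed()) {
                try {
                    refill();
                    if (tokens < 0) {             // bucket still negative: delay the read
                        Thread.sleep((long) Math.ceil(-tokens / tokensPerMs));
                        continue;                 // so the remote sees back-pressure early
                    }
                    DatagramPacket p = new DatagramPacket(buf, buf.length);
                    socket.receive(p);            // blocks only this peer's thread
                    tokens -= p.getLength();      // charge the bucket; it goes negative
                    handlePacket(p);              // liability checks, token crediting, etc.
                } catch (Exception e) {
                    break;
                }
            }
        }

        void handlePacket(DatagramPacket p) { /* accept/reject, credit tokens, ... */ }
    }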
Using the speed for sorting connections is very natural: it basically answers the question 'which peer should the request be sent to in order to receive the response ASAP - preferring to spend slightly more bandwidth on underloaded nodes (where bandwidth is idle, i.e. virtually free) rather than waiting for slow/overloaded nodes to do the job, provided that the solicited key already exists where it should'. And so it gives the following advantages:

- (major) we start routing around busy nodes well before they start generating FNPRejectOverload packets;
- (minor) there is no longer a need to scan the peer list twice, first skipping backed off nodes and then including them;
- (noticeable) requests that result in a successful data retrieval will deliver the answer to the original requestor faster (on average);
- (average) the routes will be more fluid and thus harder for an adversary to supervise or censor;
- (minor) very fast nodes, even with small own storage/cache, provide effective caching services (for popular keys) for their immediate peers which are slow but have huge, well-specialized storage.

And note that this algorithm will be useful even with the current code/protocol, without delayed socket reads; delaying the socket reads will undoubtedly have a positive impact by smoothing the choice over time (and efficiently fighting any oscillations) and by making the routing around busy nodes start occurring much earlier (starting from 'boundary' keys, then, if the node gets increasingly busy anyway, by further narrowing down the location space we try at the busy node).
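For completeness, a sketch of how a peer could be chosen by that speed (a hypothetical stand-in for the distance-only comparison in _closerPeer(), using the RouteSpeed sketch above and an assumed minimal peer interface):

    import java.util.List;

    // Pick the peer with the highest request advance speed, applying the skip
    // rules listed above (disconnected, already tried, original requestor).
    final class SpeedRouting {

        static Peer choosePeer(List<Peer> peers, double ourLoc, double keyLoc,
                               double avgPingMillis) {
            Peer best = null;
            double bestSpeed = Double.NEGATIVE_INFINITY;
            for (Peer p : peers) {
                if (!p.isConnected() || p.triedAlready() || p.isRequestSource())
                    continue;                     // same skip rules as the current code
                double s = RouteSpeed.speed(ourLoc, p.getLocation(), keyLoc,
                                            p.averageRoundTripMillis(), avgPingMillis,
                                            p.isBackedOff());
                if (s > bestSpeed) { bestSpeed = s; best = p; }
            }
            return best;                          // null if nobody is eligible
        }

        // Minimal peer interface assumed by this sketch.
        interface Peer {
            boolean isConnected();
            boolean triedAlready();
            boolean isRequestSource();
            boolean isBackedOff();
            double getLocation();
            double averageRoundTripMillis();
        }
    }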
