> On 16 May 2020, at 16:05, Mike Perry <mikepe...@torproject.org> wrote:
>
>> On 4/23/20 1:48 PM, Matt Traudt wrote:
>>
>> 5.4 Other Changes/Investigations/Ideas
>>
>> - How can FlashFlow data be used in a way that doesn't lead to poor
>>   load balancing given the following items that lead to non-uniform
>>   client behavior:
>>   - Guards that high-traffic HSs choose (for 3 months at a time)
>>   - Guard vs middle flag allocation issues
>>   - New Guard nodes (Guardfraction)
>>   - Exit policies other than default/all
>>   - Directory activity
>>   - Total onion service activity
>>   - Super long-lived circuits
>> - What is the explanation for dennis.jackson's scary graphs in this [2]
>>   ticket? Was it because of the speed test? Why? Will FlashFlow produce
>>   the same behavior?
>
> It will also be wise to provide a way for relays to signify that they
> are on the same machine. I bet concurrent machine deployments are one of
> the top contributors to the long tail of bad perf we saw caused by the
> Flashflow experiment[2]. If flashflow measures each such relay as having
> the full link capacity instead of a shared fraction, this is obviously
> going to result in overload on those relays, leading to a long tail of
> bad perf when they are chosen and are also overloaded. It is unlikely
> that we can deploy a FlashFlow that has this long tail perf problem
> without fixing this and related balancing issues (though hopefully most
> will be smoothed over by sbws).
>
> This is a little tricky, because we might not want rogue relays joining
> each others "machines" (similar to the Family problem), but for testing
> something as simple as how MyFamily works would be great. Ideally,
> though, relays would ask or detect that they are concurrently running in
> nearby IP space and either warn the operator to set the flag, or set it
> automatically.
>
> We actually have this work included in a future performance funding
> proposal, but the timeline on that getting approved (or even rejected)
> is so far out that we should figure out a way to do this before that,
> especially if Flashflow development is going to begin soon.
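A minimal sketch of the "shared fraction" idea, under assumptions that are mine and not part of FlashFlow or tor: relays in the same IPv4 /24 or IPv6 /48 (the starting heuristic from this thread) are treated as sharing one link, each relay's own measurement is assumed to be the whole link's capacity, and that capacity is split evenly among the group. The function names, the "A".."E" placeholder fingerprints and the operator-declared groups (standing in for a hypothetical "same network link" option) are all illustrative.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix lengths for "assume these relays share a network link",
# mirroring the IPv4 /24 and IPv6 /48 starting point suggested in this thread.
IPV4_LINK_PREFIX = 24
IPV6_LINK_PREFIX = 48


def link_group(addr):
    """Return the network prefix a relay address is assumed to share a link with."""
    ip = ipaddress.ip_address(addr)
    prefix = IPV4_LINK_PREFIX if ip.version == 4 else IPV6_LINK_PREFIX
    # strict=False masks off the host bits, e.g. 192.0.2.10/24 -> 192.0.2.0/24
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def group_relays(relays, declared=None):
    """Group relays that are assumed to share a network link.

    relays:   relay fingerprint -> IP address string
    declared: relay fingerprint -> operator-chosen group label (a stand-in for a
              hypothetical "same network link" option); it overrides the prefix guess
    """
    declared = declared or {}
    groups = defaultdict(list)
    for fp, addr in relays.items():
        if fp in declared:
            key = ("declared", declared[fp])
        else:
            key = ("prefix", link_group(addr))
        groups[key].append(fp)
    return dict(groups)


def shared_capacity(relays, measured, declared=None):
    """Naively split each relay's measured capacity evenly across its link group.

    measured: relay fingerprint -> capacity measured for that relay on its own
              (e.g. in Mbit/s), assumed here to really be the whole link's capacity
    """
    shares = {}
    for members in group_relays(relays, declared).values():
        for fp in members:
            shares[fp] = measured[fp] / len(members)
    return shares


# Placeholder fingerprints and documentation-range addresses.
relays = {
    "A": "192.0.2.10", "B": "192.0.2.77", "C": "198.51.100.5",
    "D": "2001:db8:1:2::1", "E": "2001:db8:1:ff::9",
}
measured = {"A": 100.0, "B": 100.0, "C": 80.0, "D": 50.0, "E": 50.0}
print(shared_capacity(relays, measured))
# prints {'A': 50.0, 'B': 50.0, 'C': 80.0, 'D': 25.0, 'E': 25.0}
```

The even split is only the simplest possible assumption; relays on one machine need not share a link equally. A wider prefix produces more "false sharing" (unrelated relays get their capacity cut), a narrower one more "missed sharing". Declared groups are trusted blindly here, which is exactly where the rogue-relay concern above would apply.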

We could assume that relays on the same IPv4 /24 or IPv6 /48 share a network link, and re-do the experiment. Then we could tweak those prefix lengths based on the results. We'd need to compromise between "false sharing" and "missed sharing".

Then individual operators could fine-tune that initial heuristic using a "same network link" config option.

(This is similar to how MyFamily works: Tor assumes that relays in the same IPv4 /16 or IPv6 /32 have the same network operator. Then individual relay operators can declare extra families using MyFamily.)

T

_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev