#21315: publish some realtime stats from the broker?
-----------------------------------+---------------------------
 Reporter:  arma                   |          Owner:  (none)
     Type:  enhancement            |         Status:  new
 Priority:  Medium                 |      Milestone:
Component:  Obfuscation/Snowflake  |        Version:
 Severity:  Normal                 |     Resolution:
 Keywords:                         |  Actual Points:
Parent ID:  #29461                 |         Points:
 Reviewer:                         |        Sponsor:  Sponsor19
-----------------------------------+---------------------------

Comment (by irl):

Replying to [comment:5 cohosh]:
> It sounds like we have a few things we want to achieve/learn from
> collected metrics:
> - Detect censorship events
> - Allow current or potential proxies to see if they are needed
> - Allow clients to see whether their connection issues are due to
>   censorship or proxy availability
> - Help us figure out whether we should be doing something different in
>   distributing proxies to clients

These all seem like good goals.

> We currently collect and "publish" information on:
> - how many snowflakes are currently available along with their SIDs
>   (available at broker /debug handler). This is good for more detailed
>   monitoring of censorship events. Even though we collect bridge usage
>   metrics, collecting broker usage metrics will narrow down where the
>   censorship is happening.
> - country stats of domain-fronted client connections (logged, most
>   recent snapshot at broker /debug)
> - the roundtrip time it takes for a client to connect to get a
>   snowflake proxy answer (available at broker /debug)

Should we already be archiving this data?

> Some of the metrics mentioned above will be easier to implement than
> others. The best place to collect statistics is at the broker, but some
> of the data mentioned would require proxies to report metrics to the
> broker for collection. We have to be a bit careful with this since
> anyone can run a proxy. It will also impact the decisions we make for
> #29207.

We collect a lot of statistics at relays and bridges, which anyone can
run. We are working on methods of improving robustness against these
statistics being manipulated, but so far have not detected anyone
reporting abnormal values. It is good to have criteria for determining
what you would expect, based on the stats others report, so that
anomalies can be detected. For example, we would expect relay bandwidth
usage among relays to be proportional to consensus weight.
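
To make the consensus weight example concrete, here is a minimal sketch
of one way such a check could look. It is not how Tor Metrics does it.
The function name, the input layout (a dict mapping a relay name to a
(consensus_weight, reported_bandwidth) pair) and the 50% tolerance are
all assumptions made up for illustration. It compares each relay's
bandwidth-per-weight ratio against the median ratio, so that a single
inflated report does not shift the baseline for everyone else.

```python
from statistics import median


def flag_anomalous_relays(relays, tolerance=0.5):
    """Return names of relays whose bandwidth/weight ratio is unusual.

    relays: dict of name -> (consensus_weight, reported_bandwidth).
    This layout is hypothetical, not a Tor Metrics format.
    tolerance: allowed relative deviation from the median ratio.
    """
    # Bandwidth per unit of consensus weight; zero-weight relays are
    # skipped because the ratio is undefined for them.
    ratios = {
        name: bandwidth / weight
        for name, (weight, bandwidth) in relays.items()
        if weight > 0
    }
    if not ratios:
        return []

    # The median is used as the "expected" ratio because it is not
    # pulled far off by a few manipulated reports.
    typical = median(ratios.values())
    if typical <= 0:
        return []

    return sorted(
        name
        for name, ratio in ratios.items()
        if abs(ratio - typical) / typical > tolerance
    )


if __name__ == "__main__":
    example = {
        "A": (100, 1000),
        "B": (200, 2000),
        "C": (100, 9000),  # reports far more than its weight suggests
        "D": (50, 520),
    }
    print(flag_anomalous_relays(example))  # ['C']
```

The same idea might carry over to snowflake proxies once there is
something to play the role of consensus weight, for example the number
of clients the broker handed to each proxy.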

> > I would also be interested in stats about users and usage (including
> > e.g. number of users being handled divided by number of snowflakes
> > handling them)
>
> This is a bit tricky. The broker knows which proxies it hands out to
> the users but doesn't know the state of the clients' connections to
> those proxies (e.g., when they have been closed). It's also worth
> noting that different "types" of proxies (standalone vs. browser-based)
> can handle a different number of users at once. Perhaps a more useful
> metric would be for snowflake proxies to advertise to the broker how
> many available "slots/tokens" they have when they poll for clients.
> This could be added to the broker--proxy WebSocket protocol. It would
> also avoid collecting more data on clients, which is generally safer.

This sounds like a reasonable approach. You might want to take a look at:

 * https://research.torproject.org/techreports/countingusers-2010-11-30.pdf
 * https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf

This will give you an idea of how we do this for other parts of Tor.

> > how many times are you giving snowflakes out? How many times did you
> > stop giving a snowflake out because you've given it out so many times
> > already? These questions tie into the address distribution algorithm
> > question

Can this also be an indirect measurement of the number of users?

> The above comment addresses this as well. The broker doesn't really
> decide whether or not they've given a snowflake out too many times. I
> think more important to deciding whether we are giving out proxies in a
> good way is to try to measure how "reliable" individual proxies have
> been in the past. This is related to setting up persistent identifiers
> (#29260).

For relays, directory authorities track the mean time between failures,
and we track this in Tor Metrics too.

> It might also be interesting to have some kind of proxy diversity
> metric (e.g., whether 90% of all connections are handled by the same
> proxy).
> We can get some idea with persistent identifiers (#29260), but of
> course using a persistent identifier will always be optional. We can
> also do collection of geoip country stats of proxies.

We don't really have this metric for relays yet, so if you have ideas
that would be applicable to relays too then that would be great. We know
about country/AS distribution, but we haven't quantified the diversity
using any particular formula.

> - Log all of the statistics in a reasonable format

This would ideally be a format that Tor Metrics is already handling. If
it could be based on the Tor directory protocol meta-format (§1.2
dir-spec) then that would be great. We don't want to bring in
dependencies for parsing yaml/toml/etc. if we can help it.

> - coordinate with the metrics team to get these metrics collected and
>   visualized somewhere

Please also coordinate on what you want to collect, so we can consider if
that information already comes from somewhere, if we already had a plan
for it, and if it is safe or not.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21315#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs