#29734: Broker should receive country stats information from Proxy and Client -------------------------------------+----------------------------- Reporter: cohosh | Owner: cohosh Type: enhancement | Status: merge_ready Priority: Medium | Milestone: Component: Obfuscation/Snowflake | Version: Severity: Normal | Resolution: Keywords: snowflake, geoip, stats | Actual Points: 2 Parent ID: #29207 | Points: 1 Reviewer: ahf | Sponsor: Sponsor19 -------------------------------------+-----------------------------
Comment (by dcf): I have many comments, but overall my impression is good and I think you can move ahead with this. My big-picture question is: what do we plan to do with this data? If it's to detect blocking events by comparing the broker's statistics against the bridge's, I think we should at least sketch out those analysis scripts, in order to see whether the client geoip data we will be collecting is suited to the requirements. My main point is that we shouldn't collect data just because it may be useful; instead we should design safe data collection around some question we want to answer. As it stands, the branch will collect more precise client data than Tor Metrics does (Tor Metrics doesn't publish raw numbers but applies some fuzzing and binning). Having /debug display precise counts is a danger in the following scenario: an observer wants to determine whether a particular client is accessing the Snowflake broker. Whenever the observer sees a suspected connection, it checks the /debug output to see whether the count has incremented. Perhaps we could do a test deployment for a few days, to get an idea of what the data looks like. In fact, I think it's a good idea to try that, before merging. If there's a research question that we think this data could help us answer, we could ask the [https://research.torproject.org/safetyboard.html Safety Board] to evaluate it. ---- * This branch only tracks client accesses, not proxy accesses. What will happen with proxy accesses, are they planned to go in the same metrics.log file? * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/broker.go#L271 broker.go]: The SIGHUP handler introduces a race condition because other goroutines may be using the tables while they are being replaced. Even though the table is replaced [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L64 all at once with an assignment], I don't think that even that assignment is guaranteed to be atomic. The table really needs a lock—not only while being reloaded from a file but on every access (because SIGHUP means the data structure can change at any time). Alternately, consider removing the SIGHUP feature and only loading the database once at startup. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/broker.go#L243 broker.go] {{{ flag.StringVar(&geoipDatabase, "geoipdb", "/usr/share/tor/geoip", "path to correctly formatted geoip database mapping IPv4 address ranges to country codes") flag.StringVar(&geoip6Database, "geoip6db", "/usr/share/tor/geoip6", "path to correctly formatted geoip database mapping IPv6 address ranges to country codes") }}} Is there any way to disable the geoip feature, if I don't have the necessary files, other than by providing an invalid path? Maybe we don't care to provide a way to disable the feature and it's the operator's responsibility to provide some kind of geoip files; but in that case, it may be better to exit with an error if the files cannot be read, rather than logging the error and continuing on. It is perhaps surprising that you can explicitly ask for a certain file on the command line with `-geoipdb path`, and the broker will run even if the path cannot be read. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L47 UpdateCountryStats]: It looks to me like if a geoip lookup fails, it is silently ignored—this could be misleading. It would be good to distinguish three cases: geoip file present and lookup found a country code; geoip file present but lookup found nothing; geoip file absent so no lookup is possible. * Adding to that, tor records the hashes of its geoip files: [https://gitweb.torproject.org/torspec.git/tree/dir- spec.txt?id=6e58fa857a95e18c5fda0ecf17955da7919b260b#n855 geoip-db-digest] and [https://gitweb.torproject.org/torspec.git/tree/dir- spec.txt?id=6e58fa857a95e18c5fda0ecf17955da7919b260b#n861 geoip6-db- digest]. It would be good to record that information so we can retroactively determine if geoip files were out of date, or diagnose geoip failures. * I think that some of the logging is happening at too low a level. The most basic functions should return errors and not log. For example [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L68 geoipStringToIP] should just return the `err` from `strconv.ParseUint`. [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L78 NewMetrics], too, should return any error resulting from filesystem operations. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L164 GeoIPRangeClosure]: I don't understand why this is called "closure"; it's just a function. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L15 geoip.go] [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L67 geoip.go] "4-byte unsigned integers": Please document whether it's big- endian or little-endian. "INTIPLOW,INTIPHIGH,CC": Similarly, document whether the ranges are inclusive of the high element or exclusive (I guess it would have to be inclusive, in order to represent 255.255.255.255?). * The `parseEntry` functions need error checking on `geoipStringToIP` and `net.ParseIP`, or else they will store a `nil` that will result in a panic later on. [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L93 This error] will be more useful if it includes a line number. Let me suggest a different factoring of the geoip parsing code. Have `parseEntry` be a plain function, not a method on `GeoIPv4Table`, and have it return `(GeoIPEntry, error)` rather than inserting into the table itself. GeoIPLoadFile can do the entry insertion, count lines and annotate the error result from `parseEntry`. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L148 geoip.go]: Check [https://golang.org/pkg/bufio/#Scanner.Err scanner.Err()] after the loop. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L53 GeoipError]: Unless you need to do type matching on errors, you can just use `errors.New` or `fmt.Errorf` instead of making a new type. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L192 GetCountryByAddr]: I think a more natural API for this function would be to have it take a `net.IP` instead of a `string` (seeing as all it does with the string is immediately parse it, and [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L29 UpdateCountryStats] is already parsing it). * Can [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L164 GeoIPRangeClosure] be written as `bytes.Compare(key.To16(), entry.ipHigh.To16()) <= 0`, and [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L177 GeoIPCheckRange] as `bytes.Compare(key.To16(), entry.ipLow.To16()) >= 0 && bytes.Compare(key.To16(), entry.ipHigh.To16()) <= 0`? * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L48 metrics.go] {{{ m.countryStats.counts[country] = m.countryStats.counts[country] + 1 }}} You can do this as `m.countryStats.counts[country]++` or `m.countryStats.counts[country] += 1`. * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L63-L64 metrics.go]: I don't think the initial `nil` assigment does anything. {{{ m.tablev4 = nil // garbage collect previous table, if applicable m.tablev4 = tablev4 }}} * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L97-L98 metrics.go] {{{ for { <-heartbeat }}} Consider doing this as `for range heartbeat {`. That way, the loop will terminate if the `heartbeat` channel gets closed. * If `NewMetrics` is called more than once, there will be multiple open file handles to metrics.log (and only the first one will actually work, I think). Is it possible that `NewMetrics` could be called more than once? * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker /snowflake-broker_test.go#L274-L279 snowflake-broker_test.go]: I'm not crazy about having the tests depend on external files, especially ones that may change. Ideally the test code would include its own small geoip files, either as separate files or as literals in the source code. But at least, the test should [https://golang.org/pkg/testing/#hdr-Skipping t.Skip] if a file is missing, rather than report spurious failure. I suggest adding a layer of abstraction: a `GeoIPLoad(io.Reader)` that parses any `io.Reader` (for examples a `bytes.Reader`), and write `GeoIPLoadFile` in terms of it. * It would be nice to also have tests that exercise the error paths in geoip parsing. * I would like to see unit tests for basic functions like `GeoIPRangeClosure` and `GeoIPCheckRange`. They don't have to be long, just test the edge conditions of an instance. -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29734#comment:18> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online
_______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs