#29734: Broker should receive country stats information from Proxy and Client -------------------------------------+-------------------------------- Reporter: cohosh | Owner: cohosh Type: enhancement | Status: needs_revision Priority: Medium | Milestone: Component: Obfuscation/Snowflake | Version: Severity: Normal | Resolution: Keywords: snowflake, geoip, stats | Actual Points: 2 Parent ID: #29207 | Points: 1 Reviewer: ahf | Sponsor: Sponsor19 -------------------------------------+-------------------------------- Changes (by dcf):
* status: needs_review => needs_revision Comment: I have just a few further changes to recommend. * https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/broker.go#L221 {{{ strings.Split(r.RemoteAddr, ":")[0] }}} This will fail for IPv6 addresses. Better to use net.SplitHostPort and check the error return. The host–port splitting could also happen inside of `Metrics.UpdateCountryStats`. * https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/geoip.go#L196 {{{ _, err = io.WriteString(hash, scanner.Text()) }}} A better way to do this may be to do `hashedFile := io.TeeReader(geoipFile, hash)` and then `scanner := bufio.NewScanner(hashedFile)` so that the hash is calculated as a side effect of reading the file. There's no need to handle errors when writing to a `hash.Hash` because it is documented to never error. * https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/geoip.go#L14 {{{ Recognized line formats for IPv4 are: INTIPLOW,INTIPHIGH,CC and "INTIPLOW","INTIPHIGH","CC","CC3","COUNTRY NAME" }}} It looks like the code doesn't recognize the 5-element syntax, so it should be omitted here, or, if it's a common format, documented as not being supported. Replying to [comment:22 cohosh]: > Replying to [comment:18 dcf]: > > * Is there any way to disable the geoip feature, if I don't have the necessary files, other than by providing an invalid path? Maybe we don't care to provide a way to disable the feature and it's the operator's responsibility to provide some kind of geoip files; but in that case, it may be better to exit with an error if the files cannot be read, rather than logging the error and continuing on. It is perhaps surprising that you can explicitly ask for a certain file on the command line with `-geoipdb path`, and the broker will run even if the path cannot be read. > > * It looks to me like if a geoip lookup fails, it is silently ignored—this could be misleading. It would be good to distinguish three cases: geoip file present and lookup found a country code; geoip file present but lookup found nothing; geoip file absent so no lookup is possible. > Added a -disablegeoip option that will not load geoip tables and will not return an error if UpdateCountryStats fails. Otherwise UpdateCountryStats will return an error if it fails and loading the table will panic if the file does not exist or if it is incorrectly formatted. Okay. Let's replace the panic with a `log.Fatal` so the failure gets logged. > > * The `parseEntry` functions need error checking on `geoipStringToIP` and `net.ParseIP`, or else they will store a `nil` that will result in a panic later on. [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L93 This error] will be more useful if it includes a line number. Let me suggest a different factoring of the geoip parsing code. Have `parseEntry` be a plain function, not a method on `GeoIPv4Table`, and have it return `(GeoIPEntry, error)` rather than inserting into the table itself. GeoIPLoadFile can do the entry insertion, count lines and annotate the error result from `parseEntry`. > The difficulty in refactoring it this way is that ipv4 and ipv6 tables have differently formated entries that need to be parse differently. > I added error checking to parseEntry and have it return `(GeoIPEntry, error)` as suggested, but leave it as a method on GeoIPv4Table and GeoIPv6Table. I didn't explain this well. I meant that there should be separate functions for the two formats e.g. `parseEntryIPv4` and `parseEntryIPv6`. The existing `parseEntry` methods never refer to `table`, so they don't need to be methods. But leaving them as methods is fine too. IMO annotating the error message with the problematic line should be done in `GeoIPLoadFile`, not in `parseEntry`. This will eliminate the duplication of the common `"Provided geoip file is incorrectly formatted"` string and ensure that all the error paths get annotated. What I mean is, in the `scanner.Scan()` loop, you can replace `return err` with `return fmt.Errorf("Provided geoip file is incorrectly formatted: %s. Line is: %+q")`. I was actually thinking of annotating with a line number, but including the literal contents of the line works as well. I suggest using the `%+q` format specififer instead of `%s`, because it will escape any weird bytes and ensure the output remains on one log line. > > * If `NewMetrics` is called more than once, there will be multiple open file handles to metrics.log (and only the first one will actually work, I think). Is it possible that `NewMetrics` could be called more than once? > Making a singleton *Metrics variable causes problems with how Convey does tests. It shouldn't be called more than once, but for now I'm using sync.Once on the logging at least so it's explicit Okay. > > * [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker /snowflake-broker_test.go#L274-L279 snowflake-broker_test.go]: I'm not crazy about having the tests depend on external files, especially ones that may change. Ideally the test code would include its own small geoip files, either as separate files or as literals in the source code. But at least, the test should [https://golang.org/pkg/testing/#hdr-Skipping t.Skip] if a file is missing, rather than report spurious failure. I suggest adding a layer of abstraction: a `GeoIPLoad(io.Reader)` that parses any `io.Reader` (for examples a `bytes.Reader`), and write `GeoIPLoadFile` in terms of it. > > * It would be nice to also have tests that exercise the error paths in geoip parsing. > > I haven't made this abstraction at the moment and went for the small geoip files option. Okay. > > * I would like to see unit tests for basic functions like `GeoIPRangeClosure` and `GeoIPCheckRange`. They don't have to be long, just test the edge conditions of an instance. > I replaced these functions with much simpler one line statements, but I added edge cases to the existing geoip test to make sure we correctly categorize IPs at the edge of a range Okay. -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29734#comment:24> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online
_______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs