#32265: MS: Format an exit list from a previous exit list and exitmap output
----------------------------------+------------------------------
 Reporter:  irl                   |          Owner:  irl
     Type:  task                  |         Status:  needs_review
 Priority:  Medium                |      Milestone:
Component:  Metrics/Exit Scanner  |        Version:
 Severity:  Normal                |     Resolution:
 Keywords:                        |  Actual Points:
Parent ID:  #29654                |         Points:
 Reviewer:  karsten               |        Sponsor:
----------------------------------+------------------------------

Comment (by irl):

Replying to [comment:8 karsten]:
> Actually, I think it's harmful to download exit lists from CollecTor and
> merging them with the scanner's own measurements. We should instead merge
> new scan results with previous local results. It's also yet another
> dependency to download something from CollecTor that is not really
> needed. I'd say kill this code.

Ok, it's gone.

> Well, the spec says what these fields are being used for: `Published` is
> used to skip relays that haven't published a new descriptor since the one
> in the current consensus, and `LastStatus` is used to know when to throw
> out relays from the list. This is all under the assumption that the
> scanner reads its previous exit list from disk before making
> measurements.
>
> My suggestion would be to use the consensus valid-after time as
> `LastStatus` time. It's pretty much the same as the `published` time in a
> version 2 status, and it would work for this purpose.

I saw what TorDNSEL is using it for, but I wonder if people use exit lists
in ways we haven't anticipated. I guess we can synthesize the valid after
time from the measurement time, but our plugin is not directly handling
consensuses or server descriptors. It would take changes to exitmap
internals to get this data out.

> > No. It scans the entire network every time. It does this
> > asynchronously, and doesn't try to prioritize anything. Just whichever
> > circuits are built first will be tested first. I was even thinking it
> > could run continuously. If exit relays cannot cope with two HTTP
> > requests an hour, perhaps they shouldn't be exit relays.
>
> Ideally, we would change as few variables at the same time as possible,
> in order to compare the new results with the old ones. Changing the
> scheduling from "only scan relays with changed descriptors" to "scan all
> relays once per hour" seems like a major design change that we could make
> at a later time.

This could add a lot of time to the project.
The exitmap architecture doesn't really have a way to do this, so it would
take changes to the internals there. I guess we can perform the
measurements and then throw them away as a shortcut option, but once we've
done the measurement anyway that seems wasteful.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32265#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs