#23817: Tor re-tries directory mirrors that it knows are missing microdescriptors
-------------------------------------------------+-------------------------
 Reporter:  teor                                 |          Owner:  (none)
     Type:  defect                               |         Status:  new
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-guard, tor-hs, prop224,          |  Actual Points:
  032-backport? 031-backport?                    |
Parent ID:  #21969                               |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by asn):

Replying to [comment:10 teor]:
> Replying to [comment:9 asn]:
> > Replying to [comment:7 teor]:
> > > Replying to [comment:6 asn]:
> > > > Here is an implementation plan of the failure cache idea from
> > > > comment:4.
> > > >
> > > > First of all, the interface of the failure cache:
> > > >
> > > > We introduce a `digest256map_t *md_fetch_fail_cache` which maps the
> > > > 256-bit md hash to a smartlist of dirguards through which we failed
> > > > to fetch the md.
> > > >
> > > > Now the code logic:
> > > >
> > > > 1) We populate `md_fetch_fail_cache` with dirguards in
> > > > `dir_microdesc_download_failed()`. We remove them from the failure
> > > > cache in `microdescs_add_to_cache()` when we successfully add an md
> > > > to the cache.
> > >
> > > Successfully add *that* md to the cache?
> > > Or any md from that dirguard?
> > >
> >
> > I meant, we remove *that* md from the `md_fetch_fail_cache` if we manage
> > to fetch *that* md from any dir.
> >
>
> I think this is ok, as long as we ask for mds in large enough batches.
>
> > > > 2) We add another `entry_guard_restriction_t` restriction type in
> > > > `guards_choose_dirguard()`. We currently have one restriction type
> > > > which is designed to restrict guard nodes based on the exit node
> > > > choice and its family. We want another type which uses a smartlist
> > > > and restricts dirguards based on whether we have failed to fetch an
> > > > md from that dirguard. We are going to use this in step 3.
> > > >
> > > > 3) In `directory_get_from_dirserver()` we query the md failure cache
> > > > and pass any results to `directory_pick_generic_dirserver()` and
> > > > then to `guards_choose_dirguard()` which uses the new restriction
> > > > type to block previously failed dirguards from being selected.
> > >
> > > Do we block dirguards that have failed to deliver an md from downloads
> > > of that md?
> > > Or do we block dirguards that have failed to deliver any mds from
> > > downloads of any md?
> > >
> >
> > Yes, that's a good question that I forgot to address in this proposal.
> >
> > I guess my design above was suggesting that we block dirguards "that
> > have failed to deliver any mds from downloads of any md", until those
> > mds get fetched from another dirserver and get removed from the failure
> > cache.
>
> I think this is the behaviour we want: trying each dir server for each
> specific md will mean that we time out, because there are more mds than
> there are dir servers.
>
> There are two cases when this will cause us to fall back to the
> authorities:
>  * we download a new consensus from the authorities, and they are the
>    only ones with some of the mds in it
>  * for some reason, an md referenced in the consensus is not mirrored
>    correctly by relays or authorities (this would be a serious bug in tor)
>
> To avoid this happening when it isn't necessary, we should expire failure
> cache entries after a random time. Maybe it should be the time when we
> expect dir servers to fetch a new consensus and new mds. I think this is
> 1-2 hours, but check dir-spec for the exact details.
> Or we could expire md failure caches each time we get a new consensus.
> That would be easier.

ACK.

Based on our discussion above, I think the implementation can be simplified
since we decided to go with the "block dirguards that have failed to deliver
any mds from downloads of any md" approach.

Hence, instead of keeping track of which mds we failed to download from
which relays, we can just keep track of the relays that have outdated md
information without caring about which mds they failed to serve us. This
simplifies things a lot from an engineering perspective since we don't need
to map mds to relays. We can just keep an `outdated_dirserver_list`
smartlist with the list of outdated dirservers and ignore those when we try
to fetch mds.

This is also consistent with my empirical experience, where most dirservers
will usually provide us with all the mds we asked them for, except for a few
dirservers that would refuse to give us 10 mds or so for a while.
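
For illustration only, the simplified bookkeeping could look roughly like the sketch below. It is not the planned patch: it uses a fixed-size array instead of a smartlist so that it compiles on its own, and every name in it (`outdated_dirservers_t`, `outdated_note()`, `outdated_contains()`, `outdated_reset()`, `ID_DIGEST_LEN`, `MAX_OUTDATED_DIRSERVERS`) is hypothetical. It also assumes a dirserver is identified by a 20-byte identity digest, and that the list is wiped whenever a new consensus arrives, as teor suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a dirserver is identified by a 20-byte identity digest. */
#define ID_DIGEST_LEN 20
/* Hypothetical capacity; only needed because this sketch uses an array. */
#define MAX_OUTDATED_DIRSERVERS 32

/* Dirservers that recently failed to serve some md we asked for.
 * We deliberately do not record *which* md was missing. */
typedef struct {
  unsigned char ids[MAX_OUTDATED_DIRSERVERS][ID_DIGEST_LEN];
  size_t n;
} outdated_dirservers_t;

/* Return true if the dirserver with identity `id` is on the list. */
static bool
outdated_contains(const outdated_dirservers_t *list, const unsigned char *id)
{
  assert(list != NULL && id != NULL);
  for (size_t i = 0; i < list->n; i++) {
    if (memcmp(list->ids[i], id, ID_DIGEST_LEN) == 0)
      return true;
  }
  return false;
}

/* Record that `id` failed to serve an md. Return true if `id` is on the
 * list after the call, false if the list was full and it was not added. */
static bool
outdated_note(outdated_dirservers_t *list, const unsigned char *id)
{
  if (outdated_contains(list, id))
    return true;
  if (list->n >= MAX_OUTDATED_DIRSERVERS)
    return false;
  memcpy(list->ids[list->n], id, ID_DIGEST_LEN);
  list->n++;
  return true;
}

/* Forget all entries, e.g. when a new consensus has been fetched. */
static void
outdated_reset(outdated_dirservers_t *list)
{
  assert(list != NULL);
  list->n = 0;
}
```

In this sketch the md download failure path would call `outdated_note()`, dirserver selection would skip any candidate for which `outdated_contains()` returns true, and the consensus handling code would call `outdated_reset()`.
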
So we can just ban those dirservers for a while using the above logic.

And I agree with you, we should wipe the `outdated_dirserver_list` every
time we fetch a new consensus. Should we also clear it at any other point?
Maybe we should not let it grow to a huge size (cap it?).

I started implementing the original implementation plan today, before I
realized the above simplification, so I will have to change some code to
fit this new design. I estimate that I will have an initial branch here
early next week, including unittests.

Onwards!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23817#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs