#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------
Comment (by karsten):

 Replying to [comment:2 asn]:
 > Hey Karsten,

 Hi asn,

 > nice research! It's been a while since we looked at this stuff, so here
 > are a few questions as I try to understand what's going on:

 Thanks for looking into this! Those are good questions. Let me try to
 answer them:

 > a) I'm trying to understand the rounding step and its significance. Do
 > you remember what we meant by ''The idea is that it's most likely that
 > noise was added to the closest right side of a bin than to the right
 > side of another bin.''?

 The following is from my memory, not from looking at code: I ''think''
 that, on relays, we're first binning the observed number and then adding
 noise. The noise distribution has a mean value of 0. So, even though we
 don't know how much noise was added, it's more likely that a small
 negative or small positive value was added than a large one.

 > b) Why do we think that the result of that division should be floored
 > towards infinity? Is it because the graph looks weird?

 Floored towards ''negative'' infinity. The idea is that we're always
 subtracting something, rather than subtracting in the case of positive
 numbers and adding in the case of negative numbers. Right now we're
 handling reported values spanning two bins exactly the same, just because
 they happen to be close to zero, and that seems wrong. (To be honest,
 this is the first time I'm even thinking about integer truncation going
 in two different "directions" depending on whether the input value is
 positive or negative. I could imagine that none of us had thought about
 this before.)

 > c) How did you produce that graph? I agree it doesn't look good.

 By generating sample values and feeding them once into the current
 noise-removing code, which produces the "flawed" graph, and once into the
 fixed code, which produces the "fixed" graph.

 > d) You say ''Right now, we're handling all negative reported values
 > wrong by adding 1 binSize to them which we shouldn't do.''. Where do we
 > do that, and why?
 Take the "fixed" graph and imagine you add 1024 to every value < -512:
 then you have the "flawed" graph. I didn't mean to say that we're doing
 that for a reason; I meant that this is what we're doing right now,
 unknowingly, and that I believe it's wrong.

 > Thanks! :)

 Thanks for digging into this! If the answers above make sense and you
 agree that this is a possible bug, I'll try to produce numbers using the
 fixed method.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
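A minimal sketch of the truncation issue discussed in this comment, in Python. The bin size of 1024 and the -512 boundary come from the example above; the bin-recovery formula itself is a hypothetical illustration, not the actual metrics code:

```python
import math

BIN_SIZE = 1024  # bin size from the example above; other statistics may differ

def remove_noise_flawed(reported):
    # Integer truncation toward zero (what a plain cast does): negative
    # inputs are effectively rounded up, so every reported value below
    # -BIN_SIZE / 2 lands one bin too high.
    return math.trunc((reported + BIN_SIZE / 2) / BIN_SIZE) * BIN_SIZE

def remove_noise_fixed(reported):
    # Flooring toward negative infinity treats positive and negative
    # reported values symmetrically.
    return math.floor((reported + BIN_SIZE / 2) / BIN_SIZE) * BIN_SIZE

# Every value < -512 differs by exactly one bin size between the two:
for reported in (-1500, -600, -513, -512, -100, 600):
    flawed = remove_noise_flawed(reported)
    fixed = remove_noise_fixed(reported)
    print(reported, flawed, fixed, flawed - fixed)
```

Running this shows the flawed variant exceeding the fixed one by exactly 1024 for every input below -512, which is the shift described in the answer to (d).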
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs