- I filed bug 15178 for Atlas [0].
- I put more code and thought into the Sybil detector [1]. One aspect of it is to find a distance metric that quantifies the similarity between two given relay descriptors. Once we have that, we can use nearest neighbour search algorithms to find the k relays that are most similar to a given relay descriptor.
So far, I have experimented with the Levenshtein distance [2] as the distance metric and a vantage point tree [3] as the nearest neighbour search algorithm. My current application of the Levenshtein distance is not very smart: it simply takes two raw relay descriptors as input and determines their distance, i.e., the number of single-character edits (insertions, deletions, substitutions) necessary to turn descriptor A into descriptor B. I'm currently looking at ways to preprocess the relay descriptors so we can incorporate our experience with past Sybils instead of just treating descriptors as opaque string blobs. The challenging part is that the result still has to be a metric in the mathematical sense, i.e., it has to satisfy non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.

[0] <https://bugs.torproject.org/15178>
[1] <http://notebooks.nymity.ch/detecting_sybils.html>
[2] <https://en.wikipedia.org/w/index.php?title=Levenshtein_distance&oldid=654093637>
[3] <https://en.wikipedia.org/w/index.php?title=Vantage-point_tree&oldid=643855307>

Cheers,
Philipp
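
Appendix (illustrative sketch, not the detector's actual code): the approach described above, edit distance on raw descriptors plus a k-nearest-neighbour query, could look roughly like the following. Note the assumptions: the search is a plain linear scan standing in for the vantage point tree, the descriptor strings are made-up placeholders rather than real relay descriptors, and the names `levenshtein`, `k_nearest`, and `triangle_violations` are hypothetical. The last helper brute-force checks the triangle inequality on a sample, which is the metric property a vantage point tree relies on for pruning and which any preprocessing step would have to preserve.

```python
from itertools import permutations


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row buffer as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def k_nearest(query, candidates, k, distance=levenshtein):
    """Return the k (distance, candidate) pairs closest to query.
    Linear scan: a simple stand-in for a vantage point tree."""
    scored = [(distance(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def triangle_violations(items, distance=levenshtein):
    """List all triples (x, y, z) with d(x, z) > d(x, y) + d(y, z).
    An empty list is necessary (not sufficient) for a true metric."""
    return [(x, y, z) for x, y, z in permutations(items, 3)
            if distance(x, z) > distance(x, y) + distance(y, z)]


# Made-up placeholder "descriptors" (assumption: not the real format).
relays = [
    "nickname=sybil01 platform=Tor 0.2.5.10 bandwidth=1000",
    "nickname=sybil02 platform=Tor 0.2.5.10 bandwidth=1000",
    "nickname=exampleRelay platform=Tor 0.2.6.3 bandwidth=5000000",
]
query = "nickname=sybil03 platform=Tor 0.2.5.10 bandwidth=1000"

for dist, relay in k_nearest(query, relays, k=2):
    print(dist, relay)
```

Swapping in a preprocessed representation would mean passing a different `distance` function to `k_nearest`; running `triangle_violations` over a sample of descriptors is a cheap sanity check that the new function still behaves like a metric on that sample.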