#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
  Reporter:  iwakeh             |          Owner:  iwakeh
      Type:  enhancement        |         Status:  needs_information
  Priority:  High               |      Milestone:  CollecTor 1.1.0
 Component:  Metrics/CollecTor  |        Version:
  Severity:  Normal             |     Resolution:
  Keywords:  ctip               |  Actual Points:
 Parent ID:                     |         Points:
  Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------

Comment (by iwakeh):

Thanks for the remarks and suggestions! I'm replying inline below and am also adding a wiki page [wiki:doc/CollecTor/DescriptorDistribution CollecTor Sync] that contains the current status of the discussion. Please take a look there to see the entire picture.

Replying to [comment:14 karsten]:
> Hmm, the suggested config options would imply that there's only one new sync manager module that syncs all descriptors from the various sources and that runs, say, once per hour? I wonder how to schedule that in a way that it does not interfere with the other modules. So far, modules were pretty much independent, but this new module would create a dependency between modules.

You're right, they should stay independent. I intended that, too, but I had a different (more complicated) architecture in mind.

> Alternative suggestion: we add four (sets of) configurations, one for each module, that internally re-use the same code for syncing descriptors and for importing them. For example, `SyncRelayDescriptors`, `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`.

Good idea! So we run the sync function after or instead of the module run (see the wiki page for more).

> We could then provide a remote path where to find descriptor files (like `/recent/relay-descriptors/`) and could implicitly only consider descriptor types that the respective module understands (like `RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not `BridgeServerDescriptor`).

Actually, the directory structure of a CollecTor instance's 'recent' folder is given, i.e., the mirrors won't or shouldn't use a directory structure that differs from the main instance's. So it suffices to activate the module and set the sync or sync-only option. The path structure for the actual download is thereby determined: straightforward paths for torperf and exit lists, and a more complex structure for bridge and relay descriptors.
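
To make the per-module idea concrete, here is a minimal sketch in Python (not CollecTor code). The option names and `/recent/relay-descriptors/` come from comment:14; the `SyncModule` class, the `sync_sources` helper, the bridge path, and the example base URL are assumptions made only for illustration, and the lists of descriptor types are deliberately incomplete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncModule:
    option: str                # sync switch name as suggested in comment:14
    remote_path: str           # fixed path below an instance's base URL
    accepted_types: frozenset  # descriptor types this module understands


# Hypothetical table: only two of the four modules, types not exhaustive.
MODULES = (
    SyncModule(
        "SyncRelayDescriptors",
        "/recent/relay-descriptors/",
        frozenset({"RelayServerDescriptor", "RelayExtraInfoDescriptor"}),
    ),
    SyncModule(
        "SyncBridgeDescriptors",
        "/recent/bridge-descriptors/",  # assumed path, not stated in the ticket
        frozenset({"BridgeServerDescriptor"}),
    ),
)


def sync_sources(config, base_url):
    """Return (url, accepted_types) for every module whose sync option is on."""
    base = base_url.rstrip("/")
    return [
        (base + module.remote_path, module.accepted_types)
        for module in MODULES
        if config.get(module.option, False)
    ]


def accepts(module, descriptor_type):
    """True if the module would import a descriptor of the given type."""
    return descriptor_type in module.accepted_types
```

Because the remote path is fixed per module, the only thing an operator would have to configure in this sketch is the switch itself; each module stays independent of the others.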

> Here's a potential policy we could apply to decide whether to keep a local or remote descriptor: while syncing, if we find out that a remotely obtained descriptor would be stored under a file name that already exists locally, we always discard that;...

So, //while syncing// means while retrieving descriptors from a different instance and writing them to the local `SyncFolder` structure. During this process, descriptors already available in the sync folder are not replaced.

> ... and while processing descriptors locally, if we find that we already have a file locally with different content, which we likely received while syncing, we always overwrite that. This means that we're only adding data but never replacing data.

Does this refer to the process of comparing the descriptors fetched from remote instances with descriptors already in the 'recent' folder of the syncing instance? Such local descriptors could have been obtained by direct download or by a different syncing operation. Did I miss something here?

> Regarding deleting synced descriptors, we should never do that, but we should rather let `DescriptorCollector` clean up the local directory when it finds that a local file does not exist anymore remotely.

True, if this refers to descriptors in the `SyncFolder`.

> Here's something else to watch out for while writing this code: whenever we learn descriptors from syncing, we'll have to include them in our `/recent/` directory, too. This wasn't entirely clear to me from the description above, so if this was already the plan, never mind.

That was intended, but it should be stated clearly; I will add it to the wiki page.

I hope I'm not seeing this as more complicated than it is.
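
One possible reading of the keep-local-or-remote policy quoted above, as a minimal Python sketch (not CollecTor code). The function names `store_synced` and `store_local` and the flat directory layout are assumptions for illustration; whether the comparison happens in the sync folder or in 'recent' is exactly the open question raised in this comment.

```python
from pathlib import Path


def store_synced(directory: Path, name: str, content: bytes) -> bool:
    """Store a remotely obtained descriptor; never replace an existing file.

    Returns True if the file was written, False if the remote copy was
    discarded because a file with that name already exists locally.
    """
    target = directory / name
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True


def store_local(directory: Path, name: str, content: bytes) -> bool:
    """Store a locally processed descriptor; overwrite on differing content.

    Returns True if the file was written, False if an identical file
    was already present.
    """
    target = directory / name
    if target.exists() and target.read_bytes() == content:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True
```

Under this reading, a synced copy can only fill a gap, while a locally processed descriptor always wins over a previously synced file of the same name. Neither function deletes anything, which matches the point that cleanup is left to `DescriptorCollector`.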

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs