#20228: Append all votes with same valid-after time to a single file in `recent/`
-------------------------------+---------------------
 Reporter:  karsten            |          Owner:
     Type:  enhancement        |         Status:  new
 Priority:  Medium             |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+---------------------
Comment (by karsten):

Replying to [comment:3 iwakeh]:
> * Regarding grouping by download vs. published time, which came up in #20234, too.
> Let's have the discussion for all descriptors here, if this is ok?
> 1. Grouping by published time brings more data consistency between CollecTor instances, as their download times for the same descriptors surely differ often.

Agreed. I guess we can assume that files in the `recent/` directories might differ between CollecTor instances. But does that matter, as long as the set of contained descriptors published in the past, say, 60 hours is 99.9% the same? Even with files grouped by publication hour, it's still possible, and very likely, that two instances' files would contain descriptors in different orders. Do we care?

> 2. Grouping by download time means keeping track of a data item, i.e. download time, that so far is not part of the Tor protocol. Why introduce it for descriptors that provide a published time? Which is the download time after syncing descriptors: the initial download by the supplying CollecTor or the sync-download-time by the receiving one?

Right now, a CollecTor instance records the timestamp when it starts downloading and uses that as the file name for the descriptors file to which it appends all descriptors it learns about in that run. That includes descriptors found via initial download as well as those obtained via synchronization from other instances. And 72 hours later, when the file gets deleted, the download time is no longer relevant.

> 3. Regarding #20234:comment:5: Clients might not be interested in past or future (according to published time) descriptors and just download the file they consider current, if it changed since their last visit.

Right, this is an important argument for storing descriptors by published hour, so that clients can retrieve them easily. However, it presumes that the client knows the publication time of a descriptor before downloading anything, and that's not always the case.
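A rough sketch of the two layouts being compared, showing how the descriptors learned in one download run map to files in each scheme. This is not CollecTor's code: the `Descriptor` class, the `group_run` helper, and both file-name formats are hypothetical, and only the published timestamp is modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical stand-in for a parsed descriptor; only the fields used here."""
    digest: str
    published: datetime


def name_by_download_run(run_start: datetime) -> str:
    # Current layout: one file per run, named after the time the run started.
    # The exact name format is an assumption for illustration.
    return run_start.strftime("%Y-%m-%d-%H-%M-%S")


def name_by_published_hour(desc: Descriptor) -> str:
    # Alternative layout: one file per published hour (format also assumed).
    hour = desc.published.replace(minute=0, second=0, microsecond=0)
    return hour.strftime("%Y-%m-%d-%H-%M-%S")


def group_run(run_start, descriptors, by_published_hour):
    """Map file name -> digests for all descriptors learned in one run,
    whether downloaded directly or synced from another instance."""
    files = defaultdict(list)
    for desc in descriptors:
        if by_published_hour:
            name = name_by_published_hour(desc)
        else:
            name = name_by_download_run(run_start)
        files[name].append(desc.digest)
    return dict(files)


# One run that learns four descriptors published 0, 3, 6 and 9 hours earlier.
run_start = datetime(2016, 10, 1, 12, 5)
learned = [Descriptor(f"d{i}", run_start - timedelta(hours=3 * i)) for i in range(4)]
by_run = group_run(run_start, learned, by_published_hour=False)
by_hour = group_run(run_start, learned, by_published_hour=True)
print(len(by_run), len(by_hour))  # prints: 1 4
```

Under the current layout every descriptor learned in the run lands in a single file, while under the published-hour layout the same run touches four files, each of which a client polling for changes would see as modified.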
The client might then have to download several files and search them for the descriptor it's looking for. And the most important argument against storing descriptors by published hour is that clients that just want the new descriptors would have to download about 8 files per hour (due to #20234) rather than 1, and 6 or 7 of these files would contain mostly the same descriptors as before.

> * Regarding the notice: I think the two-week time frame is fine.

Sounds good. Let's first reach a conclusion here and then tell the world.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/20228#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs