On Fri, Mar 20, 2020 at 5:09 PM teor <t...@riseup.net> wrote:
>
> Hi Nick,
>
> On 21 Mar 2020, at 05:38, Nick Mathewson <ni...@torproject.org> wrote:
>
> Walking Onions: week 3 update
>
> As you might recall, the SNIPs need to be nice and small, but they
> need to be completely self-contained. The ENDIVE, on the other
> hand, needs to be diff-friendly, and compressible. Relays need to
> derive all the SNIPs from the ENDIVE. The ENDIVE and SNIP formats
> that I worked on during last week [SNIPFMT] were designed to meet
> these criteria.
>
> This week, I wrote up the algorithms to extract signed SNIPs from an
> ENDIVE. This is in section 4 [RECONST] of the specs in my
> work-in-progress repository [GITREPO]. This was mainly a
> straightforward consequence of the work I did last week, but there
> were a few things that turned up.
>
> One trickier aspect of this work is that we want to move most of the
> decision-making to the authorities, and have the relays simply run an
> expansion algorithm. Updating all the relays would be slow, so relays
> need to be able to handle new index types and data types without
> fully understanding them. That means that we need to specify an
> algorithm that can work flexibly with many future kinds of data.
>
> Another tricky aspect is that the data which the relays need to put
> into SNIPs all needs to be signed, so all of that data needs to come
> out the same when the relays calculate it as when the authorities
> calculate it. Thus, all the algorithms for building Merkle trees,
> indices, and SNIPRouterData bodies need to be deterministic.
>
> I think that I got all of this relatively right in [RECONST], but
> there are probably some mistakes hiding in there.
>
> I'm hoping that there is a cleaner alternative to the weighted-index
> reconstruction algorithm -- the one that's there right now puts a
> lot of requirements on the input data, in order to avoid
> floating-point operations.
>
> There's a shorter encoding for Raw:
>
>     for each [i, pos1, pos2] in index_ranges:
>         w = pos2 - pos1
>         j = the index of pos1 among all sorted pos1s
>         new_encoding = [i, j, w]

This will work when w is an integer that we can work on additively, but
not so well when we're talking about something more like a hash ring.
I think it makes sense to have two different possible encodings here,
though. I'm calling this one "RawNumeric".
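For what it's worth, here's a rough Python sketch of how a relay might
expand that [i, j, w] form back into [i, pos1, pos2] ranges. The
function name is mine, and it assumes the ranges are contiguous and
half-open, so that each pos1 can be recovered by prefix-summing the
widths in sorted order:

    def expand_raw_numeric(entries):
        # entries: a list of [i, j, w] triples, where i is the router
        # index, j is the rank of the range's starting position among
        # all starting positions, and w is the range's width.
        start_of_rank = {}
        pos = 0
        for i, j, w in sorted(entries, key=lambda e: e[1]):
            start_of_rank[j] = pos
            pos += w
        # Rebuild the original [i, pos1, pos2] form.
        return [[i, start_of_rank[j], start_of_rank[j] + w]
                for i, j, w in entries]

    # Example: three contiguous half-open ranges covering 0..255:
    #   expand_raw_numeric([[5, 0, 100], [9, 1, 50], [2, 2, 106]])
    #   == [[5, 0, 100], [9, 100, 150], [2, 150, 256]]

That round trip only works when the ranges tile the index space with no
gaps, which is part of why a separate encoding for hash-ring-style
indices seems worth having.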
> [i, j, w] is an efficient encoding if index_ranges is sparse compared
> with ENDIVERouterData, because:
>  * j has the same cardinality as i
>  * w is smaller than pos1 and pos2
>
> If index_ranges is dense, there may be a more efficient encoding:
>  * add missing i with weight 0
>  * drop j
>
> With this encoding, you can drop a few of the constraints.
>
> There's a faster calculation and more efficient encoding for
> Weighted, because there's a common factor in each POS()
> calculation:
>
>     (1 << 32) / total_weight
>
> If we remove that factor, we end up with a much simpler algorithm,
> that's also more flexible:
>
> Algorithm: Expanding a "Weighted" indexspec.
>
> > Each weighted indexspec also has a multiplier, which may vary
> > between indexspecs.
> >
> > Let total_weight = SUM(indexspec.index_weights)
> > Verify total_weight * multiplier <= UINT64_MAX.
> >
> > Let total_so_far = 0.
> > Let result_idx = {} (an empty mapping).
> > Define POS(b) = b * multiplier
> >
> > For 0 <= i < LEN(indexspec.index_weights):
> >    Let w = indexspec.index_weights[i].
> >    Let lo = POS(total_so_far).
> >    Let total_so_far = total_so_far + w.
> >    Let hi = POS(total_so_far) - 1.
> >    Append (lo, hi) => i to result_idx.
> >
> > Return result_idx.

I'm not so sure about this one -- see the note below about the total
value.

> If multiplier is large, then the weights can be quite small.
> (Small weights sacrifice some accuracy.) And if the multiplier
> is 1, you effectively have a "Raw" index.
>
> If you make those changes, you should be able to use a similar
> process to expand all the different index types. (After multiplying,
> truncating, or hashing, you either end up with a delta, or an
> absolute position. You can turn deltas into absolute positions,
> and then feed them all into the same base algorithm.)
>
> There are also a few things I think might be bugs:
>
> Was there meant to be a constraint that the Weighted total is
> UINT64_MAX? Or close to UINT64_MAX?

The actual intention is that it has to be "= UINT32_MAX", not
"<= UINT32_MAX". Remember that the client picks its next relay by
specifying a random index value in the range 0..UINT32_MAX, and it
expects that the relay it is extending to _will_ have a SNIP that
covers that value.

> The fixed parameters don't make much sense otherwise.
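To make the intended constraint concrete, here's your multiplier
variant as a quick Python sketch (field names follow the pseudocode
above, not the spec), with the exact-coverage requirement added.
Whether that requirement plays well with per-indexspec multipliers is
the part I'm unsure about:

    def expand_weighted(index_weights, multiplier):
        # Expand a "Weighted" indexspec (the multiplier variant above)
        # into a mapping from inclusive (lo, hi) ranges to positions i.
        total_weight = sum(index_weights)
        # Exact-coverage constraint: the client picks a random value in
        # 0..UINT32_MAX, so the last range must end at exactly 2**32 - 1.
        # (This also implies the "fits in a uint64" check above.)
        assert total_weight * multiplier == 1 << 32

        result_idx = {}
        total_so_far = 0
        for i, w in enumerate(index_weights):
            lo = total_so_far * multiplier
            total_so_far += w
            hi = total_so_far * multiplier - 1
            result_idx[(lo, hi)] = i
        return result_idx

    # Example: weights 1, 2, 1 with multiplier 2**30 cover 0..UINT32_MAX:
    #   expand_weighted([1, 2, 1], 1 << 30)
    #   == {(0, 2**30 - 1): 0,
    #       (2**30, 3 * 2**30 - 1): 1,
    #       (3 * 2**30, 2**32 - 1): 2}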
> I think v2 and v3 onion services assign descriptors to the next
> highest HSDir RSA id (or ed25519 hash). But
> INDEX_FROM_RING_KEYS() uses the relay input as the lowest value.
>
> There is no next member for the last member in
> INDEX_FROM_RING_KEYS(), but the code asks for one.
> (Perhaps there are some typos here that make the code hard to
> understand.)

Okay, I'll have to look into this. I think you're probably right, and
you _are_ right about the ring wrapping issues.

> We'll need special treatment for ring wrapping. (That is, 0xFF… is not
> a real upper limit, it actually means, "use the lowest relay".)
>
> It's weird to call a middle value a "suffix", but "infix" is also a
> bit of an unusual word.
>
> There are also a bunch of typos, let me know when you're ready for
> copy-editing.

Once I'm through with sections 3 and 5, I'm going to do a big revision
and consistency check of everything that's there before I move on to
sections 6-8. Once that's done, I hope that a copy edit will be more
helpful.

--
Nick
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev