Hi, all. There has been a lot of chatter re possible improvements to SKS on the list lately, and lots of ideas thrown around. So I thought I'd summarise the proposals here, and try to separate them out into digestible chunks.
I've ordered them from less to more controversial. My personal preference is for the first two sections (resiliency, type filters) to be implemented, and for the rest to be parked. This has turned out to be a much longer document than I expected. I don't intend to spend any further time or energy on local blacklisting, as its technical complexity increases every time I think about it, and its politics and effectiveness are questionable.

A. Concrete proposals
=====================

Version 1.X: Resiliency
-----------------------

These are ideas that fell out of the other discussions, but are applicable independently. If we want to make backwards-incompatible changes, then automatic verification of status, versions etc. will probably be necessary to prevent recon failure.

### JSON status

A standardised JSON status page could be served by all SKS-speaking services. This would ease fault detection and pool management, and is a prerequisite for reliable sanity callbacks.

### Default initial_stat=true

Also a prerequisite for sanity callbacks, and otherwise useful for debugging and fault detection.

### Sanity callbacks

Currently, a server has no way to determine whether its peers are correctly set up or have key deltas within the recon limit. If each host served a JSON status page, peers could perform a sanity check against it before allowing recon to continue. This would help contain the effects of some of the more common failure modes.

### Empty db protection

If the number of keys in the local database is less than a configured threshold, an SKS server should disable recon and throw a warning. The threshold could be set in the conf file, with a sensible default provided in the distro. This should prevent new servers from attempting recon until a reasonable dump has been loaded.

Version 2.0: Type filters with version ratchet
----------------------------------------------

This proposal seems to have the most support in principle.
It is relatively easy to implement, and directly addresses both illegal content and database bloat. It does, however, require precise choreography. It should be possible to alter the SKS code during a version bump so that:

1. All objects of an expanded but hardcoded set of types (private keys, localsigs, photo IDs, ...) are silently dropped if submitted
2. Any existing objects of these types in the database are treated as nonexistent for all operations (queries, recon, dumps, ...)
3. The above rules are only enabled on a future flag day, say 180 days after the release date of the new version
4. The version criterion for pool membership is bumped a few days in advance of the flag day
5. A crufty database could be cleaned by dumping and reloading the db locally, or a database cleaner could be run on a schedule from within SKS itself

This would purge the pool of the most obviously objectionable content (child porn, copyrighted material) with minimal collateral damage. The disadvantage is that any noncompliant peer would fail recon after flag day due to excessive delta, and would therefore need to be either depeered manually or have its recon attempts denied by a sanity callback. Other implementations (e.g. Hockeypuck) would have to move in lockstep or be depeered.

Future speculation
==================

Future A: Policy blacklisting
-----------------------------

Pay attention, kid. This is where it gets complicated.

Version ratchets may not be flexible or responsive enough to deal with specific legal issues. Policy-based blacklisting gives server operators a fine-grained tool to clean their databases of all sorts of content without having to move in lockstep with their peers. These proposals are more controversial, given that individual operators will have hands-on responsibility for managing policy, and may thereby be more exposed legally. It should be noted, however, that technical debt may not be a valid defence against legal liability. IANAL.
All of the changes in this section must be made simultaneously, otherwise various forms of recon failure are inevitable. This would involve a major rewrite of the code, which may not be considered a good use of time. If type filters have been implemented (see above), the need for local policy would be considerably reduced. If type filters were not used, however, then policy blacklists would be the main method for filtering objectionable content, which might be prohibitive. Note that locally-divergent blacklist policies have the potential to break eventual consistency across the graph (see below).

### Local blacklist

An SKS server may maintain a local blacklist of hashes that it does not want to store. At submission time, any object found in the blacklist is silently dropped. Any requests for objects in the blacklist should return `410 Gone`.

### Local dumps

When an SKS server makes a dump, it should dump all of its databases, including blacklist, peer_bl_cache and limbo (see below). This is useful a) for restoring state locally after a disaster, and b) for helping new servers bootstrap themselves to a low-delta state.

### Bootstrap limbo

When restoring from a dump, a server may simply restore the dumped blacklist and continue. But if the new server has a different policy from the source, this is not sufficient. Hashes that were added to the original blacklist for violating policies that the new server does not enforce should not be blacklisted on the new server. But they cannot be added to the local database either, because the actual data will not be found in the dump. Instead, these hashes are added to a `limbo` database that is progressively drained as and when the hashes are encountered again during submission or catchup. This is important to ensure that recon can start immediately with a complete set of hashes. Any requests for objects in limbo should return `404 Not Found`.
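The bootstrap split described above might be sketched as follows. All names are hypothetical, and the sketch assumes each dumped blacklist entry carries the name of the policy it violated, so that the restoring server can decide which entries to keep:

```python
def restore_blacklist(dumped_bl, local_policies):
    """Split a dumped blacklist into a local blacklist and a limbo set.

    dumped_bl      -- iterable of (hash, policy_name) pairs from the dump
    local_policies -- set of policy names enforced by this server
    """
    local_bl, limbo = set(), set()
    for h, policy in dumped_bl:
        if policy in local_policies:
            local_bl.add(h)   # we enforce this policy too: stays blacklisted
        else:
            limbo.add(h)      # policy not enforced here: park in limbo
    return local_bl, limbo

# Restoring server enforces only "no-photo-ids"; "dmca" entries go to limbo
dump = [("h1", "no-photo-ids"), ("h2", "dmca"), ("h3", "no-photo-ids")]
local_bl, limbo = restore_blacklist(dump, {"no-photo-ids"})
```

Note that this only works if policies are named in a canonical form across servers, which is one motivation for the canonical policy definition discussed under "Policy enforcement" below.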
If an object that matches a hash in limbo is successfully submitted or fetched, the hash is removed from limbo before the object is processed by policy.

### Peer blacklist cache

When fetching new objects from a peer during catchup, the peer may throw `410 Gone`. If this happens, we know that the peer has blacklisted the object, and we should not request it again from that peer for some time. We store the triple `(hash, peer, timestamp)` in the database `peer_bl_cache`. Similarly, if we receive `404 Not Found` during catchup, then this object is in the remote server's limbo. We add it to `peer_bl_cache` as if it were a `410`. Cache invalidation should reap it eventually.

### Fake recon

The recon algorithm is modified to operate against the set of unique hashes:

```
(SELECT hash FROM local_db)
UNION (SELECT hash FROM local_bl)
UNION (SELECT hash FROM limbo)
UNION (SELECT hash FROM peer_bl_cache WHERE peer="$PEER");
```

This ensures that deltas are kept to a minimum. Note that this may cause the remote server to request items that it does not have but that are in our blacklist or our limbo. This should only happen once, after which the offending hash will be stored in the peer's blacklist cache against our hostname. If the remote server requests an object that we have stored in our `peer_bl_cache` against its name, then our cache entry is obviously invalid; we should remove it from the cache and respond with our copy of the object, if we have one.

### Conditional catchup

Instead of requesting the N missing hashes from the delta, the server requests the following hashes:

```
(SELECT hash FROM missing_hashes)
UNION (SELECT hash FROM peer_bl_cache WHERE peer="$PEER" ORDER BY timestamp LIMIT a)
UNION (SELECT hash FROM limbo LIMIT b*N);
```

where `a` is small, perhaps even a weighted random integer from (0,1), and `b` is O(1).
These parameters will be adjusted to maintain a balance between (on one hand) timely cache invalidation and limbo draining, and (on the other) the impact of excessive requests upon the remote peer.

### Policy enforcement

Each server would be able to define its own policy. The simplest policy would be one that bans certain packet types (e.g. photo IDs). During both catchup and submission (but after limbo draining), each new object is compared with local policy. If it offends, its hash is added to the local blacklist with a reference to the offending policy, and the data is silently dropped. Policy should be defined in a canonical form, so that a) local policy can be reported on the status pages, and b) remote dumps can be compared with local policy to minimise the number of hashes that need to be placed in limbo during bootstrap.

### Local database cleaner

If policy changes, there will in general be objects left behind in the db that violate the new policy. A cleaner routine should periodically walk the database and remove any offending objects, adding their hashes to the local blacklist as if they had just been submitted. This could be implemented as an extension of the type-filter database cleaner above.

Open problem: Eventual Consistency
----------------------------------

Any introduction of blacklists opens the possibility of "policy firewalls", where servers with permissive policies may be effectively isolated from each other if all of the recon pathways between them pass through servers with more restrictive policies. Policy would therefore not only prevent the storage of violating objects locally, but also prevent their propagation across the network. The only way to break such a firewall is to create a new recon pathway that bypasses it. This could be done manually, but that places responsibility on operators to understand the policies of every other server on the graph.
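The firewall effect can be illustrated with a toy reachability check over the peering graph (server names and topology are invented for the example): an object can only propagate along paths of servers whose policies accept it.

```python
from collections import deque

def can_propagate(peerings, accepts, src, dst):
    """Return True if an object can flow from src to dst.

    peerings -- dict mapping server name to list of peer names
    accepts  -- set of servers whose policy allows the object
    """
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for peer in peerings.get(node, []):
            # the object can only transit peers that accept it
            if peer in accepts and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return False

# Permissive servers A and D are joined only through restrictive B and C:
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
blocked = can_propagate(graph, {"A", "D"}, "A", "D")       # firewalled: False
relayed = can_propagate(graph, {"A", "B", "D"}, "A", "D")  # B relays: True
```

The second call shows the manual fix described above in graph terms: adding even one accepting server to a cut set (or peering A and D directly) restores propagation.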
### Recon for all

It might be possible to move from a recon whitelist to a recon blacklist model. Servers would spider the graph to find peers and automatically attempt to peer with them. This would ensure that eventual consistency is reached quickly, by maximising the core graph of servers that are mutually directly connected (and thus immune to firewalling). The main objection is that moving from a whitelist to a blacklist recon model opens up a significant attack surface. Sanity callbacks could be used to mitigate against human error, but not sabotage.

### Hard core

Alternatively, a group of servers that do not intend to introduce any policy restrictions could agree to remain mutually well-connected, and stay open to peering requests from all comers (subject to good behaviour). This would effectively operate as a clearing house for objects. The main objections are that a) these servers must all operate in jurisdictions where the universality of their databases is legally sound (e.g. no right to be forgotten), and b) some animals would be more equal than others.

Future B: Austerity
-------------------

In an extreme scenario, handling any user IDs at all may become impossible due to data protection regulations. On the same grounds, it may not even be possible to store third-party signatures, as these leak relationship data. In such a case, it may still be possible to run an austere keyserver network carrying self-signatures (i.e. expiry dates) and revocations only. This would require a further version ratchet, with a type filter permitting a minimum of packet types, shorn of all personally identifying information.
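Under austerity, the type filter reduces to a small whitelist of RFC 4880 packet tags. A minimal sketch, with an illustrative (not proposed) tag set:

```python
# RFC 4880 packet tags (section 4.3)
SIGNATURE, PUBLIC_KEY, PUBLIC_SUBKEY = 2, 6, 14

# Illustrative austerity whitelist: key material and signatures only, so
# user IDs (tag 13) and user attributes / photo IDs (tag 17) are dropped.
# (Separating self-signatures from third-party signatures would require
# inspecting each signature's issuer, which this sketch does not attempt.)
ALLOWED_TAGS = {SIGNATURE, PUBLIC_KEY, PUBLIC_SUBKEY}

def filter_packets(packets):
    """Drop every packet whose tag is not in the whitelist.

    packets -- iterable of (tag, body) pairs from a hypothetical
               upstream packet parser
    """
    return [(tag, body) for tag, body in packets if tag in ALLOWED_TAGS]

key = [(6, b"pubkey"), (13, b"Alice <a@example.org>"), (2, b"selfsig"),
       (17, b"photo"), (14, b"subkey"), (2, b"subkey-binding")]
stripped = filter_packets(key)  # user ID and photo packets are gone
```

The same mechanism, with a larger whitelist, would serve for the Version 2.0 type filter above; the austerity case is just the most aggressive setting of it.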
_______________________________________________ Sks-devel mailing list Sks-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/sks-devel