So after all the anguish about the rsync protocol, we now have a suggested replacement. But there's been very little comment.
Tim did a thorough job of describing the design goals. It would be useful if
the WG paid attention to this.

--Sandy, speaking as one of the WG co-chairs

On Feb 17, 2015, at 12:54 PM, Tim Bruijnzeels <t...@ripe.net> wrote:

> Hi all,
>
> Following working group adoption I submitted the latest version of the
> delta protocol document as a working group item:
>
>   https://datatracker.ietf.org/doc/draft-ietf-sidr-delta-protocol/
>
> Sriram, allow me to get back to the comments you made during the call for
> adoption:
>
>> When the authors spin a WG draft version (assuming it would be accepted
>> as a WG draft), it would be good if the following suggestions could be
>> given consideration:
>
> The current version is unchanged except for its name and number (00), but
> of course we can give consideration to these and other suggestions for a
> next version.
>
>> 1. Include one short paragraph just to discuss the key disadvantages of
>> rsync and how the delta protocol avoids or overcomes them.
>
> I would like to avoid a general discussion that may include opinions about
> how severe the perceived disadvantages of rsync are. Instead I would like
> to focus on a more positive and factual message: what we are trying to
> achieve with this protocol, and why.
>
> The last paragraph of the introduction has some text on this:
>
>    This protocol is designed to be consistent with the publication
>    protocol [I-D.ietf-sidr-publication] and treats publication events of
>    one or more repository objects as immutable events that can be
>    communicated to relying parties. This approach helps to minimize the
>    amount of data that traverses the network and thus helps minimize the
>    amount of time until repository convergence occurs. This protocol
>    also provides a standards-based way to obtain consistent, point-in-
>    time views of a single repository, eliminating a number of
>    consistency-related issues. Finally, this approach allows caching
>    infrastructure to be used to serve this immutable data, and thus
>    helps to reduce the load on a publication server when a large number
>    of relying parties are querying it.
>
> But admittedly this is incomplete and lacks an explanation of why we
> believe these are good things to have.
>
> I am happy to elaborate more on this here, and if it's useful to include
> in the document itself we can add more text.
>
> Design goals and benefits that I see (I did not check everything with my
> co-authors, so I will not speak for them):
>
>
> = Based on publication protocol
>
> Not the most important design goal, but useful because this way we do not
> need to reinvent all of the data structures. We can re-use the <publish>
> and <withdraw> elements that have already been defined. Furthermore, this
> may be easier for publication servers: they can re-use the update messages
> they receive from Certification Authorities in the update messages they
> serve, with minimal effort.
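
To make the re-use concrete, here is a minimal sketch (not taken from the
draft; the element and attribute names are illustrative assumptions): a
delta file carrying the same <publish> and <withdraw> elements as the
publication protocol, so that a CA's CRL, manifest and ROA updates travel
together as one unit.

    # Illustrative sketch only: element and attribute names are assumptions,
    # not normative names from draft-ietf-sidr-delta-protocol.
    import xml.etree.ElementTree as ET

    # A hypothetical delta file: one atomic set of publication events that
    # re-uses the <publish>/<withdraw> elements of the publication protocol.
    DELTA_XML = """\
    <delta session_id="9df4b597-af9e-4dca-bdda-719cce2c4e28" serial="42">
      <publish uri="rsync://repo.example.net/ca1/ca1.crl">bWFueSBieXRlcw==</publish>
      <publish uri="rsync://repo.example.net/ca1/ca1.mft">bW9yZSBieXRlcw==</publish>
      <withdraw uri="rsync://repo.example.net/ca1/old.roa" hash="ab12...ef90"/>
    </delta>
    """

    root = ET.fromstring(DELTA_XML)
    print("session:", root.get("session_id"), "serial:", root.get("serial"))
    for event in root:
        if event.tag == "publish":
            print("publish ", event.get("uri"))  # element text is the base64 object
        else:
            print("withdraw", event.get("uri"), "hash:", event.get("hash"))

Because the events in one delta are applied together, a relying party never
sees, say, the new CRL without the matching manifest.
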
> = Minimise data transfer
>
> We don't want to waste bits. This is nothing against rsync; rsync is
> actually very good at this. We just thought it would be good if we didn't
> waste bits here either.
>
> = Point in time views
>
> This actually refers to a problem with rsync. Objects may be republished
> mid-transfer, and this can lead to things like getting a new CRL but an
> old manifest; then the hash of the CRL doesn't match the manifest, and the
> manifest's EE certificate may appear revoked. Clients can keep retrying
> until they get something that seems consistent, but if a Certification
> Authority just sends all its updates (CRL, manifest, ROAs, etc.) in one
> message to a publication server, then all of this can be served as one
> delta (as in the sketch above) and much of the problem goes away.
>
> = Caching infrastructure / CDNs (and immutable data)
>
> There are many different HTTP caching servers that can be used to deal
> with the load of serving static data to a large number of clients. There
> are also many commercial global Content Delivery Networks (CDNs) that can
> improve this further: they can operate for some time even if the back-end
> system is unavailable, spread the load further, and reduce latency to
> globally distributed clients.
>
> Of course it is technically possible to build a CDN infrastructure with
> rsync, using anycast or DNS tricks, but this is far from a trivial effort.
> It is expensive to do and easy to get wrong, whereas HTTP-based CDNs and
> caching servers have had years of battle testing in the internet industry.
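
As an illustration of the caching point (again a sketch, not from the draft;
the file names and cache lifetimes are assumptions): because a snapshot or
delta for a given serial never changes once published, a front-end can let
caches and CDNs hold it for a very long time, while the notification file,
which does change, gets a short lifetime.

    # Sketch of a front-end cache policy, assuming immutable snapshot/delta
    # files and a mutable notification file. Paths and lifetimes are
    # illustrative assumptions, not taken from the draft.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class RepoHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            if self.path.endswith(("snapshot.xml", "delta.xml")):
                # Immutable once published: safe to cache for a long time.
                self.send_header("Cache-Control", "public, max-age=31536000")
            elif self.path.endswith("notification.xml"):
                # Updated on every publication event: keep it fresh.
                self.send_header("Cache-Control", "public, max-age=60")
            super().end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), RepoHandler).serve_forever()
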
> = Shift load to clients to support scaling
>
> Not mentioned yet in the introduction.
>
> There is an asymmetry between the number of clients (relying parties) and
> servers. With rsync the server needs to invest effort (CPU and memory) in
> a dialogue with each client in order to work out what delta that client
> actually needs. The effort the server spends on this dialogue is a
> limiting factor on how many clients can be served.
>
> Of course we can add more servers to counter this, but there is a lot more
> to gain if we can minimise the effort the server has to invest in the
> first place.
>
> For this reason the protocol shifts almost all of the (CPU) work to the
> relying party. The server creates a notification file that refers to a
> static snapshot and possibly delta files. The RP can work out what it
> needs to fetch all on its own (see the client-side sketch below).
>
> Serving this static data may still hit bottlenecks in network usage and
> server memory, but this is where the HTTP caching servers and CDNs come in
> very helpful.
>
>
> = Only support what's needed / protocol stability
>
> Also not mentioned yet, but I remember hallway discussions where people
> mentioned things like versioning: why don't you just use git/svn?
>
> This seems overkill to me, because these tools (like rsync) provide many
> options that we don't need here. Also, which version of git/svn, or rsync,
> are we talking about?
>
> I think it would be best to strip the protocol down to the bare essentials
> and version it clearly.
>
>
> = Availability of open-source libraries for HTTP transport
>
> There are numerous open-source HTTP libraries available for RP software in
> every major programming language. This helps with performance (no need to
> fork a process) and with testing, and it reduces the risk of the system as
> a whole relying on what is effectively a single implementation.
>
>
> = Transport protocol agnostic?
>
> Although the protocol clearly relies on HTTP caching, it was a design
> decision to repeat relevant data, such as session ids and versions, in all
> files (rather than making this implicit in their location). The reason is
> that this will make it easier to share these files using other transport
> or sharing protocols.
>
>
>> 2. Possible to say something about the relevance of or comparison with
>> Aspera (since they make big performance-improvement claims over rsync,
>> etc.)?
>> http://asperasoft.com/resources/benchmarks/
>> http://asperasoft.com/performance-calculator/
>
> Aspera seems to be an option as a transport protocol, but not as a
> complete delta protocol, i.e. one that also helps the client figure out
> what to get.
>
> It may be quicker than HTTP, but I am not convinced, mainly because it is
> a proprietary service: we would depend on a single vendor to support the
> transport protocol.
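
Tying together the "shift load to clients" and "transport protocol agnostic"
points above, here is a minimal client-side sketch (the URLs, element and
attribute names are all assumptions for illustration): the relying party
fetches the notification file, decides entirely on its own whether it needs
the snapshot or only the missing deltas, and checks the session id that is
repeated in every file.

    # Client-side sketch only; the notification/snapshot/delta layout and
    # names are assumptions for illustration, not from the draft.
    import urllib.request
    import xml.etree.ElementTree as ET

    NOTIFICATION_URL = "https://repo.example.net/rrdp/notification.xml"  # hypothetical

    def fetch_xml(url):
        with urllib.request.urlopen(url) as resp:
            return ET.fromstring(resp.read())

    def apply_snapshot(snapshot):
        pass  # stub: replace the local object store with the snapshot contents

    def apply_delta(delta):
        pass  # stub: apply the <publish>/<withdraw> events in order

    def update(local_session, local_serial):
        """The RP works out what to fetch; the server only serves static files."""
        notif = fetch_xml(NOTIFICATION_URL)
        session = notif.get("session_id")
        serial = int(notif.get("serial"))

        if session == local_session and serial == local_serial:
            return session, serial  # already up to date, nothing to fetch

        deltas = {int(d.get("serial")): d.get("uri") for d in notif.findall("delta")}

        if session != local_session or local_serial + 1 not in deltas:
            # New session, or the gap is no longer covered: take the snapshot.
            snapshot = fetch_xml(notif.find("snapshot").get("uri"))
            assert snapshot.get("session_id") == session  # repeated in every file
            apply_snapshot(snapshot)
        else:
            # Fetch and apply only the deltas we are missing, in order.
            for s in range(local_serial + 1, serial + 1):
                delta = fetch_xml(deltas[s])
                assert delta.get("session_id") == session
                apply_delta(delta)
        return session, serial

Note that no per-client state or dialogue is needed on the server side:
everything above is derived from static files, which is also what makes the
caching and CDN story work.
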
_______________________________________________
sidr mailing list
sidr@ietf.org
https://www.ietf.org/mailman/listinfo/sidr