So after all the anguish about the rsync protocol, we now have a suggested 
replacement.  But there's been very little comment.

Tim did a thorough job of describing the design goals.

It would be useful if the wg paid attention to this.

--Sandy, speaking as one of the wg co-chairs


On Feb 17, 2015, at 12:54 PM, Tim Bruijnzeels <t...@ripe.net> wrote:

> Hi all,
> 
> Following working group adoption I submitted the latest version of the delta 
> protocol document as a working group item:
> 
>> https://datatracker.ietf.org/doc/draft-ietf-sidr-delta-protocol/
> 
> Sriram, allow me to come back to the comments you made during the call 
> for adoption:
> 
>> When authors spin a WG draft version (assuming it would be accepted as a WG 
>> draft),
>> it would be good if the following suggestions can be given consideration:
> 
> The current version is unchanged except for its name and number (00). But of 
> course we can give consideration to these and other suggestions for a next 
> version.
> 
>> 1. Include one short paragraph just to discuss key disadvantages of rsync and
>> how the delta protocol avoids or overcomes the same.
> 
> I would like to avoid a general discussion that may include opinions about 
> how severe the perceived disadvantages of rsync are. Instead I would like 
> to focus on a more positive and factual message about what we are trying 
> to achieve with this protocol, and why.
> 
> The last paragraph of the introduction has some text on this:
> 
>   This protocol is designed to be consistent with the publication
>   protocol [I-D.ietf-sidr-publication] and treats publication events of
>   one or more repository objects as immutable events that can be
>   communicated to relying parties.  This approach helps to minimize the
>   amount of data that traverses the network and thus helps minimize the
>   amount of time until repository convergence occurs.  This protocol
>   also provides a standards-based way to obtain consistent, point-in-time
>   views of a single repository, eliminating a number of consistency-related
>   issues.  Finally, this approach allows for caching infrastructure to be
>   used to serve this immutable data, and thus helps to reduce the load on
>   a publication server when a large number of relying parties are querying
>   it.
> 
> But admittedly this is incomplete and lacks an explanation of why we 
> believe these are good things to have.
> 
> I am happy to elaborate on this here, and if it's useful we can add more 
> text to the document itself.
> 
> Design goals and benefits as I see them (I did not check everything with 
> my co-authors, so I will not speak for them):
> 
> 
> = Based on publication protocol
> 
> Not the most important design goal, but useful because this way we do not 
> need to reinvent the data structures. We can re-use the <publish> and 
> <withdraw> elements that have already been defined. Furthermore this may 
> make life easier for publication servers: they can re-use the update 
> messages they receive from Certification Authorities in the deltas they 
> serve, with minimal effort.
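> 
> A rough sketch of the two re-used elements (built here with Python's 
> ElementTree; the attribute names follow the publication protocol draft as 
> I recall it, so treat this as illustrative rather than verbatim):
> 
>     import base64
>     import xml.etree.ElementTree as ET
> 
>     # <publish> carries the base64 encoded DER object for a URI.
>     publish = ET.Element("publish",
>                          uri="rsync://example.net/repo/cert1.roa")
>     publish.text = base64.b64encode(b"...DER bytes...").decode("ascii")
> 
>     # <withdraw> names the exact object to remove via its hash.
>     withdraw = ET.Element("withdraw",
>                           uri="rsync://example.net/repo/old.roa",
>                           hash="c2f1...")  # placeholder hash
> 
>     print(ET.tostring(publish).decode(), ET.tostring(withdraw).decode())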
> 
> = Minimise data transfer
> 
> We don't want to waste bits. This is nothing against rsync; rsync is 
> actually very good at this. We just thought it would be good if we didn't 
> waste bits here either.
> 
> = Point in time views
> 
> This actually refers to a problem with rsync. Objects may be republished 
> mid-transfer, and this can lead to things like getting a new CRL but an 
> old manifest - the manifest's hash for the CRL then doesn't match, and the 
> manifest's EE certificate may already be revoked. Clients can keep 
> retrying until they get something that seems consistent, but if a 
> Certification Authority just sends all its updates (CRL, MFT, ROAs etc.) 
> in one message to a publication server, then all of this can be served as 
> one delta and much of the problem goes away.
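> 
> To sketch what I mean (a hypothetical file layout, not verbatim from the 
> draft): a single delta carries the CA's complete update, and an RP applies 
> it in full or not at all:
> 
>     import xml.etree.ElementTree as ET
> 
>     DELTA = """
>     <delta version="1" session_id="example-session" serial="42">
>       <publish uri="rsync://example.net/repo/ca1.crl">...</publish>
>       <publish uri="rsync://example.net/repo/ca1.mft">...</publish>
>       <publish uri="rsync://example.net/repo/route1.roa">...</publish>
>     </delta>
>     """
> 
>     # The new CRL, manifest and ROA arrive as one unit, so an RP can
>     # never end up holding the new CRL together with the old manifest.
>     for element in ET.fromstring(DELTA):
>         print(element.tag, element.get("uri"))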
> 
> = Caching infrastructure / CDNs (and immutable data)
> 
> There are many different http caching servers that can be used to deal 
> with the load of serving static data to a large number of clients. There 
> are also many commercial Global Content Delivery Networks (CDNs) that can 
> improve on this further: they can keep serving for some time even if the 
> back-end system is unavailable, spread the load further, and reduce 
> latency for globally distributed clients.
> 
> Of course it is technically possible to build a CDN infrastructure with 
> rsync, using anycast or DNS tricks etc., but this is far from a trivial 
> effort. It's expensive to do and easy to mess up, whereas the http-based 
> CDNs and caching servers have had years of battle testing in the internet 
> industry.
> 
> = Shift load to clients to support scaling
> 
> Not mentioned yet in the introduction.
> 
> There is an asymmetry between the number of clients (relying parties) and 
> servers. With rsync the server needs to invest effort (CPU and memory) in 
> a dialogue with each client in order to work out what the actual delta is 
> that the client needs. The effort the server spends on these dialogues is 
> a limiting factor on how many clients can be served.
> 
> Of course we can add more servers to counter this, but there is a lot more 
> gain to be had if we can minimise the effort that the server actually has to 
> invest.
> 
> For this reason the protocol shifts almost all the (CPU) work to the 
> relying party. The server creates a notification file that refers to a 
> static snapshot file and possibly delta files. The RP can work out what it 
> needs to fetch entirely on its own; see the sketch below.
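> 
> A minimal sketch of that client-side logic (the notification layout and 
> the element names here are my shorthand for what the draft describes, not 
> the exact format):
> 
>     import xml.etree.ElementTree as ET
> 
>     def plan_fetch(notification_xml, local_session, local_serial):
>         """Work out, locally, which files this RP still needs."""
>         root = ET.fromstring(notification_xml)
>         session = root.get("session_id")
>         serial = int(root.get("serial"))
>         snapshot_uri = root.find("snapshot").get("uri")
>         deltas = {int(d.get("serial")): d.get("uri")
>                   for d in root.findall("delta")}
> 
>         if session != local_session:
>             return [snapshot_uri]      # new session: start over
>         if local_serial == serial:
>             return []                  # already up to date
>         needed = range(local_serial + 1, serial + 1)
>         if all(s in deltas for s in needed):
>             return [deltas[s] for s in needed]
>         return [snapshot_uri]          # deltas missing: fall back
> 
> The server never has to compute anything per client; it only has to keep 
> the static notification, snapshot and delta files available.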
> 
> Serving this static data may still run into bottlenecks in network usage 
> and server memory, but this is where the http caching servers and CDNs 
> come in very handy.
> 
> 
> = Only support what's needed / protocol stability
> 
> Also not mentioned yet, but I remember hallway discussions where people 
> mentioned things like: versioning… why don't you just use git/svn?
> 
> This seems like overkill to me because these tools (like rsync) provide 
> many options that we don't need here. Also, which version of git/svn, or 
> rsync, are we talking about?
> 
> I think it would be best to have the protocol stripped down to only the bare 
> essentials, and version it clearly.
> 
> 
> = Availability of open-source libraries for http transport
> 
> There are numerous open-source http libraries available for use in RP 
> software in every major programming language. This helps with performance 
> (no need to fork a separate process) and with testing, and it reduces the 
> risk of the system as a whole relying on what is pretty much a single 
> implementation.
> 
> 
> = Transport Protocol agnostic?
> 
> Although the protocol clearly relies on http caching, it was a design 
> decision to repeat relevant data, such as session ids and versions, in all 
> files (rather than making this implicit in their location). The reason is 
> that this makes it easier to share these files using other transport or 
> distribution protocols.
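> 
> In other words (again illustrative rather than exact), a file remains 
> self-describing however it arrived:
> 
>     import xml.etree.ElementTree as ET
> 
>     with open("snapshot.xml") as f:    # however this file was obtained
>         snapshot = ET.fromstring(f.read())
> 
>     # The file itself says which session and serial it belongs to,
>     # independent of the URL it was fetched from.
>     print(snapshot.get("session_id"), snapshot.get("serial"))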
> 
> 
>> 2. Possible to say something about the relevance of or comparison with
>> Aspera (since they make big performance improvement claims over rsync etc.)? 
>> http://asperasoft.com/resources/benchmarks/
>> http://asperasoft.com/performance-calculator/
> 
> This seems to be an option as a transport protocol, but not as a complete 
> delta protocol - i.e. it does not help the client figure out what to get.
> 
> It may be quicker than http, but I am not convinced, mainly because it's a 
> proprietary service: we would depend on a single vendor to support the 
> transport protocol.
> 


