Tim,

Thanks for your detailed responses to my comments/questions.
To the extent possible (and if your coauthors agree), please include some of
these explanations in the document (in the intro section or wherever
appropriate). You may, of course, post the more detailed discussion (as
you've offered below) on a website devoted to the delta protocol - one that
you might maintain going forward - where people can access an FAQ, code
downloads, etc.

Sriram    

-----Original Message-----
From: sidr [mailto:sidr-boun...@ietf.org] On Behalf Of Tim Bruijnzeels
Sent: Tuesday, February 17, 2015 12:54 PM
To: sidr wg list
Subject: [sidr] draft-ietf-sidr-delta-protocol-00.txt

Hi all,

Following working group adoption I submitted the latest version of the delta 
protocol document as a working group item:

> https://datatracker.ietf.org/doc/draft-ietf-sidr-delta-protocol/

Sriram, allow me to get back to the comments you made previously during the
call for adoption:

> When authors spin a WG draft version (assuming it would be accepted as 
> a WG draft), it would be good if the following suggestions can be given 
> consideration:

The current version is unchanged except for its name and number (00). But of 
course we can give consideration to these and other suggestions for a next 
version.

> 1. Include one short paragraph just to discuss key disadvantages of 
> rsync and how the delta protocol avoids or overcomes the same.

I would like to avoid a general discussion that may include opinions about
how severe the perceived disadvantages of rsync are. Instead I would like to
focus on a more positive and factual message: what we are trying to achieve
with this protocol, and why.

The last paragraph of the introduction has some text on this:

   This protocol is designed to be consistent with the publication
   protocol [I-D.ietf-sidr-publication] and treats publication events of
   one or more repository objects as immutable events that can be
   communicated to relying parties.  This approach helps to minimize the
   amount of data that traverses the network and thus helps minimize the
   amount of time until repository convergence occurs.  This protocol
   also provides a standards based way to obtain consistent, point in
   time views of a single repository eliminating a number of consistency
   related issues.  Finally, this approach allows for caching
   infrastructure to be used to serve this immutable data, and thus
   helps to reduce the load on a publication server when a large
   number of relying parties are querying it.

But admittedly this is incomplete and lacks an explanation of why we believe
these are good things to have.

I am happy to elaborate more on this here, and if it's useful we can add
more text to the document itself.

Design goals and benefits that I see (I did not check everything with my
co-authors, so I will not speak for them):


= Based on publication protocol

Not the most important design goal, but useful because this way we do not
need to reinvent the data structures. We can re-use the <publish> and
<withdraw> elements that have already been defined. Furthermore, this may be
easier for publication servers: they can re-use the update messages they
receive from Certification Authorities in delta files with minimal effort.
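
For illustration, here is a minimal Python sketch of parsing such a delta.
The element layout, session id, serial, URIs and hash are made-up example
values in the style of the publication protocol elements, not verbatim from
the draft:

    import xml.etree.ElementTree as ET

    # Illustrative delta re-using <publish>/<withdraw>; all values made up.
    delta_xml = """
    <delta session_id="9df4b597-af9e-4dca-bdda-719cce2c4e28" serial="42">
      <publish uri="rsync://example.org/repo/alice/cert1.roa">
        bG9yZW0gaXBzdW0=
      </publish>
      <withdraw uri="rsync://example.org/repo/alice/cert0.roa"
                hash="deadbeef"/>
    </delta>
    """

    for element in ET.fromstring(delta_xml):
        if element.tag == "publish":
            # The element body carries the base64-encoded object.
            print("publish", element.get("uri"))
        elif element.tag == "withdraw":
            # The hash pins down exactly which object version goes away.
            print("withdraw", element.get("uri"), element.get("hash"))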

= Minimise data transfer

We don't want to waste bits. This is nothing against rsync - rsync is
actually very good at this. We just thought it would be good if we didn't
waste bits here either.

= Point in time views

This actually addresses a problem with rsync. Objects may be republished
mid-transfer, which can lead to things like getting a new CRL but an old
manifest - and then the hash of the CRL doesn't match the manifest, and the
manifest's EE certificate may be revoked. Clients can keep retrying until
they get something that seems consistent, but if a Certification Authority
just sends all its updates (CRL, manifest, ROAs, etc.) in one message to a
publication server, then all of this can be served as one delta and much of
the problem goes away.
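
To make the failure mode concrete, a hedged sketch of the check that fails
when an rsync fetch straddles a republication (the file name and hash value
are made up):

    import hashlib

    # Hash of ca.crl as listed on the manifest we fetched (made-up value).
    manifest_entries = {"ca.crl": "0f3a9c"}

    with open("ca.crl", "rb") as f:
        fetched_hash = hashlib.sha256(f.read()).hexdigest()

    if fetched_hash != manifest_entries["ca.crl"]:
        # With rsync the CRL may have been republished mid-transfer, so a
        # new CRL arrives alongside an old manifest and the RP must retry.
        # Served as one atomic delta, CRL and manifest always match.
        print("inconsistent repository state, retrying")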

= Caching infrastructure / CDNs (and immutable data)

There are many different http caching servers that can be used to deal with
the load of serving static data to a large number of clients. There are also
many commercial global Content Delivery Networks (CDNs) that can improve
this further: they can operate for some time even if the back-end system is
unavailable, spread the load further, and reduce latency for globally
distributed clients.

Of course it is technically possible to build a CDN infrastructure with
rsync, using anycast or DNS tricks, but this is far from a trivial effort.
It's expensive to do and easy to mess up, whereas the http-based CDNs and
caching servers have had years of battle testing in the internet industry.
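
As a sketch of why this composes so well with standard http caching (the URL
is made up, and the use of the 'requests' library is just an illustration):
the snapshot and delta files never change once published, so only the small
notification file ever needs revalidation:

    import requests

    # Only the notification file changes; revalidate it cheaply via ETag.
    resp = requests.get("https://repo.example.org/notification.xml",
                        headers={"If-None-Match": '"abc123"'})  # cached tag
    if resp.status_code == 304:
        print("notification unchanged; nothing to fetch")
    else:
        # Snapshot/delta URLs inside it are unique per serial, so any
        # HTTP cache or CDN can serve those files indefinitely.
        print("new notification, serial may have advanced")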

= Shift load to clients to support scaling

Not mentioned yet in the introduction.

There is an asymmetry between the number of clients (relying parties) and
servers. With rsync the server needs to invest effort (CPU and memory) in a
dialogue with each client in order to work out what the actual delta is that
the client needs. The effort the server spends on these dialogues is a
limiting factor on how many clients can be served.

Of course we can add more servers to counter this, but there is a lot more gain 
to be had if we can minimise the effort that the server actually has to invest.

For this reason the protocol shifts almost all of the (CPU) work to the
relying party. The server creates a notification file that refers to a
static snapshot and possibly delta files. The RP can work out what it needs
to fetch all on its own.

Serving this static data may still run into bottlenecks in network usage and
server memory, but this is where the http caching servers and CDNs come in
very handy.
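
A minimal sketch of that client-side decision (the function name and the
notification structure are assumptions for illustration, not the draft's
wire format):

    def plan_fetch(local_serial, notification):
        """Decide what to fetch; 'notification' is assumed to be a dict
        with 'serial', 'snapshot_url' and 'deltas' (serial -> url)."""
        remote_serial = notification["serial"]
        if local_serial == remote_serial:
            return []  # already up to date, nothing to fetch
        needed = range(local_serial + 1, remote_serial + 1)
        if all(s in notification["deltas"] for s in needed):
            # Every intermediate delta is still on offer: fetch just those.
            return [notification["deltas"][s] for s in needed]
        # Too far behind (deltas expired): fall back to the full snapshot.
        return [notification["snapshot_url"]]

    notification = {"serial": 42,
                    "snapshot_url": "https://repo.example.org/42/snapshot.xml",
                    "deltas": {41: "https://repo.example.org/41/delta.xml",
                               42: "https://repo.example.org/42/delta.xml"}}
    print(plan_fetch(40, notification))  # the two delta urls
    print(plan_fetch(10, notification))  # the snapshot url only

Note that the server takes no part in this computation at all; its per-client
cost is reduced to serving static bytes.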


= Only support what's needed / protocol stability

Also not mentioned yet, but I remember hallway discussions where people
suggested things like versioning, or asked: why don't you just use git/svn?

This seems overkill to me, because these tools (like rsync) provide many
options that we don't need here. Also, which version of git/svn, or rsync,
are we talking about?

I think it would be best to have the protocol stripped down to only the bare 
essentials, and version it clearly.


= Availability of open-source libraries for http transport

There are numerous open source http libraries available for RP software in
every major programming language. This helps with performance (no need to
fork a process) and testing, and it reduces the risk of the system as a
whole relying on essentially a single implementation.


= Transport Protocol agnostic?

Although the protocol clearly relies on http caching, it was a design
decision to repeat relevant data, such as session ids and versions, in all
files (rather than making this implicit in their location). The reason is
that this makes it easier to share these files using other transport or
sharing protocols.
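
For example, a relying party can sanity-check any file on its own, wherever
it came from (a sketch; the parameter names mirror the made-up examples
above):

    def check_file(file_session_id, file_serial,
                   expected_session_id, expected_serial):
        # Session id and serial are repeated inside every file, so a copy
        # obtained over any transport can be verified without trusting
        # its location.
        if file_session_id != expected_session_id:
            raise ValueError("different session: full resync needed")
        if file_serial != expected_serial:
            raise ValueError("wrong serial: fetch the right file")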


> 2. Possible to say something about the relevance of or comparison with 
> Aspera (since they make big performance improvement claims over rsync etc.)?
> http://asperasoft.com/resources/benchmarks/
> http://asperasoft.com/performance-calculator/

This seems to be an option as a transport protocol, but not as a complete
delta protocol - i.e. it does not help the client figure out what to get.

It may be quicker than http, but I am not convinced, mainly because it's a
proprietary service: we would depend on a single vendor to support the
transport protocol.
