On Tue, 2013-06-25 at 13:38 +0200, Zdenek Pavlas wrote:

> When Yum needs a newer version of <mdtype> and there's a <mdtype>.delta<N>
> available in the new repomd with a timestamp that matches the old <mdtype>
> version, we download it and apply it.
This requires clients to check in very often, right? We basically have
deltas from specific createrepo runs, and we can only keep N of them.
The big problem here is that a run which adds one package counts against
that limit exactly the same as a run which adds 100 packages. I don't
see how that will work well for rawhide, and I'm far from sure it will
work well for Fedora updates/updates-testing. There are also likely to
be future types of repos that get rebuilt much more often than rawhide
(think coprs automatically building from git commits, etc.).

> The diff/patch algorithm is targeted at XML metadata files. We split
> at each "<package " substring, and also at the last closing tag.
> A repository with N packages always yields exactly N+2 chunks.
>
> The delta format is a simple line-oriented sequence of <literal> or
> <chunkref> tokens. Sequential references are further compressed to just
> a single newline. The delta file is finally compressed with a
> general-purpose compressor.

Ok, seems sane from the general description.

> - The delta files are much smaller than those produced with 'diff -e'.

Even when you compress the 'diff -e' result? How?

> - It handles package reordering very well. Fedora still uses an old
>   createrepo that shuffles packages a lot when run with --update.

I thought F18 used the F18 yum+createrepo etc. ... so all the ordering
was fine now, and it was only the rpmbuild side that was using el5/el6
aged code.

> - Since the chunks we handle are quite big, it's fast.
>
> - It's easy to merge chained diffs, even if the original is not
>   available.

Not sure what you mean here.

> The cons are:
>
> - We need to (usually) load the whole old file into memory, although an
>   attempt is being made to make the copy streaming where possible.
>
> - Sub-package changes are not supported. A simple pkg version + checksum
>   bump is as costly as adding a new package.

I doubt this one is that problematic.
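For illustration, the chunking and delta scheme quoted above can be
sketched roughly like this. This is a toy model, not the actual
yum/createrepo code: the function names (split_chunks, diff, patch) and
the in-memory token representation are made up, and the real format is a
line-oriented byte stream rather than Python tuples.

```python
import re

def split_chunks(xml_text):
    # Cut at every '<package ' substring and at the last closing tag,
    # so a repo with N packages yields exactly N+2 chunks
    # (header, N package chunks, trailer).
    cuts = [m.start() for m in re.finditer(r'<package ', xml_text)]
    bounds = [0] + cuts + [xml_text.rfind('</'), len(xml_text)]
    return [xml_text[a:b] for a, b in zip(bounds, bounds[1:])]

def diff(old_chunks, new_chunks):
    # Delta is a sequence of <literal> or <chunkref> tokens; a reference
    # that immediately follows the previous one is collapsed to a bare
    # 'seq' marker (a single newline in the described format).
    index = {}
    for i, chunk in enumerate(old_chunks):
        index.setdefault(chunk, i)
    delta, prev = [], None
    for chunk in new_chunks:
        i = index.get(chunk)
        if i is None:
            delta.append(('lit', chunk))
        elif prev is not None and i == prev + 1:
            delta.append(('seq', None))
        else:
            delta.append(('ref', i))
        prev = i
    return delta

def patch(old_chunks, delta):
    # Rebuild the new file from the old chunks plus the delta tokens.
    out, prev = [], None
    for kind, val in delta:
        if kind == 'lit':
            out.append(val)
            prev = None
        elif kind == 'ref':
            out.append(old_chunks[val])
            prev = val
        else:  # 'seq': the chunk right after the previously copied one
            prev += 1
            out.append(old_chunks[prev])
    return ''.join(out)

# Tiny demo: reordering two packages is just two chunk references.
old = '<metadata>\n<package name="a"/>\n<package name="b"/>\n</metadata>\n'
new = '<metadata>\n<package name="b"/>\n<package name="a"/>\n</metadata>\n'
delta = diff(split_chunks(old), split_chunks(new))
assert patch(split_chunks(old), delta) == new
```

Note how the demo shows the reordering point: swapped packages become two
cheap chunk references, while any changed package body falls out as a
full literal chunk, which is why a version+checksum bump costs as much as
a new package.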
Worst case I assume is resigning, and even then I figure people can live
with having to redownload everything on a resign.

> To make use of it:
>
> 1) The metadata must include the deltamd information. The deltamd script
>    in createrepo facilitates this, including automatic merging of
>    previous deltas and limiting their number.

Sure. I assume the big downside here is that repomd gets bigger ... do
you have data on that?

> 2) Yum must use the XML metadata and build the sqlite databases locally.
>    createrepo must use --no-database, or the mddownloadpolicy=xml option
>    has to be set in yum.conf or a *.repo file.

Sure, that's pretty much what the option was added for anyway, and we can
also change yum so that turning on deltamd implies mddownloadpolicy=xml
when deltas are available.

However, the giant downside I see (I think) is that you aren't generating
valid MD as a result. So given:

  we have:  old-MD.xml
  repo has: new-MD.xml
            delta-from-old2new-MD.bz2

...we don't end up with "new-MD.xml", we end up with "local-MD.xml" which
is assumed to be the same as "new-MD.xml". As I said before, doing that
can't end well. Even in the best case, every time a bug report comes in
that we can't immediately reproduce we'll have to wonder "do they
actually have the same MD?". It almost guarantees weird problems that
only happen after a client has downloaded/applied 666 deltas.

AIUI it's also the reason for the following two patches, which try to
work around the fact that the rest of yum doesn't like that we don't
actually have anything downloaded from the repo anymore.

Why can't we check what we generated against the data that is offered for
full download? That would eliminate all these problems.

_______________________________________________
Yum-devel mailing list
[email protected]
http://lists.baseurl.org/mailman/listinfo/yum-devel
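[Editorial illustration of the verification question raised above:
repomd.xml already advertises a checksum (and, for compressed files, an
open-checksum of the uncompressed content) for each metadata file that a
full download would produce, so a client could hash the reconstructed
local-MD.xml and only trust it when it matches. A minimal sketch,
assuming sha256; verify_reconstructed is an illustrative name, not a real
yum API.]

```python
import hashlib

def verify_reconstructed(path, expected_sha256, expected_size=None):
    # Hash the delta-reconstructed (uncompressed) metadata file and
    # compare against the checksum repomd.xml lists for the full
    # download.  On mismatch the caller should discard local-MD.xml
    # and fall back to fetching new-MD.xml whole.
    h = hashlib.sha256()
    size = 0
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 16), b''):
            h.update(block)
            size += len(block)
    if expected_size is not None and size != expected_size:
        return False
    return h.hexdigest() == expected_sha256
```

With a check like this, "local-MD.xml is assumed to be the same as
new-MD.xml" becomes "local-MD.xml is verified to be byte-identical to
new-MD.xml", which removes the bug-report ambiguity discussed above.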
