[Yum-devel] [PATCH 1/3] Use delta metadata when available

Zdenek Pavlas Tue, 25 Jun 2013 04:39:56 -0700

When Yum needs a newer version of <mdtype> and the new repomd lists a
<mdtype>.delta<N> whose timestamp matches the old <mdtype> version,
we download the delta and apply it instead of fetching the full file.


The diff/patch algorithm is targeted at XML metadata files.  We split
at each "<package " substring, and also at the last closing tag.
A repository with N packages always yields exactly N+2 chunks.
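
The splitting rule above can be sketched on an in-memory string.  This is
a simplified illustration, not the streaming splitter() from the patch;
the name split_chunks and the sample XML are assumptions made up for the
example, and it is written for Python 3.

```python
def split_chunks(text, boundary="<package "):
    """Split before each boundary occurrence and before the last closing tag."""
    chunks = []
    start = 0
    while True:
        # Search from just past `start` so the boundary that begins the
        # current chunk is not matched again.
        nxt = text.find(boundary, start + len(boundary))
        if nxt == -1:
            break
        chunks.append(text[start:nxt])
        start = nxt
    tail = text[start:]
    # The final closing tag becomes a chunk of its own.
    close = tail.rfind("</")
    if close != -1:
        chunks.append(tail[:close])
        tail = tail[close:]
    chunks.append(tail)
    return chunks


# Hypothetical two-package metadata file.
sample = (
    '<metadata packages="2">\n'
    '<package type="rpm"><name>foo</name></package>\n'
    '<package type="rpm"><name>bar</name></package>\n'
    '</metadata>\n'
)
chunks = split_chunks(sample)
```

With two packages this gives four chunks: the header, one chunk per
package, and the closing tag.  Joining the chunks restores the input.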

The delta format is a simple line-oriented sequence of <literal> or <chunkref>
tokens.  Sequential references are further compressed to just a single newline.
The delta file is finally compressed with a general-purpose compressor.
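
As a rough illustration of the token stream, here is a sketch that decodes
an uncompressed delta held in memory.  The token layout is inferred from
apply_delta() in this patch: "+<length>" followed by that many literal
bytes, a decimal chunk index, or an empty line meaning "the next chunk".
The function name and the sample data are assumptions for the example,
and it is written for Python 3.

```python
import io


def apply_delta_tokens(old_chunks, delta_bytes):
    """Rebuild the new file from old chunks and an uncompressed delta."""
    out = []
    n = 0  # index of the next old chunk for sequential references
    f = io.BytesIO(delta_bytes)
    while True:
        line = f.readline()
        if not line:
            break
        if line.startswith(b"+"):
            # Literal: the header line gives the byte count that follows.
            out.append(f.read(int(line[1:])))
            continue
        if line != b"\n":
            # Explicit reference; an empty line reuses the running index.
            n = int(line)
        out.append(old_chunks[n])
        n += 1
    return b"".join(out)


# Hypothetical old file, already split into chunks.
old_chunks = [b"<md>\n", b"<package a/>\n", b"<package b/>\n", b"</md>\n"]
literal = b"<package c/>\n"
# Keep chunks 0 and 1, add a new package, then jump to the closing chunk 3.
delta = b"0\n" + b"\n" + b"+%d\n" % len(literal) + literal + b"3\n"
new = apply_delta_tokens(old_chunks, delta)
```

Here package b is dropped and package c is added; the unchanged chunks cost
only a few bytes each in the delta.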

The pros are:

- The delta files are much smaller than those produced with 'diff -e'.

- It handles package reordering very well.  Fedora still uses an old
  createrepo that shuffles packages a lot when run with --update.

- Since the chunks we handle are quite big, it's fast.

- It's easy to merge chained diffs, even if the original is not available.
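
One way the last point could work, sketched under a loud assumption: every
token of the first delta emits exactly one chunk of the intermediate
file, so chunk k of that file is token k of the first delta.  The tuple
representation and the names merge_deltas and apply_tokens are
hypothetical; this is not the deltamd script from createrepo.

```python
def merge_deltas(first, second):
    """Combine deltas A->B and B->C into A->C without needing A or B."""
    merged = []
    for kind, value in second:
        if kind == "lit":
            merged.append((kind, value))
        else:
            # A reference to chunk `value` of B is replaced by whatever
            # token produced that chunk in the first delta.
            merged.append(first[value])
    return merged


def apply_tokens(chunks, tokens):
    """Expand a token list against a list of old chunks."""
    return [value if kind == "lit" else chunks[value] for kind, value in tokens]


# Hypothetical chunk lists and deltas with sequential refs already expanded.
a = ["head", "pkg-a", "pkg-b", "tail"]
a_to_b = [("ref", 0), ("ref", 2), ("lit", "pkg-c"), ("ref", 3)]
b_to_c = [("ref", 0), ("lit", "pkg-d"), ("ref", 2), ("ref", 3)]

b = apply_tokens(a, a_to_b)
c = apply_tokens(b, b_to_c)
merged = merge_deltas(a_to_b, b_to_c)
```

Applying the merged delta to the original chunks gives the same result as
applying the two deltas in sequence, and merging itself only reads the
two deltas.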

The cons are:

- We usually need to load the whole old file into memory, although the code
  tries to stream the copy when possible.

- Sub-package changes are not supported.  A simple pkg version + checksum
  bump is as costly as adding a new package.

To make use of it:

1) The metadata must include the deltamd information.  The deltamd script
   in createrepo facilitates this, including automatic merging of previous
   deltas and their limiting.

2) Yum must use the XML metadata and build sqlite databases locally.
   Either createrepo must be run with --no-database, or the
   mddownloadpolicy=xml option has to be set in yum.conf or the *.repo file.
---
 yum/deltamd.py | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 yum/misc.py    |  3 ++-
 yum/yumRepo.py | 22 ++++++++++++++++++
 3 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 yum/deltamd.py

diff --git a/yum/deltamd.py b/yum/deltamd.py
new file mode 100644
index 0000000..97b2915
--- /dev/null
+++ b/yum/deltamd.py
@@ -0,0 +1,70 @@
+#  Delta metadata support
+#  Copyright 2013 Zdenek Pavlas
+
+#   This library is free software; you can redistribute it and/or
+#   modify it under the terms of the GNU Lesser General Public
+#   License as published by the Free Software Foundation; either
+#   version 2.1 of the License, or (at your option) any later version.
+#
+#   This library is distributed in the hope that it will be useful,
+#   but WITHOUT ANY WARRANTY; without even the implied warranty of
+#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#   Lesser General Public License for more details.
+#
+#   You should have received a copy of the GNU Lesser General Public
+#   License along with this library; if not, write to the
+#      Free Software Foundation, Inc.,
+#      59 Temple Place, Suite 330,
+#      Boston, MA  02111-1307  USA
+
+from yum.misc import _decompress_chunked
+
+def compressOpen(filename):
+    ztype = filename
+    if ztype[-8:] == '.old.tmp':
+        ztype = ztype[:-8]
+    ztype = ztype.rsplit('.', 1)[1]
+    return _decompress_chunked(filename, None, ztype)
+
+def splitter(filename, boundary='<package '):
+    read = compressOpen(filename).read
+    buf = ''
+    while True:
+        more = read(0x4000)
+        if not more: break
+        buf += more
+        i = 0
+        while True:
+            j = buf.find(boundary, i + len(boundary))
+            if j == -1: break
+            yield buf[i:j]
+            i = j
+        buf = buf[i:]
+    i = buf.rfind('</')
+    if i != -1:
+        yield buf[:i]
+        buf = buf[i:]
+    yield buf
+
+def apply_delta(old, delta, new):
+    # open old and new
+    next = splitter(old).next
+    write = open(new, 'wb').write
+    lookup = []
+    # process delta
+    n = 0
+    f = compressOpen(delta)
+    while True:
+        l = f.readline()
+        if l == '': break
+        if l[0] == '+':
+            # literal chunk
+            write(f.read(int(l[1:])))
+            continue
+        # old data ref
+        if l[0] != '\n':
+            n = int(l)
+        while len(lookup) <= n:
+            lookup.append(next())
+        write(lookup[n])
+        n += 1
diff --git a/yum/misc.py b/yum/misc.py
index ca00b3c..f4de9eb 100644
--- a/yum/misc.py
+++ b/yum/misc.py
@@ -778,7 +778,8 @@ def _decompress_chunked(source, dest, ztype):
     elif ztype == 'gz':
         s_fn = gzip.GzipFile(source, 'r')
     
-    
+    if dest is None:
+        return s_fn
     destination = open(dest, 'w')
 
     while True:
diff --git a/yum/yumRepo.py b/yum/yumRepo.py
index 242ed66..38f7fca 100644
--- a/yum/yumRepo.py
+++ b/yum/yumRepo.py
@@ -51,6 +51,7 @@ import shutil
 import stat
 import errno
 import tempfile
+from yum.deltamd import apply_delta
 
 # This is unused now, probably nothing uses it but it was global/public.
 skip_old_DBMD_check = False
@@ -1589,6 +1590,20 @@ Insufficient space in download directory %s
                     os.rename(local, local + '.old.tmp')
                     reverts.append(local)
 
+                    # The old version is valid. If the mdtype hasn't changed
+                    # we may try to use delta file instead.
+                    if omdtype == nmdtype:
+                        c = 0
+                        while True:
+                            delta = '%s.delta%d' % (omdtype, c); c += 1
+                            if delta not in all_mdtypes: break
+                            data = self.repoXML.getData(delta)
+                            if data.timestamp == odata.timestamp:
+                                data.original = local + '.old.tmp', omdtype
+                                nmdtype = delta
+                                ndata = data
+                                break
+
                     #  This is the super easy way. We just to see if a generated
                     # file is there for all files, but it should always work.
                     #  And anyone who is giving us MD with blah and blah.sqlite
@@ -1617,6 +1632,13 @@ Insufficient space in download directory %s
         for (ndata, nmdtype) in downloading:
             local = self._get_mdtype_fname(ndata, False)
             self._oldRepoMDData['new_MD_files'].append(local)
+
+            # Apply delta files
+            if hasattr(ndata, 'original'):
+                old, mdtype = ndata.original
+                apply_delta(old, local, self.cachedir +'/gen/%s.xml' % mdtype)
+                os.unlink(local)
+
         self._doneOldRepoXML()
 
     def _groupLoadRepoXML(self, text=None, mdtypes=None):
-- 
1.7.11.7
