On Sunday, 3 March 2013 17:46:10, Panu Matilainen wrote:
> On 03/01/2013 06:32 PM, Michael Schroeder wrote:
> > Hi Panu et al,
> >
> > here are some numbers/musings about changing the database
> > implementation to just one single packages file:
> >
> > - I assume that we still want to store all the headers (in some
> >   format) anyway.
>
> Nod, I think the headers need to stay; the exact format is another, open
> question.
>
> > - I checked all the headers of the i586/noarch packages from FC18
> >   to get some understanding of how big they are and whether it makes
> >   sense to compress them. Here's the result:
> >
> >   scanned: 28423 rpms
> >   uncompressed: sum: 777290960, avg: 27348, median: 10600
> >   lzo:          sum: 305711769, avg: 10756, median: 4805
> >   gzip:         sum: 255995670, avg: 9007,  median: 4154
> >   xz:           sum: 215564872, avg: 7585,  median: 3728
> >
> >   (the median is quite different from the avg, that means that
> >   some packages are quite big.)
> >
> >   As you can see, compression about halves the size of the headers.
> >   LZO seems to be "good enough" and has the advantage that it's
> >   really fast.
> >
> > - That means, if I have 2000 packages installed on my system
> >   (which is about the real number), the concatenated headers will
> >   use 20 MByte (using the median), 10 MByte when using LZO
> >   compression, 7.5 MByte with xz.
> >
> > - So if we want to drop all index files and just scan the
> >   packages database, we would need (assuming disk IO throughput
> >   of 50 MB/s) about 0.2 seconds to create the in-memory index
> >   data. Which maybe is too much, I dunno.
>
> Right, in this context compression does indeed seem quite attractive.
> When we talked about this at devconf, I was thinking about the way
> rpm itself currently keeps (re)loading the headers from Packages, and
> adding repeated decompression to the other costs of header loading
> didn't seem like a way to make it faster.
> But for roughly halving the amount of I/O needed for scanning through it
> exactly once (which is of course the way libsolv operates), it's quite a
> different thing.
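The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes, as the thread does, that every header is median-sized, that 2000 packages are installed and that the disk delivers a steady 50 MB/s (decimal megabytes); decompression cost is ignored. The helper names `db_size_bytes` and `scan_seconds` are hypothetical, for illustration only.

```python
# Median header sizes in bytes, taken from the FC18 scan quoted above.
MEDIAN_HEADER_BYTES = {
    "uncompressed": 10600,
    "lzo": 4805,
    "gzip": 4154,
    "xz": 3728,
}


def db_size_bytes(n_packages: int, median_bytes: int) -> int:
    """Estimated size of the concatenated headers, assuming every header is median-sized."""
    return n_packages * median_bytes


def scan_seconds(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time to read the whole file once at a fixed sequential throughput (I/O only)."""
    return size_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    n_installed = 2000
    throughput = 50e6  # assumed 50 MB/s, decimal megabytes
    for name, median in MEDIAN_HEADER_BYTES.items():
        size = db_size_bytes(n_installed, median)
        secs = scan_seconds(size, throughput)
        print(f"{name:>12}: {size / 1e6:5.1f} MB, {secs:.2f} s")
```

With these inputs the LZO case comes out at about 9.6 MB and 0.19 s, matching the ~10 MByte / 0.2 s figures in the thread, while reading the uncompressed headers (21.2 MB) would take roughly twice as long. Using the averages instead of the medians gives noticeably larger figures (about 21.5 MB for LZO), because a few packages have very large headers.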

Which begs the question: can we make RPM behave this way as well? ;-)

> 0.2s is not a whole lot, for many operations absolutely nothing really,
> but I'd think some kind of cache would be in order to avoid having to
> read through all of Packages just for those simple 'rpm -qf /foo' kind
> of queries. For example, store the in-memory index structures in a
> memory-mapped cache file. The cache could perhaps be write-once and
> read-only for other uses, so there's no need for locking within the
> cache: e.g. recreate it from scratch at the end of transactions and
> atomically replace the old one, so the cache itself is always coherent.
> Or something... this isn't that far from libsolv's .solv files.
>
> Speaking of which... a funny little idea I got at the end of the
> devconf: regardless of future rpmdb format changes, it should now be
> possible to write an rpm plugin that creates + updates a .solv file for
> the rpmdb, so you should never have to actually read through the entire
> rpmdb in libsolv and its users like libzypp, dnf etc.

This sounds really cool.

Thanks
Jan
_______________________________________________
Rpm-maint mailing list
Rpm-maint@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-maint
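Panu's suggestion of a write-once cache that is rebuilt at the end of each transaction and atomically swapped in relies on the usual "write a temporary file, then rename it over the old one" pattern. The sketch below is only an illustration of that pattern, not rpm's implementation: it assumes a POSIX filesystem (where `os.replace` within one directory is atomic) and uses JSON purely as a stand-in for the memory-mapped binary index, whose format the thread leaves open. The function names and the file-path-to-package mapping are hypothetical.

```python
import json
import os
import tempfile


def write_cache_atomically(path: str, index: dict) -> None:
    """Rebuild the whole cache from scratch and swap it in atomically.

    Readers see either the complete old file or the complete new file,
    never a half-written one, so they need no locking.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(index, f)  # JSON is a stand-in for a binary, mmap-able format
            f.flush()
            os.fsync(f.fileno())  # make the data durable before the rename
        # Atomic on POSIX: the name now points at the new file. A fully
        # durable version would also fsync the containing directory.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # never leave a stray temporary file behind
        raise


def read_cache(path: str) -> dict:
    """Load the whole cache; a real implementation would mmap it instead."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the old file is only ever replaced and never modified in place, a process that already has the old cache open (or mapped) keeps a consistent snapshot until it reopens the file, which is what makes the "no locking within the cache" property work.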