On 06/20/2013 11:03 AM, Zdenek Pavlas wrote:
>> The format of /var/lib/rpm/Packages is private to rpm and yum has no
>> business poking into it directly. That aside, the exact underlying
>> format of BDB databases is private to BDB and can and does change every
>> now and then so it depends on the version rpm was linked against.
> Didn't want to hash the whole header, but hashing 512 bytes or so
> is probably not that much slower than reading specific 4 bytes,
> so guessing the offset is not necessary.
>> Also h_nelem does not represent the number of entries in the database,
>> it's the *estimated* size of the *hash table*:
>> http://docs.oracle.com/cd/E17076_03/html/api_reference/C/dbget_h_nelem.html
> Interesting.. in reality it's equal to number of records
> (#pkgs + #gpg-pubkey + 1), and changes after each single
> package install/erase.
In current BDB it may happen to be equal to the number of db "records",
but there's no guarantee that'll be the case, and if rpm chose to fiddle
with the hash table size for optimization...
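>> This is probably a huge overkill for what you need, but just as
>> an example:
>>
>>     import hashlib
>>     import rpm
>>
>>     ts = rpm.ts()
>>     h = hashlib.sha1()
>>     # walk the keys of the sha1header index
>>     ii = ts.dbIndex('sha1header')
>>     for s in ii:
>>         h.update(s)
>>         # mix in the header instance(s) stored under the current key
>>         for (dboffset, dbfileno) in ii.instances():
>>             h.update('%s' % dboffset)
>>     print(h.hexdigest()) # profit!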
> This is probably killing the whole point of the cache.
> (amount of data needed to validate the cache is not much
> smaller than what we actually cache)
The point of yum's cache is that without it, you're forced to read
through the entire Packages db, which often is close to or even over
100MB in size. THAT is what kills performance.
In comparison, the indexes are tens to a few hundred kilobytes in
size. While reading some 200KB will obviously take longer than reading
an integer or two to determine cache validity, it would actually enable
you to perform partial cache updates on the items that changed since
last time, instead of forcing a full plow through the rpmdb.
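To make the partial-update idea concrete, here is a minimal sketch. It
assumes the cache keeps a snapshot from the previous run that maps each
sha1header index key to its header instance number; the snapshot dicts
and diff_snapshots() are hypothetical names for illustration, not
existing rpm or yum API:

```python
def diff_snapshots(old, new):
    """Compare two {sha1header key: header instance} snapshots.

    Returns (added, removed) as sorted lists of keys. Both snapshots are
    assumed to have been built by walking the sha1header index.
    """
    old_keys = set(old)
    new_keys = set(new)
    added = sorted(new_keys - old_keys)    # headers to load into the cache
    removed = sorted(old_keys - new_keys)  # cache entries to drop
    return added, removed


# Made-up sample data standing in for two walks over the index.
old = {"aaa": 1, "bbb": 2, "ccc": 3}
new = {"aaa": 1, "ccc": 3, "ddd": 4}
added, removed = diff_snapshots(old, new)
```

Only the headers behind the added keys would have to be loaded from the
rpmdb, and the cached entries for the removed keys could simply be
dropped. An unchanged rpmdb gives two empty lists.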
> TRT is to make the rpmdb API fast enough so any caching
> is unnecessary.
The API (and its speed/lack thereof) is largely dictated by the age-old
format of the rpmdb: as long as rpm needs to load full headers to
access any minor detail from packages, it's going to be relatively slow.
Which is why the index iterator exists these days, but for yum's
purposes a NEVRA index would be needed.
- Panu -