On Wed, 2007-07-04 at 09:31 +0300, Panu Matilainen wrote:
> On Tue, 3 Jul 2007, seth vidal wrote:
> Whether it caused rpmdb corruption or not I dunno, it's entirely possible
> it triggered races in the locking. I don't trust the rpm locking that
> much. But more to the point, re-opening the db for each and every rpmdb
> access to avoid holding the db open isn't what Jeff means about "being
> careful" in 1) :) More like: open it when you have to, do your business
> and close it. The less you do that, the better.
>
> Since the new depsolver, the situation would look roughly like this:
> 1) open+close db for checking distroverpkg
> 2) download metadata if necessary
> 3) open db, depsolve
> 4) if filelists needed in 3), close db, download and reopen, continue 3)
> 5) close the db
> 5) download packages
> 6) do the final transaction

We need to have it open during the second 5), too, for sig-checking
packages. That is the section we often get complaints about, because it is
difficult to abort the process: ctrl-c gets grabbed, and it skips through
all the mirrors. So either we open, check, and close for every package, or
we open the db and leave it open for the entire downloadPkgs process. I'd
worry that doing it for each and every package would be too much for rpm's
locking and would get us back to where we were a few months ago.

Another option would be to only do sha1 integrity checking on download,
wait until everything is downloaded, and THEN do the gpg checking all in
one shot. It would make the interface a little less attractive, but not
devastatingly so.

> The reopens in 4) are at max the number of enabled repositories, whereas
> earlier the similar situation was the number of packages in the
> transaction and then some. A *big* difference there. I think the previous
> time yum cached rpmdb header ids over those reopens, which is not really
> safe; if such tricks aren't done now it should be just fine to do the
> above.

If we don't keep track via header ids then all the lookups take forever,
unfortunately.
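As a rough illustration of that second option, here is a minimal sketch of
"sha1 check per download, gpg check in one shot afterwards". This is not
yum's actual code: download_and_verify(), gpg_check_all(), the fetch and
verify_sig callables, and the (name, sha1) tuples are all hypothetical
stand-ins, and the rpmdb open/close is only marked in comments.

```python
import hashlib

def sha1_of(data):
    """Hex sha1 digest of a downloaded payload."""
    return hashlib.sha1(data).hexdigest()

def download_and_verify(packages, fetch):
    """Fetch every package, checking only the sha1 taken from the repo
    metadata; no rpmdb access and no gpg checking happens here."""
    fetched = []
    for name, expected_sha1 in packages:
        data = fetch(name)
        if sha1_of(data) != expected_sha1:
            raise IOError("checksum mismatch for %s" % name)
        fetched.append((name, data))
    return fetched

def gpg_check_all(fetched, verify_sig):
    """One shot after all downloads finish: open the rpmdb once, run
    every signature check, close it. verify_sig stands in for rpm's
    real signature verification; returns the names that failed."""
    # open rpmdb here (once)
    bad = [name for name, data in fetched if not verify_sig(data)]
    # close rpmdb here
    return bad
```

The point of the split is that the rpmdb is held open for one short,
predictable window at the end, instead of being opened and closed (or held
across mirror retries) for every single package.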
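On the header-id caching point, one conceivable shape for the "rpmdb
version" check discussed below is to fingerprint the db files by name,
size, and mtime, and drop the cached ids whenever the fingerprint changes.
This is purely a hypothetical sketch, not yum's code: the /var/lib/rpm
default, HeaderIdCache, and the lookup() callback are all assumptions.

```python
import os
import hashlib

RPMDB_PATH = "/var/lib/rpm"  # assumed rpmdb location

def rpmdb_fingerprint(path=RPMDB_PATH):
    """Hash the name, size, and mtime of every file in the rpmdb
    directory; any write to the db should change the result."""
    h = hashlib.sha1()
    for name in sorted(os.listdir(path)):
        st = os.stat(os.path.join(path, name))
        h.update(("%s:%d:%d" % (name, st.st_size, st.st_mtime)).encode())
    return h.hexdigest()

class HeaderIdCache(object):
    """Cached header ids, invalidated when the rpmdb changes."""

    def __init__(self, path=RPMDB_PATH):
        self.path = path
        self.stamp = None
        self.ids = {}

    def get(self, lookup):
        """Return the cached ids if the db is unchanged; otherwise
        rebuild them with the caller-supplied lookup() function."""
        now = rpmdb_fingerprint(self.path)
        if now != self.stamp:
            self.ids = lookup()
            self.stamp = now
        return self.ids
```

Whether mtime granularity is fine enough to be safe here is exactly the
open question; a journal sequence number from the db itself, if one
exists, would be a stronger signal than file stats.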
What I was thinking is: could we take a timestamp or checksum of the
current 'version' of the rpmdb? If that hasn't changed, then we can use
the header ids we have cached. If it has changed, we invalidate the header
ids and get them again. My two questions are:

1. does that seem safe?
2. is there a db version or journal or some other information in the
   rpmdb we can use to know if it has been changed?

> Then there's the extreme approach: open the rpmdb just once initially and
> import the data you need into a sqlite db just like any other repodata and
> then close it. With the new depsolver, you only need to open it again for
> the actual transaction.

That seems like an extremely expensive option, doesn't it? The import
process will take a while, not to mention the file-lookup cost.

> If it can be done in a sane way, yes. I'm not that familiar with the
> Python C API (yet :) but I would assume it's possible to plant a
> sys.excepthook from C when needed (rpmdb iterators open, basically) and
> clean up things from there and then chain back to the original
> excepthook. We'll see...

Thanks for looking at this.
-sv

_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
