On Wed, 4 Jul 2007, seth vidal wrote:
On Wed, 2007-07-04 at 09:31 +0300, Panu Matilainen wrote:
On Tue, 3 Jul 2007, seth vidal wrote:
Whether it caused rpmdb corruption or not I dunno, it's entirely possible
it triggered races in the locking. I don't trust the rpm locking that
much. But more to the point, re-opening the db for each and every rpmdb
access to avoid holding the db open isn't what Jeff means by "being
careful" in 1) :) More like: open it when you have to, do your business
and close it. The less you do that, the better.
Since the new depsolver, the situation would look roughly like this:
1) open+close db for checking distroverpkg
2) download metadata if necessary
3) open db, depsolve
4) if filelists needed in 3), close db, download and reopen, continue 3)
5) close the db
6) download packages
7) do the final transaction
We need to have it open during 6), too - for sigchecking pkgs.
This is the section where we often get complaints, because it's difficult
to abort the process: ctrl-c is grabbed, and there are all the mirrors it
skips through.
So either we open, check, close for every package, or we open and leave
it open for the entire downloadPkgs process. I'd worry that doing it for
each and every package would be too much for rpm's locking and would get
us back to where we were a few months ago.
Another option would be to only do sha1 integrity checking on download,
wait until everything is downloaded and THEN do gpg checking all in one
shot. It would make the interface a little less attractive, but not
devastatingly so.
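The split described here - a cheap checksum as each download completes, with the full gpg pass deferred until everything is on disk - could be sketched roughly like this. This is an illustration, not yum's actual code: `verify_downloads` and its `gpg_check` callable are hypothetical stand-ins for yum's package objects and rpm's keyring verification.

```python
import hashlib

def sha1_of(path, blocksize=65536):
    """Cheap integrity check, run as soon as each download completes."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_downloads(pkgs, gpg_check):
    """pkgs: list of (path, expected_sha1) pairs.

    Phase 1: per-file checksum - catches truncated/corrupt downloads
    without touching the rpmdb at all.
    Phase 2: gpg-check the whole batch in one shot, so the rpmdb only
    needs to be opened once for the entire pass (gpg_check is a
    stand-in for rpm's signature verification)."""
    for path, expected in pkgs:
        if sha1_of(path) != expected:
            raise IOError("checksum mismatch: %s" % path)
    for path, _ in pkgs:
        gpg_check(path)
```

The point of the two phases is that phase 1 needs no rpmdb (and so stays fully interruptible), while phase 2 amortizes one open/close over the whole download set.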
Ah, I didn't realize it was gpg-checking right after downloading. Maybe I
was thinking about an older version or something else :) Reopening for
each single package seems a bit wasteful, but then.. apt's Lua gpg-checker
actually forks out a new rpm process for each package (after the download
has completed), obviously causing lots of reopens as it goes and that
never caused any problems AFAICT.
So, I don't think reopening after each download would be deadly, just
perhaps a bit wasteful. Checking basic checksum at download completion and
then full gpg check before the final transaction sounds like a very good
plan to me.
The reopens in 4) are at most the number of enabled repositories, whereas
earlier in the similar situation it was the number of packages in the
transaction and then some. A *big* difference there. I think the previous
time yum cached rpmdb header ids over those reopens, which is not really
safe; if such tricks aren't done now it should be just fine to do the
above.
If we don't keep track via header ids then all the lookups take forever,
unfortunately. What I was thinking is: could we take a timestamp or
checksum of the current 'version' of the rpmdb? If that hasn't changed
then we can use the header ids we have cached. If it has changed then we
invalidate the header ids and get them again. My two questions are:
1. does that seem safe?
Safe enough, I suppose. You still don't want to do it like it was
previously, e.g. each and every rpmdb lookup reopening it. So you'd want
to run most of the depsolve with the db open (only closing for filelist
downloads) and thus ctrl-c blocked, but that's far less annoying than an
uninterruptible download - the ctrl-c will be noticed sooner or later
anyway.
2. is there a db-version or journal or some other information in the
rpmdb we can use to know if it has been changed?
One possibility would be just looking at file modification times, but
I've a feeling there was some nicer way hinted at by Jeff a long time ago
in another discussion... hmm... yup:
"I can suggest several means to tell whether an rpmdb has changed
that don't rely on file mtimes. Retrieve the monotonically increasing
package instance from Packages with key 0x00000000 for example.
That value changes with any addition to an rpmdb, perhaps not gud
enuf if you want erasure changes too."
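A cache that invalidates on rpmdb change might look something like the sketch below. It uses the simpler option - mtime plus size of the Packages file as a cheap version stamp - since Jeff's instance-counter approach needs rpm-python; the default path and the `rebuild` callback are illustrative, not yum's actual API.

```python
import os

class HeaderIdCache(object):
    """Cache of rpmdb header ids, invalidated when the db file changes.

    The (mtime, size) of the Packages file serves as a cheap 'version'
    stamp. Jeff's suggestion - the monotonically increasing package
    instance under key 0x00000000 - would be more precise, but needs
    an open rpmdb to read.
    """

    def __init__(self, dbpath="/var/lib/rpm/Packages"):
        self.dbpath = dbpath
        self._stamp = None
        self._ids = None

    def _current_stamp(self):
        st = os.stat(self.dbpath)
        return (st.st_mtime, st.st_size)

    def get(self, rebuild):
        """Return cached header ids, calling rebuild() only if the
        rpmdb looks different from the last time we asked."""
        stamp = self._current_stamp()
        if self._ids is None or stamp != self._stamp:
            self._ids = rebuild()
            self._stamp = stamp
        return self._ids
```

The rebuild() call is the expensive open-db-and-look-everything-up pass; everything else is a single stat().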
Then there's the extreme approach: open the rpmdb just once initially and
import the data you need into a sqlite db just like any other repodata and
then close it. With the new depsolver, you only need to open it again for
the actual transaction.
That seems like an extremely expensive option, doesn't it? The import
process will take a while, not to mention the file-lookup cost.
I would assume it to be a far less expensive option (especially if done
incrementally) than parsing those XML files like we used to, but sure,
it's an extreme approach.
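The extreme approach - one pass over the rpmdb into sqlite, then close it for good - could look roughly like this. The rows here are hard-coded stand-ins; in real code they would come from an rpm.TransactionSet dbMatch() iterator, opened once and closed right after the import, and the schema is made up for illustration.

```python
import sqlite3

def import_installed(rows, conn=None):
    """One-shot import of installed-package data into sqlite, so the
    rpmdb can stay closed for the rest of the depsolve.

    rows: iterable of (name, epoch, version, release, arch) tuples -
    hypothetically what a dbMatch() pass over the rpmdb would yield."""
    conn = conn or sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE installed
                    (name TEXT, epoch TEXT, version TEXT,
                     release TEXT, arch TEXT)""")
    conn.executemany("INSERT INTO installed VALUES (?,?,?,?,?)", rows)
    conn.commit()
    return conn

def is_installed(conn, name):
    """Depsolve-time lookup that never touches the rpmdb."""
    cur = conn.execute("SELECT 1 FROM installed WHERE name=? LIMIT 1",
                       (name,))
    return cur.fetchone() is not None
```

After the import, every lookup during depsolving is a plain sqlite query, exactly like querying any other repodata - which is the symmetry that makes the idea attractive despite the up-front cost.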
If it can be done in a sane way, yes. I'm not that familiar with the
Python C API (yet :) but I would assume it's possible to plant a
sys.excepthook from C when needed (rpmdb iterators open, basically),
clean things up from there and then chain back to the original
excepthook. We'll see...
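The plant-a-hook-and-chain-back idea, done at the Python level rather than from C, might look like this minimal sketch. `DbIterGuard` and its list of open iterators are invented names - in rpm-python the equivalent bookkeeping would track the live rpmdb match iterators.

```python
import sys

class DbIterGuard(object):
    """Install a sys.excepthook that closes any still-open rpmdb
    iterators on an uncaught exception, then chains back to whatever
    hook was installed before us (so e.g. a GUI's pretty-traceback
    handler still runs)."""

    def __init__(self):
        self.open_iterators = []      # anything with a close() method
        self._orig = sys.excepthook
        sys.excepthook = self._hook

    def _hook(self, exc_type, exc_value, exc_tb):
        while self.open_iterators:
            self.open_iterators.pop().close()
        # Chain back to the original excepthook.
        self._orig(exc_type, exc_value, exc_tb)

    def uninstall(self):
        sys.excepthook = self._orig
```

The chaining is the important part: the cleanup hook never swallows the exception, it just gets the rpmdb out of the way before the original handler reports it.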
thanks for looking at this.
Avoiding getting more of those "rpmdb is stuck and blows up" bugzilla
tickets is a pretty strong motivation now :)
Didn't experiment with putting it into rpm-python for real yet, but
getting the traceback trapped at C level was easy enough. The rest of the
trail leads to ... dumdumdum ... signal handling, and the code path there
that automatically does all the necessary cleanup ends in an exit() deep
inside rpmlib. In practice it means a custom sys.excepthook couldn't be
called while rpmdb/iterators are active. Not a problem for yum, where the
output is text-based anyway (the traceback can be printed), but for
things like pirut, which probably have their own handler to give pretty
tracebacks in the GUI, it is.
Similarly, exposing the rpmdbCheckSignals() call to the python bindings
is trivial (rpm5.org actually has it already), but it has the very same
problem described above: any call to rpmdbCheckSignals() will either
return normally or exit() with no return to the calling code.
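A signal-check that returns to the caller instead of exit()-ing amounts to deferring the signal while the db is busy and delivering it afterwards. In Python terms the idea could be sketched like this - an illustration of the behaviour a friendlier wrapper would have, not rpm's actual C implementation:

```python
import signal
from contextlib import contextmanager

@contextmanager
def deferred_sigint():
    """Hold SIGINT while the rpmdb is busy; deliver it on the way out.

    Unlike rpmlib's handler, which cleans up and exit()s with no return
    to the caller, this hands control back by re-raising
    KeyboardInterrupt only once the critical section is done."""
    pending = []

    def remember(signum, frame):
        # Note the signal instead of acting on it immediately.
        pending.append(signum)

    old = signal.signal(signal.SIGINT, remember)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, old)
        if pending:
            raise KeyboardInterrupt
```

A ctrl-c pressed mid-section is noticed, just a little later - which matches the earlier point that a briefly blocked ctrl-c is far less annoying than one that kills the process mid-transaction.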
It could be enhanced in an API-compatible way with a wrapper that works
like the old one, I suppose... need to scratch my head some more to
figure out how best to handle it.
- Panu -
_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel