Ok, so Xmas present, everyone :). This is a biggish change I've been working on for a while. It basically changes _loadRepoXML() to download repomd.xml and all of the MD files as an atomic group, and if anything goes "wrong" we revert to the old set of MD. I looked at another approach, of only trying to revert to old data if any of getPrimaryXML() etc. had a problem ... but that was much more complicated, required that we store at least one old copy of everything, and doesn't solve some of the network issues this way does.

I've been running it for a while, and I've tried to test some of the failure cases to make sure it actually does what it says on the box. Anyway, please take a look and let me know what you think. You have a bit of time, as I'm not planning on checking it in this year (but feel free to test it too :).
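To make the "atomic group" idea concrete, here's a minimal sketch of the save/fetch/revert shape (hypothetical names and a flat-directory assumption -- this is not the actual yumRepo.py code, just the idea):

```python
import os
import shutil
import tempfile

def load_repo_group(repo_dir, fetch_funcs):
    """Download a whole set of MD files as an atomic group.

    fetch_funcs: callables that each write one MD file into repo_dir.
    If any of them fails, restore every previously saved file, so the
    repo never ends up with a partially updated set.
    Assumes repo_dir is a flat directory of metadata files.
    """
    backup = tempfile.mkdtemp(prefix="old-md-")
    try:
        # Save the old set first, so we have something to revert to.
        for name in os.listdir(repo_dir):
            shutil.copy2(os.path.join(repo_dir, name),
                         os.path.join(backup, name))
        # Fetch the whole group; any exception aborts everything.
        for fetch in fetch_funcs:
            fetch(repo_dir)
    except Exception:
        # Revert: restore every file from the saved set.
        for name in os.listdir(backup):
            shutil.copy2(os.path.join(backup, name),
                         os.path.join(repo_dir, name))
        raise
    finally:
        shutil.rmtree(backup)
```

Note that a plain `except Exception` deliberately doesn't catch KeyboardInterrupt, which matches the "nothing special on C-c" behaviour described below.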
The change is big, so you can view it here:

  http://people.redhat.com/jantill/yum/_groupLoadRepoXML.patch
  http://people.redhat.com/jantill/yum/yumRepo.py

Pros.
-----

Should stop these metadata update problems:

 1. We get corrupted comps/etc. files on the master and everyone has problems.
 2. We hit mirror(s) that have an updated repomd.xml but nothing else.
 3. We don't have a network, but the cache hits a timeout and urlgrabber kills repomd.xml and we can't get a new one (which makes yum stop working).

Should stop "back in time" updates. I.e. we hit an old mirror, and we basically go back in time for updates.

Should stop yum command line usage hitting the network. Basically yum-updatesd will now download all of filelists/updateinfo/etc., so we don't have the problem where the user does "yum blah", which happens to need a file we don't have, so we hit the network. That case also has follow-on problems where the file isn't the same anymore but we haven't updated repomd.xml yet (or the network is down; think "yum deplist /usr/bin/foo").

Cons.
-----

Downloads more stuff at once. The current yum model only ever downloads what we need, when we need it; now we'll download all the MD files whenever repomd.xml gets updated. However, I've left in the functions for the old behaviour, so we could add a configuration option, or let only yum-updatesd have the new behaviour, or something ... if people think this is a big concern.

Currently I don't do anything special on C-c, or other weird exceptions ... so it's possible that a C-c at the wrong time will leave the repo in a partially updated state. But that's what happens all the time now, so I don't think this is a big issue.

_If_ we don't currently have a full set of MD, and we fail to get a new batch of data, then we'll revert back to the non-full set. It's then possible that we'll need one of the files in the set we don't have, but that file would have been available if we had used the traditional code path (i.e. the error was in one of the files we don't currently need). But after this code has run once successfully we'll always have a full set of MD, so the only thing that _might_ make this worth considering is if we allow the "traditional" behaviour as an option. I put this here mainly for full disclosure :).

Minor CPU speed hit, due to parsing two sets of XML.

Things to look at / think about
-------------------------------

Do you hate the idea/design in some way; is there an alternate approach you think would be better?

Does the code in _groupLoadDataMD() look correct? I've tested it ... but I'd still like a second opinion.

Atm. we check for "newness" in _groupCheckDataMDNewer() by looking at the timestamp information for all the MD files, and if the "new" repomd.xml has timestamps which are older we dump it. Can anyone think of any problems with doing this, and/or should we just put something in repomd.xml itself?

These funcs got args to specify "don't throw":

  YumRepository._checksum
  YumRepository._checkMD
  YumRepository._retrieveMD

...the latter two are the public API functions, and I didn't want to add the new arguments as part of the public API, so the public functions call the internal ones, and the internal ones have the new args. My big worry here is that I haven't seen any other yum functions that do this, but the other option is to put try/except blocks in a bunch of places, which seems like more code for no gain and makes it less obvious what is happening. In fact, it even removed a few lines of code from other functions in that file that were calling the above within try blocks for the same reason.

Do we want to add some kind of configuration option for the old behaviour? My opinion is probably not, but then I'd have probably done it a bit like this to start with ... so what was the rationale?
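For anyone thinking about the timestamp question, here's a sketch of what that "newness" check amounts to (hypothetical data shapes, not the actual _groupCheckDataMDNewer signature): compare the per-type timestamps from the old and new repomd.xml, and reject a "new" set that goes back in time.

```python
def group_check_md_newer(old_timestamps, new_timestamps):
    """Each argument maps an MD type (e.g. 'primary', 'filelists')
    to its timestamp as listed in repomd.xml. Return True only if no
    file in the new set is older than its counterpart in the old set.
    MD types missing from the new set are not compared."""
    for mdtype, old_ts in old_timestamps.items():
        new_ts = new_timestamps.get(mdtype)
        if new_ts is not None and new_ts < old_ts:
            return False  # we hit an old mirror: dump the new repomd.xml
    return True
```

One known weakness of any per-file timestamp comparison is that it trusts the timestamps the master wrote, which is part of why the question above asks whether something should go in repomd.xml itself.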
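The "don't throw" argument question above boils down to this shape (illustrative names only -- the real methods and their fetch logic live in yumRepo.py): the public method keeps its old signature and always raises, while the internal variant grows the new argument so the group loader can treat a failure as "revert" rather than "crash".

```python
class YumRepositorySketch:
    def _retrieveMD(self, mdtype):
        # Public API: signature unchanged, failures still raise.
        return self._internal_retrieveMD(mdtype, can_fail=False)

    def _internal_retrieveMD(self, mdtype, can_fail=False):
        # Internal variant: the group loader passes can_fail=True and
        # handles a None return by reverting to the old MD set.
        try:
            return self._download(mdtype)
        except IOError:
            if can_fail:
                return None
            raise

    def _download(self, mdtype):
        # Stand-in for the real fetch; always fails in this sketch.
        raise IOError("network down")
```

The alternative mentioned above, try/except blocks at every call site, scatters the failure policy across the file; keeping it in one internal argument keeps the public API stable.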
New helper functions:

  YumRepository._cachingRepoXML
  YumRepository._getFileRepoXML
  YumRepository._parseRepoXML
  YumRepository._saveOldRepoXML
  YumRepository._revertOldRepoXML
  YumRepository._doneOldRepoXML
  YumRepository._get_mdtype_data
  YumRepository._get_mdtype_fname
  YumRepository._groupCheckDataMDNewer
  YumRepository._groupLoadDataMD

...I assume most of these are fine, assuming the above is good. I've also added a YumRepository._oldRepoMDFile ... does anyone care?

I've also moved:

  YumPackageSack._check_db_version

...into YumRepository; I assume that's not controversial? But atm. I've kept the call in YumPackageSack as a wrapper, just in case.

-- 
James Antill <[EMAIL PROTECTED]>
Red Hat
_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
