Re: [Yum-devel] [RFC] Questions about sqlitesack and list vs. generators

Florian Festi Tue, 11 Dec 2007 02:14:19 -0800

seth vidal wrote:

On Mon, 2007-12-10 at 14:12 -0500, James Antill wrote:

A couple of questions about sqlite functions and usage:


1. Does anyone know why returnPackages() re-runs the _excluded() method
on cached content? AFAICS we just redo excluding for all pkgs for
nothing.


I think it is b/c we can update the exclude list whenever and from
wherever. It's not just a one time event, necessarily.

It's quite possible that the skip-broken changes will need to exclude the discarded packages from the repo to make sure they are not added again. So this is not the right time to think about changing that.


> James Antill wrote:
> Hmm, ok. It's worth noting though that the current simplePkgList() only
> runs it on the first generation of the data.

That's not exactly true. The list is thrown away when the excludes are changed in .delPackage(). But yes, it could be a bit more bulletproof.
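To make the caching behaviour concrete, here is a minimal sketch of a cached list that is dropped whenever a package is excluded. TinySack, its attributes and the rebuilds counter are made up for illustration; this is not the actual sqlitesack code, only the invalidation idea.

```python
class TinySack:
    """Hypothetical stand-in for a package sack (not yum's real class)."""

    def __init__(self, pkgtups):
        self._pkgtups = list(pkgtups)   # all (n, a, e, v, r) tuples in the repo
        self._excluded = set()          # tuples excluded so far
        self._simple_cache = None       # cached result of simplePkgList()
        self.rebuilds = 0               # counts how often the list is regenerated

    def delPackage(self, pkgtup):
        # Changing the excludes throws the cached list away.
        self._excluded.add(pkgtup)
        self._simple_cache = None

    def simplePkgList(self):
        # Excludes are only applied when the list has to be (re)generated.
        if self._simple_cache is None:
            self.rebuilds += 1
            self._simple_cache = [p for p in self._pkgtups
                                  if p not in self._excluded]
        return self._simple_cache


foo = ("foo", "i386", "0", "1.0", "1")
bar = ("bar", "i386", "0", "2.0", "3")
sack = TinySack([foo, bar])
sack.simplePkgList()
sack.simplePkgList()        # served from the cache, no rebuild
sack.delPackage(foo)        # invalidates the cache
remaining = sack.simplePkgList()
```

Any code path that changes the excludes without going through delPackage() would leave a stale list behind, which is the "not bulletproof" part.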

2. AFAICS simplePkgList would be better off written like:

    def simplePkgList(self):
        """returns a list of pkg tuples (n, a, e, v, r) from the sack"""

        simplelist = []
        for pkg in self.returnPackages():
            simplelist.append((pkg.name, pkg.arch, pkg.epoch, pkg.version,
                               pkg.release))
        return simplelist

...this saves about half a second (20-25%) for pretty much all the
commands, as the above loop is instant in comparison to the executeSQL()
version[1].


Sounds reasonable to me. You could also check whether a list comprehension is even faster:

    return [pkg.pkgtup for pkg in self.returnPackages()]
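A rough way to compare the two variants is timeit. The sketch below uses a made-up FakePkg class instead of real package objects and assumes pkgtup is a precomputed (n, a, e, v, r) attribute, so the numbers only hint at the relative cost of the loop itself, not of returnPackages().

```python
import timeit


class FakePkg:
    """Hypothetical package object with the attributes used in this thread."""

    def __init__(self, name, arch, epoch, version, release):
        self.name = name
        self.arch = arch
        self.epoch = epoch
        self.version = version
        self.release = release
        # Assumption: pkgtup is stored once per package.
        self.pkgtup = (name, arch, epoch, version, release)


def simple_loop(pkgs):
    # Explicit loop, as in the proposed simplePkgList().
    simplelist = []
    for pkg in pkgs:
        simplelist.append((pkg.name, pkg.arch, pkg.epoch, pkg.version,
                           pkg.release))
    return simplelist


def simple_comprehension(pkgs):
    # List comprehension over the stored tuple.
    return [pkg.pkgtup for pkg in pkgs]


pkgs = [FakePkg("pkg%d" % i, "i386", "0", "1.0", str(i)) for i in range(5000)]

loop_time = timeit.timeit(lambda: simple_loop(pkgs), number=50)
comp_time = timeit.timeit(lambda: simple_comprehension(pkgs), number=50)
print("loop:          %.4f s" % loop_time)
print("comprehension: %.4f s" % comp_time)
```

Both functions return the same list, so whichever is faster can be swapped in without changing callers.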

3. Why aren't we using generators more? I'm not really sure how to
measure the memory requirements for the above simplePkgList() but given
pretty much all usage is "for pkgtup in simplePkgList():" it seems
mostly overhead. This isn't the only case either ... is it a back
compat. issue?

This is to some extent a compat issue. We've not found the time and strength to go through the API and knock it into better shape, and I don't know if we really should do that right now.
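If we did want generators without breaking callers, one option is to add an iterator next to the existing list-returning method. This is only a sketch: MiniSack and iterSimplePkgList() are hypothetical names, and the packages are plain tuples here rather than real package objects.

```python
class MiniSack:
    """Hypothetical sack holding (n, a, e, v, r) tuples directly."""

    def __init__(self, pkgtups):
        self._pkgtups = list(pkgtups)

    def iterSimplePkgList(self):
        # New, lazy API: yields one tuple at a time (made-up method name).
        for pkgtup in self._pkgtups:
            yield pkgtup

    def simplePkgList(self):
        # Old API keeps returning a real list, so callers that index it,
        # take len() of it or iterate it twice keep working.
        return list(self.iterSimplePkgList())


sack = MiniSack([("foo", "i386", "0", "1.0", "1"),
                 ("bar", "noarch", "0", "2.0", "3")])

names = [pkgtup[0] for pkgtup in sack.iterSimplePkgList()]
as_list = sack.simplePkgList()
```

Callers of the "for pkgtup in simplePkgList():" kind could move to the iterator one at a time, while everything else stays on the list.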

I personally doubt that we would win much by introducing iterators in the SqliteSack, especially not memory. The place where I'd put iterators is the Depsolve.check* methods. This would allow restarting the checks every time we've modified the transaction. That way we only have to solve problems that still exist.
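Roughly what I have in mind for the check methods, as a toy sketch. check_requires(), resolve() and the dict layout are invented for this example and have nothing to do with the real Depsolve code; the point is only that the check is a generator which gets restarted after each change to the transaction.

```python
def check_requires(transaction):
    """Lazily yield (package, requirement) pairs that nothing provides."""
    provided = set()
    for info in transaction.values():
        provided.update(info["provides"])
    for name, info in transaction.items():
        for req in info["requires"]:
            if req not in provided:
                yield name, req


def resolve(transaction, repo):
    """Fix one problem at a time, restarting the check after each change."""
    transaction = dict(transaction)
    while True:
        # Take only the first remaining problem; the rest is never computed.
        problem = next(check_requires(transaction), None)
        if problem is None:
            return transaction
        name, req = problem
        for candidate, info in repo.items():
            if req in info["provides"]:
                transaction[candidate] = info
                break
        else:
            raise LookupError("nothing provides %s needed by %s" % (req, name))


repo = {
    "libfoo": {"requires": ["libc"], "provides": ["libfoo.so"]},
    "glibc": {"requires": [], "provides": ["libc"]},
}
start = {"app": {"requires": ["libfoo.so"], "provides": ["app"]}}
result = resolve(start, repo)
```

Because the generator is thrown away after every change, a problem that was fixed as a side effect of an earlier fix is never reported.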

When it comes to saving memory, the point is that single lists or indexes containing all packages are quite small compared to the overall memory footprint. The only way to significantly reduce memory usage is to avoid loading package data at all (file lists, if they are still loaded) or at least not all at once.

In pyrpm we've been able to reduce the memory usage a lot by dynamically loading the PRCOs and keeping them in a Least Recently Used (LRU) cache, so that only the PRCOs of a fixed number of packages are in memory at once.
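The idea, as a minimal sketch (this is not the pyrpm code; PrcoCache, fake_loader and the counters are made up for illustration): PRCO data is loaded on demand and the least recently used entry is dropped once the cache is full.

```python
from collections import OrderedDict


class PrcoCache:
    """Hypothetical LRU cache for per-package PRCO data."""

    def __init__(self, loader, maxsize):
        self._loader = loader        # callable: pkg_id -> PRCO data
        self._maxsize = maxsize      # max number of packages kept in memory
        self._data = OrderedDict()   # oldest entry first, newest last
        self.loads = 0               # how often the loader had to be called

    def get(self, pkg_id):
        if pkg_id in self._data:
            # Cache hit: mark the entry as most recently used.
            self._data.move_to_end(pkg_id)
            return self._data[pkg_id]
        # Cache miss: load on demand and evict the oldest entry if needed.
        self.loads += 1
        prco = self._loader(pkg_id)
        self._data[pkg_id] = prco
        if len(self._data) > self._maxsize:
            self._data.popitem(last=False)
        return prco

    def __len__(self):
        return len(self._data)


def fake_loader(pkg_id):
    # Stand-in for reading provides/requires/conflicts/obsoletes from disk.
    return {"provides": [pkg_id], "requires": [],
            "conflicts": [], "obsoletes": []}


cache = PrcoCache(fake_loader, maxsize=2)
cache.get("a")
cache.get("b")
cache.get("a")   # hit, "a" becomes most recently used
cache.get("c")   # miss, evicts "b"
cache.get("a")   # still cached
```

The price is extra loads whenever an evicted package is needed again, so the cache size decides how much run time is traded for memory.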

But the main point here is how much code complexity, run time and memory usage we want to trade against each other.


Florian

Btw. I still have a patch here that saves 15 MB of RpmSack memory in my 1000-package F8-to-devel update case (approx. 10%).

_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
