seth vidal wrote:
On Mon, 2007-12-10 at 14:12 -0500, James Antill wrote:
A couple of questions about sqlite functions and usage:

1. Does anyone know why returnPackages() re-runs the _excluded() method
on cached content? AFAICS we just redo the exclude check on every
package for no gain.


I think it's because the exclude list can be updated at any time and
from anywhere. It's not necessarily a one-time event.

It's not completely unlikely that the skip-broken changes will need to exclude the discarded packages from the repo to make sure they are not added again. So this is not the right time to think about changing that.

James Antill wrote:
Hmm, ok. It's worth noting though that the current simplePkgList() only
runs it on the first generation of the data.

That's not exactly true. The list is thrown away when the excludes are changed in .delPackage(). But yes, it could be a bit more bulletproof.
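The invalidation Florian describes can be sketched as a cached list that is dropped whenever the exclude set changes, so filtering runs only when the cache is rebuilt rather than on every call. This is a minimal sketch under stated assumptions: `CachingSack` is a hypothetical toy class, its `delPackage()` only mimics the invalidation mentioned above, and packages are plain (n, a, e, v, r) tuples; it is not yum's real sack.

```python
class CachingSack:
    """Hypothetical toy sack; not yum's actual implementation."""

    def __init__(self, pkgtups):
        self._pkgs = list(pkgtups)   # all known (n, a, e, v, r) tuples
        self._excluded = set()       # tuples hidden from callers
        self._cached_list = None     # filtered list, built lazily

    def delPackage(self, pkgtup):
        # Changing the exclude set invalidates the cache, so the next
        # simplePkgList() call rebuilds it and never returns stale data.
        self._excluded.add(pkgtup)
        self._cached_list = None

    def simplePkgList(self):
        # Filter only when the cache is empty, not on every call.
        if self._cached_list is None:
            self._cached_list = [t for t in self._pkgs
                                 if t not in self._excluded]
        return self._cached_list


sack = CachingSack([("foo", "x86_64", "0", "1.0", "1"),
                    ("bar", "noarch", "0", "2.0", "3")])
print(len(sack.simplePkgList()))  # 2
sack.delPackage(("bar", "noarch", "0", "2.0", "3"))
print(len(sack.simplePkgList()))  # 1
```

The same hook is where a skip-broken style "discard this package" step could plug in: anything that changes the exclude set just clears the cache.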

2. AFAICS simplePkgList would be better off written like:

    def simplePkgList(self):
        """returns a list of pkg tuples (n, a, e, v, r) from the sack"""

        simplelist = []
        for pkg in self.returnPackages():
            simplelist.append((pkg.name, pkg.arch, pkg.epoch,
                               pkg.version, pkg.release))
        return simplelist

...this saves about half a second (20-25%) for pretty much all the
commands, as the above loop is instant in comparison to the executeSQL()
version[1].

Sounds reasonable to me. You could also check whether a list comprehension is even faster:

        return [pkg.pkgtup for pkg in self.returnPackages()]
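To try the comparison without a real sack, here is a self-contained sketch using a hypothetical `FakePkg` whose `pkgtup` attribute holds (name, arch, epoch, version, release), matching the tuple order in the docstring above. It checks that both forms build the same list and times them with `timeit`; the numbers it prints say nothing about the executeSQL() version and would have to be measured on a real sack.

```python
import timeit


class FakePkg:
    """Hypothetical package object; pkgtup mirrors (n, a, e, v, r)."""

    def __init__(self, name, arch, epoch, version, release):
        self.name, self.arch, self.epoch = name, arch, epoch
        self.version, self.release = version, release
        self.pkgtup = (name, arch, epoch, version, release)


def with_loop(pkgs):
    # Explicit append loop, as in the proposed simplePkgList().
    simplelist = []
    for pkg in pkgs:
        simplelist.append((pkg.name, pkg.arch, pkg.epoch,
                           pkg.version, pkg.release))
    return simplelist


def with_comprehension(pkgs):
    # List comprehension over the precomputed tuple.
    return [pkg.pkgtup for pkg in pkgs]


pkgs = [FakePkg("pkg%d" % i, "x86_64", "0", "1.0", str(i))
        for i in range(10000)]
assert with_loop(pkgs) == with_comprehension(pkgs)

for fn in (with_loop, with_comprehension):
    # timeit runs immediately, so the lambda sees the current fn.
    print(fn.__name__, timeit.timeit(lambda: fn(pkgs), number=20))
```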


3. Why aren't we using generators more? I'm not really sure how to
measure the memory requirements of the above simplePkgList(), but given
that pretty much all usage is "for pkgtup in simplePkgList():", building
the full list seems mostly overhead. This isn't the only case either ...
is it a backward-compatibility issue?
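A hedged sketch of what a generator-based variant could look like. `GenSack` and `iterPkgTuples()` are hypothetical names, not existing yum API. Callers that only loop never materialize a second list, but callers that need `len()` or indexing must wrap the result in `list()`, which is where the compatibility concern comes in. Note that `returnPackages()` here still returns a list, so the saving is only the second list, consistent with the doubt about memory gains in the reply.

```python
from collections import namedtuple

# Hypothetical package record exposing only pkgtup = (n, a, e, v, r).
Pkg = namedtuple("Pkg", "pkgtup")


class GenSack:
    """Hypothetical sack; not yum's real class."""

    def __init__(self, packages):
        self._packages = packages

    def returnPackages(self):
        return list(self._packages)

    def iterPkgTuples(self):
        # Generator: yields one tuple per package instead of
        # building and returning a whole list of tuples.
        for pkg in self.returnPackages():
            yield pkg.pkgtup


sack = GenSack([Pkg(("foo", "noarch", "0", "1.0", "1")),
                Pkg(("bar", "x86_64", "0", "2.0", "1"))])
for pkgtup in sack.iterPkgTuples():
    print(pkgtup[0])            # foo, then bar
count = len(list(sack.iterPkgTuples()))  # list() needed for len()
```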

This is to some extent a compat issue. We haven't found the time and energy to go through the API and smash it into better shape, and I'm not sure we really should do that right now.

I personally doubt that we gain much by introducing iterators in the SqliteSack, especially not in memory. The place I'd put iterators is the Depsolve.check* methods. This would allow restarting the checks every time we modify the transaction, so we only have to solve problems that still exist.
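One way to read the restartable-checks idea, as a hedged sketch: a check is a generator that yields problems lazily, and whenever fixing a problem modifies the transaction, the checks start over, so a problem already resolved by an earlier fix is never processed. `check_requires()`, `depsolve()` and the set/dict data model are hypothetical simplifications, not yum's Depsolve API.

```python
def check_requires(txn, requires):
    """Lazily yield (pkg, missing_dep) pairs for the current transaction."""
    for pkg in sorted(txn):          # sorted() iterates over a copy
        for dep in requires.get(pkg, ()):
            if dep not in txn:
                yield pkg, dep


def depsolve(txn, requires, available):
    txn = set(txn)
    while True:
        for pkg, dep in check_requires(txn, requires):
            if dep not in available:
                raise LookupError("%s requires missing %s" % (pkg, dep))
            txn.add(dep)   # the transaction changed ...
            break          # ... so restart the checks from scratch
        else:
            return txn     # a full pass found no remaining problems


requires = {"app": ["libfoo", "libbar"],
            "libfoo": ["libc"],
            "libbar": ["libc"]}
available = {"app", "libfoo", "libbar", "libc"}
print(sorted(depsolve({"app"}, requires, available)))
# ['app', 'libbar', 'libc', 'libfoo']
```

Here libfoo's need for libc is never handled separately: adding libc for libbar already resolved it, and the restarted check simply does not report it.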

When it comes to saving memory, the point is that single lists or indexes containing all packages are quite small compared to the overall memory footprint. The only way to significantly reduce memory usage is to avoid loading package data at all (e.g. file lists, if they are still loaded), or at least not all at once.
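As a rough illustration of not loading package data until it is needed, a package object can defer reading its file list until first access. `LazyPkg` and the `loader` callable are hypothetical; in a real sack the loader would query the repository metadata.

```python
class LazyPkg:
    """Package whose file list is loaded on first access (hypothetical)."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader   # callable: name -> list of file paths
        self._files = None      # not loaded yet

    @property
    def files(self):
        # Load on first use and keep the result; packages whose file
        # list is never consulted never pay its memory cost.
        if self._files is None:
            self._files = self._loader(self.name)
        return self._files


calls = []


def fake_loader(name):
    calls.append(name)
    return ["/usr/bin/" + name]


pkgs = [LazyPkg(n, fake_loader) for n in ("foo", "bar", "baz")]
print(pkgs[0].files)  # ['/usr/bin/foo']
print(calls)          # ['foo'], bar and baz were never loaded
```

Once loaded, the data stays in memory for the object's lifetime; bounding that is what a cache like the LRU described next is for.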

In pyrpm we've been able to reduce memory usage a lot by loading the PRCOs dynamically and keeping them in a Least Recently Used (LRU) cache, so that only the PRCOs of a fixed number of packages are in memory at once.
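A minimal sketch of an LRU cache for PRCO data, assuming a hypothetical `load_prco(pkg_id)` function that fetches a package's provides/requires/conflicts/obsoletes from storage. An `OrderedDict` holds at most `maxsize` entries and evicts the least recently used one. This is not pyrpm's actual code; for a plain function, Python 3's `functools.lru_cache` gives the same eviction behaviour.

```python
from collections import OrderedDict


class PRCOCache:
    """Keep PRCO data for at most `maxsize` packages in memory (LRU)."""

    def __init__(self, loader, maxsize=100):
        self._loader = loader          # callable: pkg_id -> PRCO data
        self._maxsize = maxsize
        self._entries = OrderedDict()  # pkg_id -> data, oldest first

    def get(self, pkg_id):
        if pkg_id in self._entries:
            # Hit: mark this entry as most recently used.
            self._entries.move_to_end(pkg_id)
            return self._entries[pkg_id]
        # Miss: load from storage, then evict the oldest entry if full.
        data = self._loader(pkg_id)
        self._entries[pkg_id] = data
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)
        return data

    def __len__(self):
        return len(self._entries)


loads = []


def load_prco(pkg_id):
    # Hypothetical loader; records each real load for illustration.
    loads.append(pkg_id)
    return {"provides": [pkg_id], "requires": [],
            "conflicts": [], "obsoletes": []}


cache = PRCOCache(load_prco, maxsize=2)
cache.get("a")
cache.get("b")
cache.get("a")     # hit: "a" becomes most recently used
cache.get("c")     # miss: evicts "b", the least recently used
print(loads)       # ['a', 'b', 'c']
print(len(cache))  # 2
```

The `maxsize` knob is exactly the trade-off mentioned below: a smaller cache saves memory but costs more reloads.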

But the main point here is how much code complexity, run time and memory usage we want to trade off against each other.

Florian

Btw, I still have a patch here that saves 15 MB of RpmSack memory in my 1000-package F8-to-devel update case (approx. 10%).
_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
