seth vidal wrote:
On Mon, 2007-12-10 at 14:12 -0500, James Antill wrote:
A couple of questions about sqlite functions and usage:
1. Does anyone know why returnPackages() re-runs the _excluded() method
on cached content? AFAICS we just redo the excludes for all pkgs for
nothing.
I think it is b/c we can update the exclude list whenever and from
wherever. It's not just a one-time event, necessarily.
It's not completely unlikely that the skip-broken changes will need to
exclude the discarded packages from the repo to make sure they are not
added again. So this is not the right time to think about changing that.
> James Antill wrote:
> Hmm, ok. It's worth noting though that the current simplePkgList() only
> runs it on the first generation of the data.
That's not exactly true. The list is thrown away when the excludes are
changed in .delPackage(). But yes, it could be a bit more bulletproof.
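For illustration, the pattern is roughly this (a minimal standalone sketch,
not the actual sack classes; names and attributes are made up):

class SackSketch(object):
    """Cache the computed package list and throw it away whenever
    the excludes change (hypothetical stand-in for the yum sacks)."""

    def __init__(self, packages):
        self._packages = list(packages)   # objects with a .pkgtup attribute
        self._excluded = set()
        self._simple_cache = None         # cached simplePkgList() result

    def returnPackages(self):
        return [p for p in self._packages if p not in self._excluded]

    def delPackage(self, pkg):
        # changing the excludes invalidates the cached list
        self._excluded.add(pkg)
        self._simple_cache = None

    def simplePkgList(self):
        # rebuild only when the cache has been invalidated
        if self._simple_cache is None:
            self._simple_cache = [p.pkgtup for p in self.returnPackages()]
        return self._simple_cache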
2. AFAICS simplePkgList would be better off written like:
def simplePkgList(self):
    """returns a list of pkg tuples (n, a, e, v, r) from the sack"""
    simplelist = []
    for pkg in self.returnPackages():
        simplelist.append((pkg.name, pkg.arch, pkg.epoch, pkg.version,
                           pkg.release))
    return simplelist
...this saves about half a second (20-25%) for pretty much all the
commands, as the above loop is instant in comparison to the executeSQL()
version[1].
Sounds reasonable to me. You could check whether a list comprehension is even faster:
    return [pkg.pkgtup for pkg in self.returnPackages()]
3. Why aren't we using generators more? I'm not really sure how to
measure the memory requirements for the above simplePkgList(), but given
that pretty much all usage is "for pkgtup in simplePkgList():" it seems
mostly overhead (see the sketch below for what I mean). This isn't the
only case either ... is it a back compat. issue?
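A minimal sketch of the kind of thing I mean (hypothetical method name,
not an existing yum method):

def simplePkgListIter(self):
    """Yield (n, a, e, v, r) tuples lazily instead of building a list."""
    for pkg in self.returnPackages():
        yield pkg.pkgtup

Callers that just do "for pkgtup in ...:" wouldn't notice the difference,
but anything that does len() or indexing on the result would break, hence
the compat question.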
This is to some extent a compat issue. We've not found the time and strength
to go through the API and knock it into better shape, and I don't know if
we really should do that right now.
I personally doubt that we win much by introducing iterators in the
SqliteSack - especially not memory. The place I'd put iterators is in the
Depsolve.check* methods. This would allow restarting the checks every time
we've modified the transaction. That way we only have to solve problems that
still exist.
When it comes to memory savings, the point is that single lists or indexes
containing all packages are quite small compared to the overall memory
footprint. The only way to significantly reduce the memory usage is to avoid
loading package data at all (file lists, if they are still loaded) or at
least not all at once.
In pyrpm we've been able to reduce the memory usage a lot by dynamically
loading the PRCOs and keeping them in a Least Recently Used (LRU) cache, so
that only the PRCOs of a fixed number of packages are in memory at once.
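Roughly like this, simplified (an illustrative sketch only, not the actual
pyrpm code; the loader callable is assumed to fetch the PRCO data from disk):

class PRCOCache(object):
    """Keep the PRCO data of at most maxsize packages in memory,
    reloading evicted entries on demand (LRU eviction)."""

    def __init__(self, loader, maxsize=100):
        self._loader = loader      # callable: pkg -> PRCO data
        self._maxsize = maxsize
        self._data = {}            # pkg -> PRCO data
        self._order = []           # least recently used first

    def get(self, pkg):
        if pkg in self._data:
            self._order.remove(pkg)            # refresh its position
        else:
            if len(self._data) >= self._maxsize:
                oldest = self._order.pop(0)    # evict the LRU entry
                del self._data[oldest]
            self._data[pkg] = self._loader(pkg)
        self._order.append(pkg)                # now most recently used
        return self._data[pkg]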
But the main point here is how much code complexity, run time and memory
usage we want to trade off against each other.
Florian
Btw. I still have a patch here that saves 15 MB of RpmSack memory in my
1000-pkg F8-to-devel update case (approx. 10%).