Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-21 Thread Jeff Johnson
On Dec 21, 2010, at 2:26 AM, Anders F Björklund wrote: > Jeff Johnson wrote: > >>> Should make it into a generic library eventually, once this prototyping >>> is done... Amazing how many silly bitarrays and digests are out there, >>> like using scripted byte arrays and for instance MD5, for Bloo

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-20 Thread Anders F Björklund
Jeff Johnson wrote: >> Should make it into a generic library eventually, once this prototyping >> is done... Amazing how many silly bitarrays and digests are out there, >> like using scripted byte arrays and for instance MD5, for Bloom filters. >> It'll be interesting to see how the performance do

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-20 Thread Jeff Johnson
On Dec 20, 2010, at 7:01 PM, Anders F Björklund wrote: > Jeff Johnson wrote: > > Should make it into a generic library eventually, once this prototyping > is done... Amazing how many silly bitarrays and digests are out there, > like using scripted byte arrays and for instance MD5, for Bloom filt

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-20 Thread Anders F Björklund
Jeff Johnson wrote: > You are already seeing that an *uncompressed* Bloom filter using conservative > parameters like 10**-6 is comparable in size to the traditional *compressed* > file paths. With 10**-4, and a per-package population estimate of ~50K, > and compression on the array of Bloom filter

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Per Øyvind Karlsen
2010/12/18 Jeff Johnson : > > On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote: > >> >> On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: >> >>> >>> So I guess there's something I'm not really fully grasping here... >>> >>> See code attached... >>> >> >> Yes. You miss that you need to estimate

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Jeff Johnson
> > The "bisection" likely isn't worth worrying about until there is need. But > rpmbfUnion/rpmbfIntersect are useful operations on arrays of fixed size > Bloom filters no matter what. > One last hint I forgot (re using rpmbfIntersect) Assuming that all of the Bloom filters are fixed size, the
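
For readers following the union/intersect remark: when two Bloom filters have the same number of bits and use the same hash functions, their union is a bitwise OR and their intersection is a bitwise AND. The sketch below illustrates that idea only. The `toy_bf` type and every function name in it are hypothetical and are not the rpmbf API; the 128-bit size is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size filter for illustration; NOT the rpmbf structure. */
#define BF_WORDS 4u                 /* 4 * 32 = 128 bits, deliberately tiny */

typedef struct {
    uint32_t bits[BF_WORDS];
} toy_bf;

/* Set one bit, as a hash function would when an element is added. */
static void toy_bf_set(toy_bf *bf, unsigned bit)
{
    assert(bit < BF_WORDS * 32u);
    bf->bits[bit / 32u] |= (uint32_t)1 << (bit % 32u);
}

/* Return nonzero if the given bit is set. */
static int toy_bf_test(const toy_bf *bf, unsigned bit)
{
    assert(bit < BF_WORDS * 32u);
    return (bf->bits[bit / 32u] >> (bit % 32u)) & 1u;
}

/* dst |= src: the result is exactly the filter of the union of both sets. */
static void toy_bf_union(toy_bf *dst, const toy_bf *src)
{
    for (size_t i = 0; i < BF_WORDS; i++)
        dst->bits[i] |= src->bits[i];
}

/* dst &= src: every element present in both sets still tests positive. */
static void toy_bf_intersect(toy_bf *dst, const toy_bf *src)
{
    for (size_t i = 0; i < BF_WORDS; i++)
        dst->bits[i] &= src->bits[i];
}

/* An all-zero intersection proves the two sets share no element. */
static int toy_bf_is_empty(const toy_bf *bf)
{
    for (size_t i = 0; i < BF_WORDS; i++)
        if (bf->bits[i] != 0)
            return 0;
    return 1;
}
```

Note the asymmetry: the OR is exact, while the AND can only over-approximate the true intersection, so a non-empty result means "possibly overlapping" rather than "definitely overlapping".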

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-18 Thread Jeff Johnson
On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote: > > On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: > >> >> So I guess there's something I'm not really fully grasping here... >> >> See code attached... >> > > Yes. You miss that you need to estimate the expected size of the > populat

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-17 Thread Jeff Johnson
On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote: > > So I guess there's something I'm not really fully grasping here... > > See code attached... > Yes. You miss that you need to estimate the expected size of the population you wish to capture in a Bloom Filter: size_t n = 0;
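
As background on why the population estimate matters: the textbook sizing of a Bloom filter takes the expected number of elements n and the target false-positive rate p and derives the bit count m and the number of hash functions k from them. The formulas below are the standard ones, not taken from the rpm sources, and the worked figures simply plug in the {n,p} values mentioned elsewhere in this thread (n around 50K, p of 10**-4 or 10**-6) as an illustration.

```latex
m = -\frac{n \ln p}{(\ln 2)^2},
\qquad
k = \frac{m}{n}\,\ln 2 = \log_2 \frac{1}{p}

% p = 10^{-4}:  m/n \approx 19.2 bits per element,  k \approx 13
%               n = 50\,000  \Rightarrow  m \approx 9.6 \times 10^{5} bits  (about 117 KiB)
% p = 10^{-6}:  m/n \approx 28.8 bits per element,  k \approx 20
%               n = 50\,000  \Rightarrow  m \approx 1.44 \times 10^{6} bits (about 176 KiB)
```

If n is underestimated, the filter fills up and the real false-positive rate climbs well above the p that was asked for, which is why the estimate has to come first.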

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-17 Thread Per Øyvind Karlsen
2010/12/15 Jeff Johnson : > > On Dec 14, 2010, at 9:51 PM, Jeff Johnson wrote: > >> >> Download. uncompress. use for file dependencies. >> >> I will take wagers on how much smaller the encoding is as >> soon as you tell me what you choose for {n,p}. >> > > There's an obvious generalization here for

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-15 Thread Jeff Johnson
On Dec 14, 2010, at 9:51 PM, Jeff Johnson wrote: > > Download. uncompress. use for file dependencies. > > I will take wagers on how much smaller the encoding is as > soon as you tell me what you choose for {n,p}. > There's an obvious generalization here for primary.xml data as well as for all

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-15 Thread Anders F Björklund
Jeff Johnson wrote: >> I was recently looking at making a "manifest" for FreeBSD, >> which consists of a simple files listing for *each package*. >> >> ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-8.1-release/All/*.tbz >> >> I was looking at the Slackware MANIFEST as a reference, which

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 9:23 PM, Per Øyvind Karlsen wrote: > 2010/12/14 Jeff Johnson : >> >> On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: >> The issues of the size of files.xml* and synthesis.hdlist* have nothing whatsoever to do with parentdir/linkto dependencies. >>> But

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Per Øyvind Karlsen
2010/12/14 Jeff Johnson : > > On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: > >>> >>> The issues of the size of files.xml* and synthesis.hdlist* have nothing >>> whatsoever to do with parentdir/linkto dependencies. >> But for being able to resolve these dependencies, one still needs the >>

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
>> >> Google said http://techreports.lib.berkeley.edu/accessPages/CSD-83-148.html >> >> Finding Files Fast >> Authors: Woods, James A. >> Technical Report Identifier: CSD-83-148 >> January 15, 1983 >> > > Bingo. Off by a year, and the chloroxed neurons resisted > confusion with the other Jam

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 6:46 PM, Anders F Björklund wrote: > Jeff Johnson wrote: > >> There are some very simple data reductions on hierarchical >> paths too. One of the best known is >> Run a dictionary: assign an integer weighted by # of >> occurrences to favor small integers for frequent

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Anders F Björklund
Jeff Johnson wrote: > There are some very simple data reductions on hierarchical > paths too. One of the best known is > Run a dictionary: assign an integer weighted by # of > occurrences to favor small integers for frequently > encountered tokens between /.../ (all of "usr" and "
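
The dictionary reduction described in the quoted text can be sketched roughly as follows: split each path on '/', count how often each component occurs, then sort by descending count so that the most frequent component gets the smallest integer. This is only an assumed reading of the quoted description. All names (`tok_dict`, `dict_add_path`, and so on) and the fixed capacities are hypothetical, and the linear lookup is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 64   /* illustration-only capacity */
#define MAX_TOKLEN 32   /* illustration-only maximum component length */

typedef struct {
    char   text[MAX_TOKLEN];
    size_t count;
} tok_entry;

typedef struct {
    tok_entry e[MAX_TOKENS];
    size_t    n;
} tok_dict;

/* Count one occurrence of the path component tok[0..len). */
static void dict_count(tok_dict *d, const char *tok, size_t len)
{
    assert(len > 0 && len < MAX_TOKLEN);
    for (size_t i = 0; i < d->n; i++) {
        if (strlen(d->e[i].text) == len &&
            memcmp(d->e[i].text, tok, len) == 0) {
            d->e[i].count++;
            return;
        }
    }
    assert(d->n < MAX_TOKENS);
    memcpy(d->e[d->n].text, tok, len);
    d->e[d->n].text[len] = '\0';
    d->e[d->n].count = 1;
    d->n++;
}

/* Split a path on '/' and count every non-empty component. */
static void dict_add_path(tok_dict *d, const char *path)
{
    while (*path != '\0') {
        size_t len = strcspn(path, "/");
        if (len > 0)
            dict_count(d, path, len);
        path += len;
        if (*path == '/')
            path++;
    }
}

/* Descending count; ties broken alphabetically so the order is stable. */
static int by_count_desc(const void *a, const void *b)
{
    const tok_entry *x = a, *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return strcmp(x->text, y->text);
}

/* After sorting, a component's id is its index: most frequent gets 0. */
static void dict_assign_ids(tok_dict *d)
{
    qsort(d->e, d->n, sizeof d->e[0], by_count_desc);
}

/* Return the id of a component, or -1 if it was never seen. */
static int dict_id(const tok_dict *d, const char *tok)
{
    for (size_t i = 0; i < d->n; i++)
        if (strcmp(d->e[i].text, tok) == 0)
            return (int)i;
    return -1;
}
```

Small ids for frequent components only pay off when paired with a variable-length integer encoding, so that "usr" costs a single byte every time it appears; that encoding step is left out of this sketch.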

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote: >> >> The issues of the size of files.xml* and synthesis.hdlist* have nothing >> whatsoever to do with parentdir/linkto dependencies. > But for being able to resolve these dependencies, one still needs the > metadata of files.xml, which synt

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Per Øyvind Karlsen
2010/12/14 Jeff Johnson : > > On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote: > >> >> On a related note though I've started giving parentdir & symlink deps >> some more thoughts again though, skimming the surface on practical >> issues and drawbacks of such as ie. the size of files.xml.lzma

Re: Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Jeff Johnson
On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote: > > On a related note though I've started giving parentdir & symlink deps > some more thoughts again though, skimming the surface on practical > issues and drawbacks of such as ie. the size of files.xml.lzma in > main/release currently being

Metadata size constraints wrt. parentdir & symlink deps, Was: [CVS] RPM: rpm/ CHANGES rpm/lib/ rpmts.c

2010-12-14 Thread Per Øyvind Karlsen
2010/12/14 Jeff Johnson : > > On Dec 14, 2010, at 12:47 PM, Per Øyvind Karlsen wrote: > >> >> My insight on the matter is mainly from rpm packaging perspective, >> rather than rpm engineering itself on this though, so the >> understanding of the topic is obviously rather incomplete. ;) >> > > This