On Dec 21, 2010, at 2:26 AM, Anders F Björklund wrote:
> Jeff Johnson wrote:
>
>>> Should make it into a generic library eventually, once this prototyping
>>> is done... Amazing how many silly bitarrays and digests are out there,
>>> like using scripted byte arrays and for instance MD5, for Bloo
Jeff Johnson wrote:
>> Should make it into a generic library eventually, once this prototyping
>> is done... Amazing how many silly bitarrays and digests are out there,
>> like using scripted byte arrays and for instance MD5, for Bloom filters.
>> It'll be interesting to see how the performance do
On Dec 20, 2010, at 7:01 PM, Anders F Björklund wrote:
> Jeff Johnson wrote:
>
> Should make it into a generic library eventually, once this prototyping
> is done... Amazing how many silly bitarrays and digests are out there,
> like using scripted byte arrays and for instance MD5, for Bloom filt
Jeff Johnson wrote:
> You are already seeing that an *uncompressed* Bloom filter using conservative
> parameters like 10**-6 is comparable in size to the traditional *compressed*
> file paths. With 10**-4, and a per-package population estimate of ~50K,
> and compression on the array of Bloom filter
2010/12/18 Jeff Johnson :
>
> On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote:
>
>>
>> On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote:
>>
>>>
>>> So I guess there's something I'm not really fully grasping here...
>>>
>>> See code attached...
>>>
>>
>> Yes. You miss that you need to estimate
>
> The "bisection" likely isn't worth worrying about until there is need. But
> rpmbfUnion/rpmbfIntersect are useful operations on arrays of fixed size
> Bloom filters no matter what.
>
One last hint I forgot (re using rpmbfIntersect)
Assuming that all of the Bloom filters are fixed size, the
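For readers following the thread: when two Bloom filters have the same size and use the same hash functions, their union and intersection reduce to bitwise OR and AND over the underlying bit arrays. The sketch below only illustrates that idea. The names bits_union and bits_intersect and the uint32_t word layout are assumptions made for this example, not the rpmbf API discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: OR src into dst, word by word.
 * The result may contain every element that either filter may contain. */
void bits_union(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] |= src[i];
}

/* Hypothetical helper: AND src into dst, word by word.
 * A bit survives only if it is set in both filters. */
void bits_intersect(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] &= src[i];
}
```

Both helpers assume the two arrays hold the same number of words, which is why the fixed-size requirement matters. Note that the ANDed filter is only an approximation of the filter one would build from the true intersection of the two sets; it can report extra false positives.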
On Dec 17, 2010, at 2:22 PM, Jeff Johnson wrote:
>
> On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote:
>
>>
>> So I guess there's something I'm not really fully grasping here...
>>
>> See code attached...
>>
>
> Yes. You miss that you need to estimate the expected size of the
> populat
On Dec 17, 2010, at 1:48 PM, Per Øyvind Karlsen wrote:
>
> So I guess there's something I'm not really fully grasping here...
>
> See code attached...
>
Yes. You miss that you need to estimate the expected size of the
population you wish to capture in a Bloom Filter:
size_t n = 0;
2010/12/15 Jeff Johnson :
>
> On Dec 14, 2010, at 9:51 PM, Jeff Johnson wrote:
>
>>
>> Download. uncompress. use for file dependencies.
>>
>> I will take wagers on how much smaller the encoding is as
>> soon as you tell me what you choose for {n,p}.
>>
>
> There's an obvious generalization here for
On Dec 14, 2010, at 9:51 PM, Jeff Johnson wrote:
>
> Download. uncompress. use for file dependencies.
>
> I will take wagers on how much smaller the encoding is as
> soon as you tell me what you choose for {n,p}.
>
There's an obvious generalization here for primary.xml
data as well as for all
Jeff Johnson wrote:
>> I was recently looking at making a "manifest" for FreeBSD,
>> which consists of a simple files listing for *each package*.
>>
>> ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-8.1-release/All/*.tbz
>>
>> I was looking at the Slackware MANIFEST as a reference, which
On Dec 14, 2010, at 9:23 PM, Per Øyvind Karlsen wrote:
> 2010/12/14 Jeff Johnson :
>>
>> On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote:
>>
The issues of the size of files.xml* and synthesis.hdlist* have nothing
whatsoever to do with parentdir/linkto dependencies.
>>> But
2010/12/14 Jeff Johnson :
>
> On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote:
>
>>>
>>> The issues of the size of files.xml* and synthesis.hdlist* have nothing
>>> whatsoever to do with parentdir/linkto dependencies.
>> But for being able to resolve these dependencies, one still needs the
>>
>>
>> Google said http://techreports.lib.berkeley.edu/accessPages/CSD-83-148.html
>>
>> Finding Files Fast
>> Authors: Woods, James A.
>> Technical Report Identifier: CSD-83-148
>> January 15, 1983
>>
>
> Bingo. Off by a year, and the chloroxed neurons resisted
> confusion with the other Jam
On Dec 14, 2010, at 6:46 PM, Anders F Björklund wrote:
> Jeff Johnson wrote:
>
>> There are some very simple data reductions on hierarchical
>> paths too. One of the best known is
>> Run a dictionary: assign an integer weighted by # of
>> occurrences to favor small integers for frequent
Jeff Johnson wrote:
> There are some very simple data reductions on hierarchical
> paths too. One of the best known is
> Run a dictionary: assign an integer weighted by # of
> occurrences to favor small integers for frequently
> encountered tokens between /.../ (all of "usr" and "
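The message above is truncated, so this is only a rough sketch of the dictionary idea as far as it is described: split each path on '/', count how often each token occurs, and sort so that the most frequent tokens get the smallest integers. The names token_count and build_dictionary, the fixed table limits, and the linear lookup are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 256      /* assumed table limit for this sketch */
#define MAX_TOKEN_LEN 64    /* longer tokens are skipped */

struct token_count {
    char text[MAX_TOKEN_LEN];
    size_t count;
};

/* qsort comparator: most frequent first, ties broken alphabetically. */
int by_count_desc(const void *a, const void *b)
{
    const struct token_count *x = a, *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->text, y->text);
}

/* Count every token between '/' separators across all paths, then sort
 * so that index 0 is the most frequent token. The index of a token in
 * dict is the small integer assigned to it. Returns the number of
 * distinct tokens stored. */
size_t build_dictionary(const char *const *paths, size_t npaths,
                        struct token_count *dict)
{
    size_t ndict = 0;
    assert(paths != NULL && dict != NULL);
    for (size_t i = 0; i < npaths; i++) {
        const char *p = paths[i];
        while (*p != '\0') {
            size_t len = strcspn(p, "/");
            if (len > 0 && len < MAX_TOKEN_LEN) {
                size_t j;
                /* Linear search for an existing entry. */
                for (j = 0; j < ndict; j++)
                    if (strlen(dict[j].text) == len &&
                        memcmp(dict[j].text, p, len) == 0)
                        break;
                /* New token: append it if the table has room. */
                if (j == ndict && ndict < MAX_TOKENS) {
                    memcpy(dict[ndict].text, p, len);
                    dict[ndict].text[len] = '\0';
                    dict[ndict].count = 0;
                    ndict++;
                }
                if (j < ndict)
                    dict[j].count++;
            }
            p += len;
            if (*p == '/')
                p++;
        }
    }
    qsort(dict, ndict, sizeof(*dict), by_count_desc);
    return ndict;
}
```

With paths such as /usr/bin/ls, /usr/bin/cat and /usr/lib/libc.so, "usr" ends up at index 0 and "bin" at index 1, so the common leading directories cost the least to encode. A real implementation would want a hash table instead of the linear search and a variable-length integer encoding for the output.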
On Dec 14, 2010, at 4:49 PM, Per Øyvind Karlsen wrote:
>>
>> The issues of the size of files.xml* and synthesis.hdlist* have nothing
>> whatsoever to do with parentdir/linkto dependencies.
> But for being able to resolve these dependencies, one still needs the
> metadata of files.xml, which synt
2010/12/14 Jeff Johnson :
>
> On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote:
>
>>
>> On a related note though I've started giving parentdir & symlink deps
>> some more thoughts again though, skimming the surface on practical
>> issues and drawbacks of such as ie. the size of files.xml.lzma
On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote:
>
> On a related note though I've started giving parentdir & symlink deps
> some more thoughts again though, skimming the surface on practical
> issues and drawbacks of such as ie. the size of files.xml.lzma in
> main/release currently being
2010/12/14 Jeff Johnson :
>
> On Dec 14, 2010, at 12:47 PM, Per Øyvind Karlsen wrote:
>
>>
>> My insight on the matter is mainly from rpm packaging perspective,
>> rather than rpm engineering itself on this though, so the
>> understanding of the topic is obviously rather incomplete. ;)
>>
>
> This
20 matches