Re: [Collections] Bloom filters status?

2023-03-23 Thread Gary Gregory
Thank you for the update Claude. Gary On Thu, Mar 23, 2023, 03:57 Claude Warren, Jr wrote: > Gary, > > The Bloom filter package is functionally complete and ready for prime > time. As to beta, I have used it in several research projects and am > proposing using it in an in-house project at

RE: [Collections] Bloom filters status?

2023-03-23 Thread Claude Warren, Jr
Gary, The Bloom filter package is functionally complete and ready for prime time. As to beta, I have used it in several research projects and am proposing using it in an in-house project at work. I don't know what the standard process is for commons, so I leave it to your discretion. Claude

[Collections] Bloom filters status?

2023-03-17 Thread Gary Gregory
Hi All, What is the status of the bloom filters package? Is it ready for prime time? Should we have a beta? Gary

Re: [collections] Bloom filters

2020-03-14 Thread Alex Herbert
> On 14 Mar 2020, at 17:02, Claude Warren wrote: > > I am confused. > > If we restrict what Shape stores to "m" and "t" we can provide methods to > estimate "n" and "p" for the estimated "n" > > If the constructors for Shape remains as as they are then developers don't > need to go online to

Re: [collections] Bloom filters

2020-03-14 Thread Claude Warren
I am confused. If we restrict what Shape stores to "m" and "t" we can provide methods to estimate "n" and "p" for the estimated "n" If the constructors for Shape remains as as they are then developers don't need to go online to get numbers to plug into Shape. They only need to go online if they

Re: [collections] Bloom filters

2020-03-14 Thread Alex Herbert
Should we not provide an API to compute this in code. Otherwise every developer who uses the code will have to go to the online Bloom filter calculator to check. They cannot do this dynamically in a program. If the equations are standard then I think we should provide a calculator. And Shape does

Re: [collections] Bloom filters

2020-03-14 Thread Claude Warren
Shape is not intended to "Perform the standard computations using some of n, m, k, p to produce optimal values for the other values of n, m, k, p:" that is left to the developer to determine possibly with the help of https://hur.st/bloomfilter/ as referenced in the class javadoc. However, writing

Re: [collections] Bloom filters

2020-03-14 Thread Alex Herbert
> On 10 Mar 2020, at 01:36, Alex Herbert wrote: > > I was going to modify the BloomFilter interface as discussed on this thread. > > However before that I have made some changes to the current code to tidy it > up. PR is [1]. The changes target the following: > > [1]

Re: [collections] Bloom filters

2020-03-09 Thread Alex Herbert
I was going to modify the BloomFilter interface as discussed on this thread. However before that I have made some changes to the current code to tidy it up. PR is [1]. The changes target the following: 1. Test coverage. Coverage is now close to 100%. The AbstractBloomFilterTest was modified to

Re: [collections] Bloom filters

2020-03-08 Thread Claude Warren
With the upcoming change the StaticHash usage model has changed. It was serving two purposes: 1. as a mechanism to preserve the list of integers from the BloomFilter as well as the shape. 2. as a way to construct a Hasher from a collection of integers and a shape so that they could

Re: [collections] Bloom filters

2020-03-07 Thread Alex Herbert
> On 6 Mar 2020, at 02:14, Alex Herbert wrote: > > The change to make the CountingBloomFilter an interface is in this PR [1]. Claude has stated in a review of the PR on GitHub that the change to CountingBloomFilter as an interface is good. I will now progress to updating the BloomFilter

Re: [collections] Bloom filters

2020-03-05 Thread Alex Herbert
The change to make the CountingBloomFilter an interface is in this PR [1]. This renames the existing CountingBloomFilter to MapCountingBloomFilter. I’ve left this alone and just done my work on new files: - CountingBloomFilter: An interface to extend BloomFilter - ArrayCountingBloomFilter: An

Re: [collections] Bloom filters

2020-03-03 Thread Alex Herbert
> On 3 Mar 2020, at 22:31, Claude Warren wrote: > > take a look at > https://github.com/apache/commons-collections/pull/131/files#diff-8b2bf046dc35c88908eef196937173e1 > > This is a different Hasher with a smaller data footprint than > DynamicHasher. Like the DynamicHasher it builds the

Re: [collections] Bloom filters

2020-03-03 Thread Claude Warren
take a look at https://github.com/apache/commons-collections/pull/131/files#diff-8b2bf046dc35c88908eef196937173e1 This is a different Hasher with a smaller data footprint than DynamicHasher. Like the DynamicHasher it builds the values on the fly. Can SplitIterators be implemented on them easily?

Re: [collections] Bloom filters

2020-03-03 Thread Alex Herbert
On 02/03/2020 22:34, Claude Warren wrote: So what we have then is: *public* *interface* BloomFilter { *int* andCardinality(BloomFilter other); *int* cardinality(); *boolean* contains(BloomFilter other); *boolean* contains(Hasher hasher); *long*[] getBits();

Re: [collections] Bloom filters

2020-03-02 Thread Alex Herbert
> On 2 Mar 2020, at 22:34, Claude Warren wrote: > > So what we have then is: > > *public* *interface* BloomFilter { > > *int* andCardinality(BloomFilter other); > > *int* cardinality(); > > *boolean* contains(BloomFilter other); > > *boolean* contains(Hasher hasher); > >

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
So what we have then is: *public* *interface* BloomFilter { *int* andCardinality(BloomFilter other); *int* cardinality(); *boolean* contains(BloomFilter other); *boolean* contains(Hasher hasher); *long*[] getBits(); // Change PrimitiveIterator.OfInt

Re: [collections] Bloom filters

2020-03-02 Thread Alex Herbert
On 02/03/2020 16:12, Claude Warren wrote: Does getCounts() return a snapshot of the values when the call was made or does it return values that may be updated during the retrieval. If there are 2 threads (one reading counts and one doing a merge) it seems to me that the "iterate over the data

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
Does getCounts() return a snapshot of the values when the call was made or does it return values that may be updated during the retrieval. If there are 2 threads (one reading counts and one doing a merge) it seems to me that the "iterate over the data without constructing objects" approach means

Re: [collections] Bloom filters

2020-03-02 Thread Gilles Sadowski
Hello. Le lun. 2 mars 2020 à 14:19, Manas Kalangan a écrit : > > hey guys , i am manas , 2nd year computer engineering student, this is my > first time in GSoC, could someone help me with project idea? Welcome at the Commons project's discussion forum, but please do not hijack threads.[1] Best

Re: [collections] Bloom filters

2020-03-02 Thread Manas Kalangan
hey guys , i am manas , 2nd year computer engineering student, this is my first time in GSoC, could someone help me with project idea? On Mon, Mar 2, 2020 at 6:42 PM Alex Herbert wrote: > > On 02/03/2020 11:32, Claude Warren wrote: > > my thought on changing the BloomFilter.merge() to return a

Re: [collections] Bloom filters

2020-03-02 Thread Alex Herbert
On 02/03/2020 11:32, Claude Warren wrote: my thought on changing the BloomFilter.merge() to return a boolean is along the lines you had: return true on successful merge (even if there are no changes in the enabled bits). And yes, for most cases the standard bloom filter will return true, but

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
my thought on changing the BloomFilter.merge() to return a boolean is along the lines you had: return true on successful merge (even if there are no changes in the enabled bits). And yes, for most cases the standard bloom filter will return true, but the change is really to support extensions to

Re: [collections] Bloom filters

2020-03-02 Thread Alex Herbert
On 02/03/2020 09:38, Claude Warren wrote: It is not too late to update the BloomFIlter interface to have the merge return a boolean. The incorrect Shape would still throw an exception, so the return value would only come into play if the bits could not be set. thoughts? I don't see the harm

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
It is not too late to update the BloomFIlter interface to have the merge return a boolean. The incorrect Shape would still throw an exception, so the return value would only come into play if the bits could not be set. thoughts? On Mon, Mar 2, 2020 at 7:56 AM Claude Warren wrote: > for the

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
for the remove(), add(), and subtract() methods I agree that void is not correct and it should be boolean and be the same as the value you would get from calling isValid(). You are correct the getCounts() should return an iterator of some type on int[], I don't know why I thought long[]. I am

Re: [collections] Bloom filters

2020-03-01 Thread Alex Herbert
> On 1 Mar 2020, at 15:39, Claude Warren wrote: > > I think the CountingBloomFilter interface needs to extend BloomFilter. I said that but did not write it, sorry. > > I think I am confused. > > I would expect CountingBloomFilter to have > > interface CountingBloomFilter extends

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
I think the CountingBloomFilter interface needs to extend BloomFilter. I think I am confused. I would expect CountingBloomFilter to have interface CountingBloomFilter extends BloomFilter { // these 2 methods are the reverse of merge() void remove( BloomFilter ); void remove( Hasher

Re: [collections] Bloom filters

2020-03-01 Thread Alex Herbert
> On 1 Mar 2020, at 09:28, Claude Warren wrote: > > The idea of a backing array is fine and the only problem I see with it is > in very large filters (on the order of 10^8 bits and larger) but document > the size calculation and let the developer worry about it. Let us look at the use case

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
The idea of a backing array is fine and the only problem I see with it is in very large filters (on the order of 10^8 bits and larger) but document the size calculation and let the developer worry about it. As for the merge question. merge is a standard bloom filter operation. It is well

Re: [collections] Bloom filters

2020-02-29 Thread Alex Herbert
> On 29 Feb 2020, at 07:46, Claude Warren wrote: > > Alex would you take a look at pull request 131 [1]. it adds a new hasher > implementation and makes the HashFunctionValidator available for public use. > > https://github.com/apache/commons-collections/pull/131 >

Re: [collections] Bloom filters

2020-02-28 Thread Claude Warren
Alex would you take a look at pull request 131 [1]. it adds a new hasher implementation and makes the HashFunctionValidator available for public use. https://github.com/apache/commons-collections/pull/131 On Tue, Feb 25, 2020 at 12:35 AM Alex Herbert wrote: > I have created a PR that contains

Re: [collections] Bloom filters

2020-02-24 Thread Alex Herbert
I have created a PR that contains most of the changes discussed in this thread (see [1]). Please review the changes and comment here or on GitHub. I have left the CountingBloomFilter alone. I will reimplement the add/subtract functionality as discussed into another PR. Alex [1]

Re: [collections] Bloom filters

2020-02-19 Thread Claude Warren
Way back in your first post on this topic you talked about changing the AbstractBloomFilter to use: @Override public int cardinality() { int count = 0; for (final long bits : getBits()) { count += Long.bitCount(bits); } return count;

Re: [collections] Bloom filters

2020-02-19 Thread Alex Herbert
> On 19 Feb 2020, at 21:14, Claude Warren wrote: > > I think the compromise solution of removing the thrown exception and adding > an isInvalid() method makes sense and fits within the parameters of how the > counting Bloom filter should work. However, I think the add() and remove() >

Re: [collections] Bloom filters

2020-02-19 Thread Claude Warren
I think the compromise solution of removing the thrown exception and adding an isInvalid() method makes sense and fits within the parameters of how the counting Bloom filter should work. However, I think the add() and remove() methods should (or add and subtract) should only accept

Re: [collections] Bloom filters

2020-02-18 Thread Alex Herbert
> On 18 Feb 2020, at 22:34, Gary Gregory wrote: > > On Tue, Feb 18, 2020 at 5:32 PM Claude Warren wrote: > >> Last one first, why a tree map? I think it is a holdover from an earlier >> implementation. It can be any reasonable Map (e.g. HashMap). >> >> on the remove() issue. You are

Re: [collections] Bloom filters

2020-02-18 Thread Gary Gregory
On Tue, Feb 18, 2020 at 5:32 PM Claude Warren wrote: > Last one first, why a tree map? I think it is a holdover from an earlier > implementation. It can be any reasonable Map (e.g. HashMap). > > on the remove() issue. You are absolutely correct, there is a bug. I > May you please add a test

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
Last one first, why a tree map? I think it is a holdover from an earlier implementation. It can be any reasonable Map (e.g. HashMap). on the remove() issue. You are absolutely correct, there is a bug. I looks like a partial edit got restored back into the code. The first block should be

Re: [collections] Bloom filters

2020-02-18 Thread Alex Herbert
I've updated master with some of the fixes discussed. I've been looking through the rest of the BloomFilter code and the CountingBloomFilter appears to be broken: 1. When another CountingBloomFiler is merged or removed the counts of the other filter are ignored. This is not as documented.

Re: [collections] Bloom filters

2020-02-18 Thread Alex Herbert
On 18/02/2020 09:54, Claude Warren wrote: On Tue, Feb 18, 2020 at 9:12 AM Alex Herbert wrote: My maths is rusty. If A=0xF000ABCD as interpreted as an unsigned and B=0xF000ABCD but interpreted as a signed does (A mod N) = (B mod N) for all positive values of N? If so then you are correct

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
On Tue, Feb 18, 2020 at 9:12 AM Alex Herbert wrote: > > My maths is rusty. If A=0xF000ABCD as interpreted as an unsigned and > > B=0xF000ABCD but interpreted as a signed does (A mod N) = (B mod N) for > all > > positive values of N? If so then you are correct and Signedness does not > >

Re: [collections] Bloom filters

2020-02-18 Thread Alex Herbert
> On 18 Feb 2020, at 08:02, Claude Warren wrote: > > On Mon, Feb 17, 2020 at 9:52 PM Alex Herbert > > wrote: > >> >> >>> On 17 Feb 2020, at 20:30, Claude Warren >> > wrote: >>> >>> Alex, >>> >>> Thank you for your comments. >>>

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
On Mon, Feb 17, 2020 at 9:52 PM Alex Herbert wrote: > > > > On 17 Feb 2020, at 20:30, Claude Warren wrote: > > > > Alex, > > > > Thank you for your comments. > > > > See comments inline. > > > > > > > > On Mon, Feb 17, 2020 at 3:20 PM Alex Herbert > > > wrote:

Re: [collections] Bloom filters

2020-02-17 Thread Alex Herbert
> On 17 Feb 2020, at 20:30, Claude Warren wrote: > > Alex, > > Thank you for your comments. > > See comments inline. > > > > On Mon, Feb 17, 2020 at 3:20 PM Alex Herbert > > wrote: > >> I had a look through all the BloomFilter code. Thanks Claude for the

Re: [collections] Bloom filters

2020-02-17 Thread Claude Warren
Alex, Thank you for your comments. See comments inline. On Mon, Feb 17, 2020 at 3:20 PM Alex Herbert wrote: > I had a look through all the BloomFilter code. Thanks Claude for the > contribution. > > Some items that need clarifying: > > > 1. HashFunctionIdentity.Signedness > > This is not

[collections] Bloom filters

2020-02-17 Thread Alex Herbert
I had a look through all the BloomFilter code. Thanks Claude for the contribution. Some items that need clarifying: 1. HashFunctionIdentity.Signedness This is not fully documented as to what the sign applies to. There is no javadoc on the enum values for SIGNED and UNSIGNED. The current

Re: [Collections] Bloom filters are in

2020-01-25 Thread Gary Gregory
On Sat, Jan 25, 2020 at 9:57 AM Claude Warren wrote: > Now to get some documentation done! > Looking forward to it :-) Gary > > On Sat, Jan 25, 2020 at 1:58 PM Gary Gregory > wrote: > > > ... git master. Thank you Claude! > > > > Gary > > > > > -- > I like: Like Like - The likeliest place

Re: [Collections] Bloom filters are in

2020-01-25 Thread Claude Warren
Now to get some documentation done! On Sat, Jan 25, 2020 at 1:58 PM Gary Gregory wrote: > ... git master. Thank you Claude! > > Gary > -- I like: Like Like - The likeliest place on the web LinkedIn: http://www.linkedin.com/in/claudewarren

[Collections] Bloom filters are in

2020-01-25 Thread Gary Gregory
... git master. Thank you Claude! Gary

Re: [collections] bloom filters comments

2020-01-13 Thread Claude Warren
Gary suggested merging the pull request, Gilles points out there were issues raised on the mailing list, I posted what I think was an accurate summary. What needs to be done to resolve this so that a final decision on the pull request can be made? Claude On Thu, Jan 9, 2020 at 2:57 AM Bruno P.

Re: [collections] bloom filters comments

2020-01-08 Thread Bruno P. Kinoshita
Sorry, I'd read Gary's reply as if there was no PR yet. Reviewed it a bit now, lots of tests! Will play with the code and read the comments and finish the review by the end of the week. Thanks Claude On Thursday, 9 January 2020, 11:20:40 am NZDT, Claude Warren wrote: There is a

Re: [collections] bloom filters comments

2020-01-08 Thread Claude Warren
There is a pull request open: https://github.com/apache/commons-collections/pull/83 On Wed, Jan 8, 2020 at 9:32 PM Bruno P. Kinoshita wrote: > Thanks for the summary Claude. I read some of the messages, but got lost > in the middle of the discussion, and tend to understand problems better if >

Re: [collections] bloom filters comments

2020-01-08 Thread Bruno P. Kinoshita
Thanks for the summary Claude. I read some of the messages, but got lost in the middle of the discussion, and tend to understand problems better if there's some code to read along. And I am used to GitHub/GitLab diff interface. So I agree with Gary that this could be a good time for a PR (maybe

Re: [collections] bloom filters comments

2020-01-08 Thread Claude Warren
I believe the issue (I think history is at https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17003600) is about the identification of hash implementations. Currently there are a couple of classes involved: Hasher

Re: [collections] bloom filters comments

2020-01-08 Thread Gilles Sadowski
Le mer. 8 janv. 2020 à 15:15, Gary Gregory a écrit : > > I think it is time to bring this PR in and make any adjustments within > master beyond that. This will be quicker and simpler than going round and > round for simple things like Javadoc tweaks and small non-functional > changes (formatting,

Re: [collections] bloom filters comments

2020-01-08 Thread Gary Gregory
I think it is time to bring this PR in and make any adjustments within master beyond that. This will be quicker and simpler than going round and round for simple things like Javadoc tweaks and small non-functional changes (formatting, variable names, and so on.) I'll proceed with that tonight.

Re: [collections] bloom filters comments

2019-12-29 Thread Claude Warren
It is currently a sub-class. There was a suggestion to move it outside of the BloomFilter interface. On Sun, Dec 29, 2019 at 3:47 PM Gilles Sadowski wrote: > Le dim. 29 déc. 2019 à 15:30, Claude Warren a écrit : > > > > If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter

Re: [collections] bloom filters comments

2019-12-29 Thread Gilles Sadowski
Le dim. 29 déc. 2019 à 15:30, Claude Warren a écrit : > > If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter > interface and made a stand-alone class would the name change or is the fact > that it is in the o.a.c.c.bloomfilter package enough? > > I prefer the name Shape to

Re: [collections] bloom filters comments

2019-12-29 Thread Claude Warren
If the Shape class (BloomFilter.Shape) is extracted from the BloomFilter interface and made a stand-alone class would the name change or is the fact that it is in the o.a.c.c.bloomfilter package enough? I prefer the name Shape to BloomFilterShape. Claude On Sat, Dec 28, 2019 at 9:06 AM Claude

Re: [collections] bloom filters comments

2019-12-28 Thread Claude Warren
Once the interface is extracted and reduced to the minimum necessary the following methods are removed: orCardinality() -- we have andCardinality() and xorCardinality() this was included for completeness. isFull() -- true if all the bits in the vector are on. A convenience method used to short

Re: [collections] bloom filters comments

2019-12-27 Thread Gary Gregory
On Fri, Dec 27, 2019 at 11:36 AM Claude Warren wrote: > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory > wrote: > > > Hi Claude and all: > > > > Here are a couple of comments on the bloom filter PR. > > > > 10,100 ft level comment we do not have to worry about today: Before the > > release, we

Re: [collections] bloom filters comments

2019-12-27 Thread Claude Warren
On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory wrote: > Hi Claude and all: > > Here are a couple of comments on the bloom filter PR. > > 10,100 ft level comment we do not have to worry about today: Before the > release, we might want to split Commons Collection into a multi-module > project and

[collections] bloom filters comments

2019-12-26 Thread Gary Gregory
Hi Claude and all: Here are a couple of comments on the bloom filter PR. 10,100 ft level comment we do not have to worry about today: Before the release, we might want to split Commons Collection into a multi-module project and have the BF code in it own module. The _main_ reason for this is