Re: Do we want a hashset type?

2023-08-14 Thread jian he
https://github.com/tvondra/hashset On Mon, Aug 14, 2023 at 11:23 PM Florents Tselai wrote: > > Has anyone put this in a git repo / extension package or similar ? > > I’d like to try it out outside the core pg tree. > > > On 1 Jul 2023, at 12:04 PM, Joel Jacobson wrote: > > > > On Fri, Jun 30,

Re: Do we want a hashset type?

2023-08-14 Thread Florents Tselai
Has anyone put this in a git repo / extension package or similar ? I’d like to try it out outside the core pg tree. > On 1 Jul 2023, at 12:04 PM, Joel Jacobson wrote: > > On Fri, Jun 30, 2023, at 06:50, jian he wrote: >> more like a C questions >> in this context does >> #define

Re: Do we want a hashset type?

2023-07-01 Thread Joel Jacobson
On Fri, Jun 30, 2023, at 06:50, jian he wrote: > more like a C questions > in this context does > #define HASHSET_GET_VALUES(set) ((int32 *) ((set)->data + > CEIL_DIV((set)->capacity, 8))) > define first, then define struct int4hashset_t. Is this normally ok? Yes, it's fine. Macros are just text

Re: Do we want a hashset type?

2023-06-29 Thread jian he
On Thu, Jun 29, 2023 at 4:43 PM Joel Jacobson wrote: > > On Thu, Jun 29, 2023, at 08:54, jian he wrote: > > Anyway, this time, I added another macro,which seems to simplify the code. > > > > #define SET_DATA_PTR(a) \ > > (((char *) (a->data)) + CEIL_DIV(a->capacity, 8)) > > > > it passed all the

Re: Do we want a hashset type?

2023-06-29 Thread jian he
On Wed, Jun 28, 2023 at 4:50 PM Joel Jacobson wrote: > > On Wed, Jun 28, 2023, at 08:26, jian he wrote: > > > Hi there. > > I changed the function hashset_contains to strict. > > Changing hashset_contains to STRICT would cause it to return NULL > if any of the operands are NULL, which I don't

Re: Do we want a hashset type?

2023-06-28 Thread Joel Jacobson
On Wed, Jun 28, 2023, at 08:26, jian he wrote: > Hi there. > I changed the function hashset_contains to strict. Changing hashset_contains to STRICT would cause it to return NULL if any of the operands are NULL, which I don't believe is correct, since: SELECT NULL = ANY('{}'::int4[]); ?column?

Re: Do we want a hashset type?

2023-06-28 Thread jian he
On Tue, Jun 27, 2023 at 4:27 PM Joel Jacobson wrote: > > On Tue, Jun 27, 2023, at 04:35, jian he wrote: > > in SQLMultiSets.pdf(previously thread) I found a related explanation > > on page 45, 46. > > /home/jian/hashset/0001-make-int4hashset_contains-strict-and-header-file-change.patch > > (CASE

Re: Do we want a hashset type?

2023-06-27 Thread Joel Jacobson
On Tue, Jun 27, 2023, at 10:26, Joel Jacobson wrote: > Attachments: > * hashset-0.0.1-b7e5614-full.patch > * hashset-0.0.1-b7e5614-incremental.patch To help verify that the semantics, I thought it might be helpful to provide a comprehensive set of examples that tries to cover all different ways

Re: Do we want a hashset type?

2023-06-26 Thread jian he
On Tue, Jun 27, 2023 at 4:55 AM Joel Jacobson wrote: > > On Mon, Jun 26, 2023, at 13:06, jian he wrote: > > Can you try to glue the attached to the hashset data type input > > function. > > the attached will parse cstring with double quote and not. so '{1,2,3}' > > == '{"1","2","3"}'. obviously

Re: Do we want a hashset type?

2023-06-26 Thread Kirk Wolak
On Mon, Jun 26, 2023 at 4:55 PM Joel Jacobson wrote: > On Mon, Jun 26, 2023, at 13:06, jian he wrote: > > Can you try to glue the attached to the hashset data type input > > function. > > the attached will parse cstring with double quote and not. so '{1,2,3}' > > == '{"1","2","3"}'. obviously

Re: Do we want a hashset type?

2023-06-26 Thread Joel Jacobson
On Mon, Jun 26, 2023, at 13:06, jian he wrote: > Can you try to glue the attached to the hashset data type input > function. > the attached will parse cstring with double quote and not. so '{1,2,3}' > == '{"1","2","3"}'. obviously quote will preserve the inner string as > is. > currently

Re: Do we want a hashset type?

2023-06-26 Thread jian he
On Mon, Jun 26, 2023 at 4:36 AM Joel Jacobson wrote: > > On Sun, Jun 25, 2023, at 11:42, Joel Jacobson wrote: > > SELECT hashset_contains('{}'::int4hashset, NULL::int); > > > > would be False, according to the General Rules. > > > ... > > Applying the same rules, we'd have to return Unknown

Re: Do we want a hashset type?

2023-06-25 Thread jian he
On Mon, Jun 26, 2023 at 2:56 AM Tomas Vondra wrote: > > > > On 6/25/23 15:32, jian he wrote: > >> Or maybe I just don't understand the proposal. Perhaps it'd be best if > >> jian wrote a patch illustrating the idea, and showing how it performs > >> compared to the current approach. > > > >

Re: Do we want a hashset type?

2023-06-25 Thread Joel Jacobson
On Sun, Jun 25, 2023, at 11:42, Joel Jacobson wrote: > SELECT hashset_contains('{}'::int4hashset, NULL::int); > > would be False, according to the General Rules. > ... > Applying the same rules, we'd have to return Unknown (which we represent as > null) for: > > SELECT

Re: Do we want a hashset type?

2023-06-25 Thread Tomas Vondra
On 6/25/23 15:32, jian he wrote: >> Or maybe I just don't understand the proposal. Perhaps it'd be best if >> jian wrote a patch illustrating the idea, and showing how it performs >> compared to the current approach. > > currently joel's idea is a int4hashset. based on the code first tomas

Re: Do we want a hashset type?

2023-06-25 Thread jian he
> Or maybe I just don't understand the proposal. Perhaps it'd be best if > jian wrote a patch illustrating the idea, and showing how it performs > compared to the current approach. currently joel's idea is a int4hashset. based on the code first tomas wrote. it looks like a non-nested an

Re: Do we want a hashset type?

2023-06-25 Thread Joel Jacobson
On Sat, Jun 24, 2023, at 21:16, Joel Jacobson wrote: > New version of int4hashset_contains() that should follow the same > General Rules as MULTISET's MEMBER OF (8.16 ). ... > SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false ... > SELECT hashset_contains('{null}'::int4hashset,

Re: Do we want a hashset type?

2023-06-24 Thread Joel Jacobson
New version of int4hashset_contains() that should follow the same General Rules as MULTISET's MEMBER OF (8.16 ). The first rule is to return False if the cardinality is 0 (zero). However, we must first check if the first argument is null, in which case the cardinality cannot be 0 (zero), so if

Re: Do we want a hashset type?

2023-06-24 Thread Joel Jacobson
On Thu, Jun 22, 2023, at 07:51, Joel Jacobson wrote: > For instance, how should hashset_count() work? > > Given the query, > > SELECT hashset_count('{1,2,3,null}'::int4hashset); > > Should we, > > a) threat NULL as a distinct value and return 4? > > b) ignore NULL and return 3? > > c) return NULL?

Re: Do we want a hashset type?

2023-06-23 Thread Tomas Vondra
On 6/23/23 13:47, Andrew Dunstan wrote: > > On 2023-06-23 Fr 04:23, Joel Jacobson wrote: >> On Fri, Jun 23, 2023, at 08:40, jian he wrote: >>> I played around array_func.c >>> many of the code can be used for multiset data type. >>> now I imagine multiset as something like one dimension array.

Re: Do we want a hashset type?

2023-06-23 Thread Andrew Dunstan
On 2023-06-23 Fr 04:23, Joel Jacobson wrote: On Fri, Jun 23, 2023, at 08:40, jian he wrote: I played around array_func.c many of the code can be used for multiset data type. now I imagine multiset as something like one dimension array. (nested is somehow beyond the imagination...). Are you

Re: Do we want a hashset type?

2023-06-23 Thread jian he
On Fri, Jun 23, 2023 at 4:23 PM Joel Jacobson wrote: > On Fri, Jun 23, 2023, at 08:40, jian he wrote: > > I played around array_func.c > > many of the code can be used for multiset data type. > > now I imagine multiset as something like one dimension array. (nested > > is somehow beyond the

Re: Do we want a hashset type?

2023-06-23 Thread Joel Jacobson
On Fri, Jun 23, 2023, at 08:40, jian he wrote: > I played around array_func.c > many of the code can be used for multiset data type. > now I imagine multiset as something like one dimension array. (nested > is somehow beyond the imagination...). Are you suggesting it might be a better idea to

Re: Do we want a hashset type?

2023-06-23 Thread jian he
I played around array_func.c many of the code can be used for multiset data type. now I imagine multiset as something like one dimension array. (nested is somehow beyond the imagination...). * A standard varlena array has the following internal structure: *- standard varlena header word *

Re: Do we want a hashset type?

2023-06-22 Thread Tomas Vondra
On 6/22/23 19:52, Joel Jacobson wrote: > On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: >> This is also what the SQL standard does for multisets - there's SQL:20nn >> draft at http://www.wiscorp.com/SQLStandards.html, and the > predicate> section (p. 475) explains how this should work with

Re: Do we want a hashset type?

2023-06-22 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: > This is also what the SQL standard does for multisets - there's SQL:20nn > draft at http://www.wiscorp.com/SQLStandards.html, and the predicate> section (p. 475) explains how this should work with NULL. I've looked again at the paper you

Re: Do we want a hashset type?

2023-06-21 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 18:25, Tomas Vondra wrote: > On 6/20/23 16:56, Joel Jacobson wrote: >> The reference to consistency with what we do elsewhere might not be entirely >> applicable in this context, since the set feature we're designing is a new >> beast >> in the SQL landscape. > > I don't

Re: Do we want a hashset type?

2023-06-20 Thread Tomas Vondra
On 6/20/23 20:08, jian he wrote: > On Wed, Jun 21, 2023 at 12:25 AM Tomas Vondra > ... >> http://www.wiscorp.com/sqlmultisets.zip > >> Conceptually, a multiset is an unordered collection of elements, all of the >> same type, with dupli- >> cates permitted. Unlike arrays, a multiset is an

Re: Do we want a hashset type?

2023-06-20 Thread jian he
On Wed, Jun 21, 2023 at 12:25 AM Tomas Vondra wrote: > > > > On 6/20/23 16:56, Joel Jacobson wrote: > > On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: > >> On 6/20/23 12:59, Joel Jacobson wrote: > >>> On Mon, Jun 19, 2023, at 02:00, jian he wrote: > select

Re: Do we want a hashset type?

2023-06-20 Thread Tomas Vondra
On 6/20/23 16:56, Joel Jacobson wrote: > On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: >> On 6/20/23 12:59, Joel Jacobson wrote: >>> On Mon, Jun 19, 2023, at 02:00, jian he wrote: select hashset_contains('{1,2}'::int4hashset,NULL::int); should return null? >>> >>> I agree, it

Re: Do we want a hashset type?

2023-06-20 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 16:56, Joel Jacobson wrote: > I think we have an opportunity here to innovate and potentially influence a > future set concept in the SQL standard. Adding to my previous note - If there's a worry about future SQL standards introducing SETs with NULLs, causing

Re: Do we want a hashset type?

2023-06-20 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote: > On 6/20/23 12:59, Joel Jacobson wrote: >> On Mon, Jun 19, 2023, at 02:00, jian he wrote: >>> select hashset_contains('{1,2}'::int4hashset,NULL::int); >>> should return null? >> >> I agree, it should. >> >> I've now changed all functions

Re: Do we want a hashset type?

2023-06-20 Thread Tomas Vondra
On 6/20/23 14:10, Tomas Vondra wrote: > ... > > This is also what the SQL standard does for multisets - there's SQL:20nn > draft at http://www.wiscorp.com/SQLStandards.html, and the predicate> section (p. 475) explains how this should work with NULL. > BTW I just notices there's also a

Re: Do we want a hashset type?

2023-06-20 Thread Tomas Vondra
On 6/20/23 12:59, Joel Jacobson wrote: > On Mon, Jun 19, 2023, at 02:00, jian he wrote: >> select hashset_contains('{1,2}'::int4hashset,NULL::int); >> should return null? > > I agree, it should. > > I've now changed all functions except int4hashset() (the init function) > and the aggregate

Re: Do we want a hashset type?

2023-06-20 Thread Joel Jacobson
On Mon, Jun 19, 2023, at 02:00, jian he wrote: > select hashset_contains('{1,2}'::int4hashset,NULL::int); > should return null? I agree, it should. I've now changed all functions except int4hashset() (the init function) and the aggregate functions to be STRICT. I think this patch is OK to send

Re: Do we want a hashset type?

2023-06-19 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 02:04, Tomas Vondra wrote: > For UPDATE, it'd be pretty clear too, I think. It's possible to do > >UPDATE table SET col = SET[1,2,3] > > and it's clear the first is the command SET, while the second is a set > constructor. For SELECT there'd be conflict, and for ALTER

Re: Do we want a hashset type?

2023-06-19 Thread Tomas Vondra
On 6/20/23 00:50, Joel Jacobson wrote: > On Mon, Jun 19, 2023, at 14:59, Tomas Vondra wrote: >> What unexpected issues you mean? Sure, if someone uses multisets as if >> they were sets (so ignoring the handling of duplicates), things will go >> booom! quickly. > > The unexpected issues I had

Re: Do we want a hashset type?

2023-06-19 Thread Joel Jacobson
On Mon, Jun 19, 2023, at 14:59, Tomas Vondra wrote: > What unexpected issues you mean? Sure, if someone uses multisets as if > they were sets (so ignoring the handling of duplicates), things will go > booom! quickly. The unexpected issues I had in mind are subtle bugs due to treating multisets as

Re: Do we want a hashset type?

2023-06-19 Thread Tom Lane
Andrew Dunstan writes: > Yes, Multisets (a.k.a. bags and a large number of other names) would be > interesting. But I wouldn't like to abandon pure sets either. Maybe a > typmod indicating the allowed multiplicity of the type? I don't think trying to use typmod to carry fundamental semantic

Re: Do we want a hashset type?

2023-06-19 Thread Tomas Vondra
On 6/19/23 13:33, Joel Jacobson wrote: > On Mon, Jun 19, 2023, at 11:21, Tomas Vondra wrote: >> AFAICS the standard only defines arrays and multisets. Arrays are pretty >> much the thing we have, including the ARRAY[] constructor etc. Multisets >> are similar to hashset discussed here, except

Re: Do we want a hashset type?

2023-06-19 Thread Tomas Vondra
On 6/19/23 13:50, Andrew Dunstan wrote: > > On 2023-06-19 Mo 05:21, Tomas Vondra wrote: >> On 6/18/23 18:45, Andrew Dunstan wrote: >>> On 2023-06-16 Fr 20:38, Joel Jacobson wrote: New patch is attached, which will henceforth always be a complete patch, to avoid the hassle of having to

Re: Do we want a hashset type?

2023-06-19 Thread Joel Jacobson
On Mon, Jun 19, 2023, at 11:49, jian he wrote: > hashset_to_array function should be strict? > > I noticed hashset_symmetric_difference and hashset_difference handle > null in a different way, seems they should handle null in a consistent > way? Yes, I agree, they should be consistent. I've

Re: Do we want a hashset type?

2023-06-19 Thread Andrew Dunstan
On 2023-06-19 Mo 05:21, Tomas Vondra wrote: On 6/18/23 18:45, Andrew Dunstan wrote: On 2023-06-16 Fr 20:38, Joel Jacobson wrote: New patch is attached, which will henceforth always be a complete patch, to avoid the hassle of having to assemble incremental patches. Cool, thanks. It might

Re: Do we want a hashset type?

2023-06-19 Thread Joel Jacobson
On Mon, Jun 19, 2023, at 11:21, Tomas Vondra wrote: > AFAICS the standard only defines arrays and multisets. Arrays are pretty > much the thing we have, including the ARRAY[] constructor etc. Multisets > are similar to hashset discussed here, except that it tracks the number > of elements for each

Re: Do we want a hashset type?

2023-06-19 Thread jian he
On Mon, Jun 19, 2023 at 2:51 PM Joel Jacobson wrote: > > On Mon, Jun 19, 2023, at 02:00, jian he wrote: > > select hashset_contains('{1,2}'::int4hashset,NULL::int); > > should return null? > > Hmm, that's a good philosophical question. > > I notice Tomas Vondra in the initial commit opted for

Re: Do we want a hashset type?

2023-06-19 Thread Tomas Vondra
On 6/18/23 18:45, Andrew Dunstan wrote: > > On 2023-06-16 Fr 20:38, Joel Jacobson wrote: >> >> New patch is attached, which will henceforth always be a complete patch, >> to avoid the hassle of having to assemble incremental patches. > > > Cool, thanks. > It might still be convenient to

Re: Do we want a hashset type?

2023-06-19 Thread Joel Jacobson
On Mon, Jun 19, 2023, at 02:00, jian he wrote: > select hashset_contains('{1,2}'::int4hashset,NULL::int); > should return null? Hmm, that's a good philosophical question. I notice Tomas Vondra in the initial commit opted for allowing NULL inputs, treating them as empty sets, e.g. in

Re: Do we want a hashset type?

2023-06-18 Thread jian he
On Sat, Jun 17, 2023 at 8:38 AM Joel Jacobson wrote: > > On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote: > > I realise int4hashset_hash() is broken, > > since two int4hashset's that are considered equal, > > can by coincidence get different hashes: > ... > > Do we have any ideas on how to

Re: Do we want a hashset type?

2023-06-18 Thread Joel Jacobson
On Sun, Jun 18, 2023, at 18:45, Andrew Dunstan wrote: > . It might be worth sending a version number with the send function > (c.f. jsonb_send / jsonb_recv). That way would would not be tied forever > to some wire representation. Great idea; implemented. > . I think there are some important

Re: Do we want a hashset type?

2023-06-18 Thread Andrew Dunstan
On 2023-06-16 Fr 20:38, Joel Jacobson wrote: New patch is attached, which will henceforth always be a complete patch, to avoid the hassle of having to assemble incremental patches. Cool, thanks. A couple of random thoughts: . It might be worth sending a version number with the send

Re: Do we want a hashset type?

2023-06-16 Thread Joel Jacobson
On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote: > I realise int4hashset_hash() is broken, > since two int4hashset's that are considered equal, > can by coincidence get different hashes: ... > Do we have any ideas on how to fix this without sacrificing performance? The problem was due to

Re: Do we want a hashset type?

2023-06-16 Thread Joel Jacobson
On Fri, Jun 16, 2023, at 13:57, jian he wrote: > similar to (int[] || int4) and (int4 || int[]) > should we expect ('{1,2}'::int4hashset || 3) == (3 || > '{1,2}'::int4hashset) == (select hashset_add('{1,2}'::int4hashset,3)); > *?* Good idea, makes sense to support it. Implemented in attached

Re: Do we want a hashset type?

2023-06-16 Thread jian he
similar to (int[] || int4) and (int4 || int[]) should we expect ('{1,2}'::int4hashset || 3) == (3 || '{1,2}'::int4hashset) == (select hashset_add('{1,2}'::int4hashset,3)); *?* The following is the general idea on how to make it work by looking at similar code CREATE OPERATOR || ( leftarg

Re: Do we want a hashset type?

2023-06-16 Thread Joel Jacobson
New patch attached: Add customizable params to int4hashset() and collision count function This commit enhances int4hashset() by introducing adjustable capacity, load, and growth factors, providing flexibility for performance optimization. Also added is a new function, hashset_collisions(), to

Re: Do we want a hashset type?

2023-06-15 Thread Joel Jacobson
On Thu, Jun 15, 2023, at 11:44, jian he wrote: > In hashset/test/sql/order.sql, can we add the following to test whether > the optimizer will use our index. > > CREATE INDEX ON test_int4hashset_order (int4hashset_col > int4hashset_btree_ops); > > -- to make sure that this work with just two

Re: Do we want a hashset type?

2023-06-15 Thread Joel Jacobson
On Thu, Jun 15, 2023, at 06:29, jian he wrote: > I am not sure the following results are correct. > with cte as ( > select hashset(x) as x > ,hashset_capacity(hashset(x)) > ,hashset_count(hashset(x)) > from generate_series(1,10) g(x)) > select * > ,'|' as

Re: Do we want a hashset type?

2023-06-15 Thread Joel Jacobson
On Thu, Jun 15, 2023, at 11:44, jian he wrote: > I didn't install the extension directly. I copied the > hashset--0.0.1.sql to another place, using gcc to compile these > functions. .. > Because even make > PG_CONFIG=/home/jian/postgres/2023_05_25_beta5421/bin/pg_config still > has an error.

Re: Do we want a hashset type?

2023-06-15 Thread jian he
In hashset/test/sql/order.sql, can we add the following to test whether the optimizer will use our index. CREATE INDEX ON test_int4hashset_order (int4hashset_col int4hashset_btree_ops); -- to make sure that this work with just two rows SET enable_seqscan TO off; explain(costs off) SELECT *

Re: Do we want a hashset type?

2023-06-15 Thread Joel Jacobson
On Thu, Jun 15, 2023, at 04:22, jian he wrote: > Attachments: > * temp.patch Thanks for good suggestions. New patch attached: Enhance parsing and reorder headers in hashset module Allow whitespaces in hashset input and reorder the inclusion of header files, placing PostgreSQL headers first.

Re: Do we want a hashset type?

2023-06-14 Thread jian he
On Thu, Jun 15, 2023 at 5:04 AM Joel Jacobson wrote: > On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote: > > On 6/14/23 14:57, Joel Jacobson wrote: > >> Would it be feasible to teach the planner to utilize the internal hash > table of > >> hashset directly? In the case of arrays, the hash

Re: Do we want a hashset type?

2023-06-14 Thread jian he
On Thu, Jun 15, 2023 at 5:04 AM Joel Jacobson wrote: > On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote: > > On 6/14/23 14:57, Joel Jacobson wrote: > >> Would it be feasible to teach the planner to utilize the internal hash > table of > >> hashset directly? In the case of arrays, the hash

Re: Do we want a hashset type?

2023-06-14 Thread Tom Dunstan
On Wed, 14 Jun 2023 at 19:14, Tomas Vondra wrote: > > ...So we'd want the same index usage as > > =ANY(array) but would like faster row checking than we get with an array > > when other indexes are used. > > We kinda already do this since PG14 (commit 50e17ad281), actually. If > the list is long

Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 15:16, Tomas Vondra wrote: > On 6/14/23 14:57, Joel Jacobson wrote: >> Would it be feasible to teach the planner to utilize the internal hash table >> of >> hashset directly? In the case of arrays, the hash table construction is an ... > It's definitely something I'd

Re: Do we want a hashset type?

2023-06-14 Thread Andrew Dunstan
On 2023-06-14 We 05:44, Tomas Vondra wrote: The thing is, adding stuff to core is not free - it means the community becomes responsible for maintenance, testing, fixing issues, etc. It's not feasible (or desirable) to have all extensions in core, and cloud vendors generally do have ways to

Re: Do we want a hashset type?

2023-06-14 Thread Tomas Vondra
On 6/14/23 14:57, Joel Jacobson wrote: > On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote: >>> Perspective from a potential user: I'm currently working on something >>> where an array-like structure with fast membership test performance >>> would be very useful. The main type of query is

Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote: >> Perspective from a potential user: I'm currently working on something >> where an array-like structure with fast membership test performance >> would be very useful. The main type of query is doing an =ANY(the set) >> filter, where the set

Re: Do we want a hashset type?

2023-06-14 Thread Tomas Vondra
On 6/14/23 06:31, Tom Dunstan wrote: > On Mon, 12 Jun 2023 at 22:37, Tomas Vondra > mailto:tomas.von...@enterprisedb.com>> > wrote: > > Perhaps. So you're proposing to have this as a regular built-in type? > It's hard for me to judge how popular this feature would be, but I guess >

Re: Do we want a hashset type?

2023-06-14 Thread Joel Jacobson
On Wed, Jun 14, 2023, at 06:31, Tom Dunstan wrote: > On Mon, 12 Jun 2023 at 22:37, Tomas Vondra > wrote: >> Perhaps. So you're proposing to have this as a regular built-in type? >> It's hard for me to judge how popular this feature would be, but I guess >> people often use arrays while they

Re: Do we want a hashset type?

2023-06-13 Thread Tom Dunstan
On Mon, 12 Jun 2023 at 22:37, Tomas Vondra wrote: > Perhaps. So you're proposing to have this as a regular built-in type? > It's hard for me to judge how popular this feature would be, but I guess > people often use arrays while they actually want set semantics ... > Perspective from a

Re: Do we want a hashset type?

2023-06-13 Thread Joel Jacobson
On Tue, Jun 13, 2023, at 20:50, Joel Jacobson wrote: > hashset is now using hash_bytes_uint32() from hashfn.h I spotted a problem in the ordering logic of the comparison functions. The issue was with handling hashsets containing empty positions, causing non-lexicographic ordering. The updated

Re: Do we want a hashset type?

2023-06-13 Thread Joel Jacobson
On Mon, Jun 12, 2023, at 22:36, Joel Jacobson wrote: > On Mon, Jun 12, 2023, at 21:58, Tomas Vondra wrote: >> My suggestion is to be lazy, just use the lookup3 we have in hashfn.c >> (through hash_bytes or something), and at the same time make it possible >> to switch to a different function in

Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Mon, Jun 12, 2023, at 21:58, Tomas Vondra wrote: > But those are numbers for large keys - if you restrict the input to > 4B-16B (which is what we planned to do by focusing on int, bigint and > uuid), there's no significant difference: Oh, sorry, I completely failed to read the meaning of the

Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra
On 6/12/23 19:34, Joel Jacobson wrote: > On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote: 1) better hash table implementation > > I noticed src/include/common/hashfn.h provides implementation > of the Jenkins/lookup3 hash function, and thought maybe > we could simply use it in hashset?

Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote: >>> 1) better hash table implementation I noticed src/include/common/hashfn.h provides implementation of the Jenkins/lookup3 hash function, and thought maybe we could simply use it in hashset? However, I noticed that according to SMHasher [1],

Re: Do we want a hashset type?

2023-06-12 Thread Joel Jacobson
On Sat, Jun 10, 2023, at 17:46, Andrew Dunstan wrote: > Maybe you can post a full patch as well as incremental? Attached patch is based on tvondra's last commit 375b072. > Stylistically I think you should reduce reliance on magic numbers (like 13). > Probably need some #define's? Great idea,

Re: Do we want a hashset type?

2023-06-12 Thread Andrew Dunstan
On 2023-06-12 Mo 09:00, Tomas Vondra wrote: What was the json/jsonb story, was it ever an extension before being included in core? I don't recall what the exact story was, but I guess the "json" type was added to core very long ago (before we started to push back a bit), and we added some

Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra
On 6/12/23 14:46, Andrew Dunstan wrote: > > On 2023-06-11 Su 16:15, Joel Jacobson wrote: >> On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote: >>> On 6/11/23 12:26, Joel Jacobson wrote: I think there are some good arguments that speaks in favour of including it in core: >> ... >>>

Re: Do we want a hashset type?

2023-06-12 Thread Tomas Vondra
On 6/11/23 22:15, Joel Jacobson wrote: > On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote: >> On 6/11/23 12:26, Joel Jacobson wrote: >>> I think there are some good arguments that speaks in favour of including it >>> in core: > ... >> >> I agree with all of that, but ... >> >> It's just past

Re: Do we want a hashset type?

2023-06-12 Thread Andrew Dunstan
On 2023-06-11 Su 16:15, Joel Jacobson wrote: On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote: On 6/11/23 12:26, Joel Jacobson wrote: I think there are some good arguments that speaks in favour of including it in core: ... I agree with all of that, but ... It's just past feature freeze,

Re: Do we want a hashset type?

2023-06-11 Thread Joel Jacobson
On Sun, Jun 11, 2023, at 17:03, Tomas Vondra wrote: > On 6/11/23 12:26, Joel Jacobson wrote: >> I think there are some good arguments that speaks in favour of including it >> in core: ... > > I agree with all of that, but ... > > It's just past feature freeze, so the earliest release this could

Re: Do we want a hashset type?

2023-06-11 Thread Joel Jacobson
On Sun, Jun 11, 2023, at 16:58, Andrew Dunstan wrote: >>On 2023-06-11 Su 06:26, Joel Jacobson wrote: >>Perhaps "set" would have been a better name, since the use of hash functions >>from an end-user perspective is >>implementation details, but we cannot use >>that word since it's a reserved

Re: Do we want a hashset type?

2023-06-11 Thread Tomas Vondra
On 6/11/23 12:26, Joel Jacobson wrote: > On Sat, Jun 10, 2023, at 22:26, Tomas Vondra wrote: >> On 6/10/23 17:46, Andrew Dunstan wrote: >>> >>> Maybe you can post a full patch as well as incremental? >>> >> >> I wonder if we should keep discussing this extension here, considering >> it's going

Re: Do we want a hashset type?

2023-06-11 Thread Andrew Dunstan
On 2023-06-11 Su 06:26, Joel Jacobson wrote: On Sat, Jun 10, 2023, at 22:26, Tomas Vondra wrote: On 6/10/23 17:46, Andrew Dunstan wrote: Maybe you can post a full patch as well as incremental? I wonder if we should keep discussing this extension here, considering it's going to be out of

Re: Do we want a hashset type?

2023-06-11 Thread Joel Jacobson
On Sat, Jun 10, 2023, at 22:26, Tomas Vondra wrote: > On 6/10/23 17:46, Andrew Dunstan wrote: >> >> Maybe you can post a full patch as well as incremental? >> > > I wonder if we should keep discussing this extension here, considering > it's going to be out of core (at least for now). Not sure

Re: Do we want a hashset type?

2023-06-10 Thread Tomas Vondra
On 6/10/23 17:46, Andrew Dunstan wrote: > > On 2023-06-09 Fr 07:56, Joel Jacobson wrote: >> On Fri, Jun 9, 2023, at 13:33, jian he wrote: >> > Hi, I am quite new about C. >> > The following function I have 3 questions. >> > 1. 7691,4201, I assume they are just random prime ints? >> >> Yes,

Re: Do we want a hashset type?

2023-06-10 Thread Tomas Vondra
On 6/9/23 12:58, Joel Jacobson wrote: > On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote: >> Would you be interested in helping with / working on some of that? I >> don't have immediate need for this stuff, so it's not very high on my >> TODO list. > > Sure, I'm willing to help! > > I've

Re: Do we want a hashset type?

2023-06-10 Thread jian he
in funcion. hashset_in int32 value = strtol(str, , 10); there is no int32 value range check? imitate src/backend/utils/adt/int.c. the following way is what I came up with. int64 value = strtol(str, , 10); if (errno == ERANGE || value < INT_MIN || value > INT_MAX) ereturn(fcinfo->context, (Datum)

Re: Do we want a hashset type?

2023-06-10 Thread Andrew Dunstan
On 2023-06-09 Fr 07:56, Joel Jacobson wrote: On Fri, Jun 9, 2023, at 13:33, jian he wrote: > Hi, I am quite new about C. > The following function I have 3 questions. > 1. 7691,4201, I assume they are just random prime ints? Yes, 7691 and 4201 are likely chosen as random prime numbers. In

Re: Do we want a hashset type?

2023-06-09 Thread Joel Jacobson
On Fri, Jun 9, 2023, at 13:33, jian he wrote: > Hi, I am quite new about C. > The following function I have 3 questions. > 1. 7691,4201, I assume they are just random prime ints? Yes, 7691 and 4201 are likely chosen as random prime numbers. In hash functions, prime numbers are often used to

Re: Do we want a hashset type?

2023-06-09 Thread jian he
On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson wrote: > On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote: > > Would you be interested in helping with / working on some of that? I > > don't have immediate need for this stuff, so it's not very high on my > > TODO list. > > Sure, I'm willing to

Re: Do we want a hashset type?

2023-06-09 Thread Joel Jacobson
On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote: > Would you be interested in helping with / working on some of that? I > don't have immediate need for this stuff, so it's not very high on my > TODO list. Sure, I'm willing to help! I've attached a patch that works on some of the items on your

Re: Do we want a hashset type?

2023-06-08 Thread Tomas Vondra
On 6/8/23 11:41, Joel Jacobson wrote: > On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote: >> Interesting, considering how dumb the the hash table implementation is. > > That's promising. > Yeah, not bad for sleep-deprived on-plane hacking ... There's a bunch of stuff that needs to be improved

Re: Do we want a hashset type?

2023-06-08 Thread Joel Jacobson
On Wed, Jun 7, 2023, at 19:37, Tomas Vondra wrote: > Interesting, considering how dumb the the hash table implementation is. That's promising. >> I tested Neo4j and the results are surprising; it appears to be >> significantly *slower*. >> However, I've probably misunderstood something, maybe I

Re: Do we want a hashset type?

2023-06-07 Thread Tomas Vondra
On 6/7/23 16:21, Joel Jacobson wrote: > On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote: >> it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms >> on your system. There's a bunch of opportunities for more improvements, >> as the hash table implementation is pretty

Re: Do we want a hashset type?

2023-06-07 Thread Joel Jacobson
On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote: > it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms > on your system. There's a bunch of opportunities for more improvements, > as the hash table implementation is pretty naive/silly, the on-disk > format is wasteful and so

Re: Do we want a hashset type?

2023-06-06 Thread Tomas Vondra
On 6/5/23 21:52, Joel Jacobson wrote: > On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote: >> Anyway, I hacked together a trivial set backed by an open addressing >> hash table: >> >> https://github.com/tvondra/hashset >> >> It's super-basic, providing just some bare minimum of functions, but

Re: Do we want a hashset type?

2023-06-05 Thread Joel Jacobson
On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote: > Anyway, I hacked together a trivial set backed by an open addressing > hash table: > > https://github.com/tvondra/hashset > > It's super-basic, providing just some bare minimum of functions, but > hopefully good enough for experiments. > > -

Re: Do we want a hashset type?

2023-06-05 Thread Joel Jacobson
On Fri, Jun 2, 2023, at 10:01, Ants Aasma wrote: > Have you looked at roaring bitmaps? There is a pg_roaringbitmap > extension [1] already available that offers very fast unions, > intersections and membership tests over integer sets. I used it to get > some pretty impressive performance results

Re: Do we want a hashset type?

2023-06-04 Thread Tomas Vondra
On 6/1/23 09:14, Joel Jacobson wrote: > On Thu, Jun 1, 2023, at 09:02, Joel Jacobson wrote: >> Here is an example using a real anonymised social network. > > I realised the "found" column is not necessary in this particular example, > since we only care about the friends at the exact depth

Re: Do we want a hashset type?

2023-06-02 Thread Ants Aasma
On Wed, 31 May 2023 at 18:40, Joel Jacobson wrote: > > On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote: > > I think this needs a better explanation - what exactly is a hashset in > > this context? Something like an array with a hash for faster lookup of > > unique elements, or what? > > In

  1   2   >