Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-12-25 Thread Joel Jacobson
be better to leave that off. But > it deserves explicit consideration. I intentionally made it a close analog of regexp_matches(), to make it easy for existing users of regexp_matches() to understand how regexp_positions() works. > 2. It looks like you modeled this on regexp_matches() to the

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-12-03 Thread Daniel Gustafsson
This review has gone unanswered for two months, so I'm marking this patch Returned with Feedback. Please feel free to resubmit when a new version of the patch is available. -- Daniel Gustafsson https://vmware.com/

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-21 Thread Tom Lane
se. I think if we're going to do this, we should learn from history, and provide an analog to regexp_match() as well as regexp_matches() right off the bat. 3. The API convention you chose (separate start and length arrays) is perhaps still confusing. When I first looked at the test case +SE

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-12 Thread Joel Jacobson
he problem due to the recent change in setup_regexp_matches(), which added a new int parameter "start_search". I pass 0 as start_search, which I think should give the same behaviour as before. I also changed the assigned oid values in pg_proc.dat for the two new regexp_positions() ca

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-01 Thread Daniel Gustafsson
> On 1 Sep 2021, at 16:02, Tom Lane wrote: > > Daniel Gustafsson writes: >>> On 9 Mar 2021, at 20:30, Joel Jacobson wrote: >>> Attached is a patch implementing it this way. > >> This patch no longer applies, can you please submit a rebased version? On a brief skim, this patch includes the doc

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-01 Thread Tom Lane
Daniel Gustafsson writes: >> On 9 Mar 2021, at 20:30, Joel Jacobson wrote: >> Attached is a patch implementing it this way. > This patch no longer applies, can you please submit a rebased version? Also, since 642433707 ("This patch adds new functions regexp_count(), regexp_instr(), regexp_like(

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-09-01 Thread Daniel Gustafsson
> On 9 Mar 2021, at 20:30, Joel Jacobson wrote: > Attached is a patch implementing it this way. This patch no longer applies, can you please submit a rebased version? -- Daniel Gustafsson https://vmware.com/

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
to iterate through parallel arrays. There are ways to do > that in SQL, certainly, but they all constrain how you write the query, > and usually force ugly stuff like splitting it into sub-selects. I see now what you mean, many thanks for explaining. > > As an example, presuming that

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Tom Lane
ly, but they all constrain how you write the query, and usually force ugly stuff like splitting it into sub-selects. As an example, presuming that regexp_positions is defined along the lines of regexp_positions(str text, pat text, out starts int[], out lengths int[]) returns setof record then to

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Tom Lane
"Joel Jacobson" writes: > On Tue, Mar 9, 2021, at 10:18, Pavel Stehule wrote: >> you can do unnest(array1, array2, ...) > Right, I had forgotten about that variant. > But isn't this a bit surprising then: > ... > Should there be an entry there showing the VARIADIC anyelement version as > well?

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
On Tue, Mar 9, 2021, at 13:16, Pavel Stehule wrote: > út 9. 3. 2021 v 11:32 odesílatel Joel Jacobson napsal: >> __On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote: >>> My experience with working with parallel arrays in SQL has been unpleasant. >> >> Could you please give an example on such an unplea

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Pavel Stehule
plex application with 3D data of some points in 2D array. Everywhere was a[d, 0], a[d, 1], a[d, 2], instead a[d] or instead a[d].x, ... > I can see a problem if the arrays could possibly have difference > dimensionality/cardinality, > but regexp_positions() could guarantee they won'

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
but regexp_positions() could guarantee they won't, so I don't see a problem here, but there is probably something I'm missing here? /Joel

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
On Tue, Mar 9, 2021, at 10:18, Pavel Stehule wrote: > What do you mean? >> More than one unnest() in the same query, e.g. SELECT unnest(..), unnest(..)? > > you can do unnest(array1, array2, ...) Right, I had forgotten about that variant. But isn't this a bit surprising then: \df unnest

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Pavel Stehule
út 9. 3. 2021 v 10:01 odesílatel Joel Jacobson napsal: > On Tue, Mar 9, 2021, at 09:29, Pavel Stehule wrote: > > > > https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY > > but the SLICE argument should be constant. But this limit is artificial, > just fo

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
On Tue, Mar 9, 2021, at 09:29, Pavel Stehule wrote: > > https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY > > but the SLICE argument should be constant. But this limit is artificial, just > for implementation simplicity. Important is behaviour. I see

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Pavel Stehule
e preferred. > But this doesn't change the fact you sometimes have to deal with 2D > arrays, in which case the proposed unnest(anyarray,int) would improve the > user-experience a lot, when wanting to unnest just one level (or N levels). > > Sounds like you are suggesting some oth

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-09 Thread Joel Jacobson
proposed unnest(anyarray,int) would improve the user-experience a lot, when wanting to unnest just one level (or N levels). Sounds like you are suggesting some other improvements, in addition to the proposed unnest(anyarray,int)? Correct? A regexp_positions() returning setof 2-D array[] would not

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Pavel Stehule
út 9. 3. 2021 v 7:57 odesílatel Joel Jacobson napsal: > On Mon, Mar 8, 2021, at 21:46, Pavel Stehule wrote: > > so what about? > > CREATE OR REPLACE FUNCTION unnest_slice(anyarray, int) > RETURNS SETOF anyarray AS $$ > DECLARE r $1%type; > BEGIN > FOREACH r SLICE $2 IN ARRAY $1 --- now $2 shoul

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Joel Jacobson
On Mon, Mar 8, 2021, at 21:46, Pavel Stehule wrote: > so what about? > > CREATE OR REPLACE FUNCTION unnest_slice(anyarray, int) > RETURNS SETOF anyarray AS $$ > DECLARE r $1%type; > BEGIN > FOREACH r SLICE $2 IN ARRAY $1 --- now $2 should be constant > LOOP > RETURN NEXT r; > END LOOP; >

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Pavel Stehule
po 8. 3. 2021 v 21:12 odesílatel Tom Lane napsal: > "Joel Jacobson" writes: > > If I understand it correctly, we don't need to run genbki.pl to compile > PostgreSQL, > > so someone wanting to compile PostgreSQL without having a running > PostgreSQL-instance > > could do so without problems. > >

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Tom Lane
"Joel Jacobson" writes: > If I understand it correctly, we don't need to run genbki.pl to compile > PostgreSQL, > so someone wanting to compile PostgreSQL without having a running > PostgreSQL-instance > could do so without problems. > A dependency on having a PostgreSQL instance running, > is p

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Joel Jacobson
On Mon, Mar 8, 2021, at 19:46, Joel Jacobson wrote: > However, for certain tasks, when a high-level language is preferred, > and when the raw performance of C isn't necessary, then maybe SQL/PLpgSQL > could be a serious alternative to Perl? Before we had jsonb, this would have been totally unreal

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Joel Jacobson
On Mon, Mar 8, 2021, at 18:30, Tom Lane wrote: > FWIW, I personally think that returning a start position and a length > would be the most understandable way to operate. Very good point. I agree. (And then ranges cannot be used, regardless of canonical form.) > Yeah: it's hard. The amount of c

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger
;> > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and >> > not controversial. >> >> How about proposing some array functions to go along with the >> regexp_positions, and then do it that way? > > Sounds like a nice solution. That woul

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Chapman Flack
On 03/08/21 13:29, Chapman Flack wrote: > I think the s-free version is exactly the regexp_instr included in > the other concurrent proposal [1] sorry. [1] https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Chapman Flack
to the ISO position_regex() except for the ISO one using XQuery regex syntax. I gather from [1] that the name regexp_instr is chosen in solidarity with other DBMSs that de facto have it. Would it be weirder to have the singular form be regexp_instr and the plural be regexp_positions? Or to diver

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Tom Lane
need for that, since a function can be declared to return an anonymous record type by giving it some OUT parameters. However, if I'm understanding things correctly "regexp_positions(IN ..., OUT match_start integer, OUT match_length integer) RETURNS SETOF record" wouldn't be enou

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Joel Jacobson
On Mon, Mar 8, 2021, at 18:11, Mark Dilger wrote: > > On Mar 8, 2021, at 9:05 AM, Joel Jacobson wrote: > > > > If a N+1 dimension array could easily be unnested to a N dimension array, > > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and &g

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger
> On Mar 8, 2021, at 9:05 AM, Joel Jacobson wrote: > > If a N+1 dimension array could easily be unnested to a N dimension array, > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and > not controversial. How about proposing some array functions

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Joel Jacobson
On Mon, Mar 8, 2021, at 17:20, Mark Dilger wrote: > > On Mar 5, 2021, at 11:46 AM, Joel Jacobson wrote: > > <0003-regexp-positions.patch> > > I did a bit more testing: > > +SELECT regexp_positions('foobarbequebaz', 'b', 'g'); > + r

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-08 Thread Mark Dilger
> On Mar 5, 2021, at 11:46 AM, Joel Jacobson wrote: > > > /Joel > <0003-regexp-positions.patch> I did a bit more testing: +SELECT regexp_positions('foobarbequebaz', 'b', 'g'); + regexp_positions +-- + {"[3,5)"} +

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Fri, Mar 5, 2021, at 20:46, Joel Jacobson wrote: > My conclusion is that we should use setof int4range[] as the return value for > regexp_positions(). If acceptable by the project, it be even nicer if we could just return the suggested composite type. I don't see any exist

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote: > I like the idea so I did a bit of testing. I think the following should not > error, but does: > > +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i'); > +ERROR: range lower bound must be less

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Andreas Karlsson
On 3/4/21 4:40 PM, Tom Lane wrote: I wonder if a 2-D integer array wouldn't be a better idea, ie {{startpos1,length1},{startpos2,length2},...}. My experience with working with parallel arrays in SQL has been unpleasant. Hm, I can see your point but on the other hand I can't say my experiences

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Pavel Stehule
Hi pá 5. 3. 2021 v 13:44 odesílatel Joel Jacobson napsal: > Idea #5: > > Allow disabling canonicalization via optional parameter to range > constructor functions. > I think so rules describing ranges and multirages are long enough, so increasing functionality doesn't look like a practical idea.

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
Idea #5: Allow disabling canonicalization via optional parameter to range constructor functions. This would then allow using the range type, to create inclusive/inclusive integer ranges, where lower() and upper() would return what you expect. /Joel

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-05 Thread Joel Jacobson
On Thu, Mar 4, 2021, at 17:55, Gilles Darold wrote: > I also think that it should return a setof 2-D integer array, an other > solution is to return all start/end positions of an occurrence chained > in an integer array {start1,end1,start2,end2,..}. Hmm. Seems like we've in total managed to come

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Gilles Darold
Le 04/03/2021 à 16:40, Tom Lane a écrit : "Joel Jacobson" writes: Having abandoned the cute idea that didn't work, here comes a new patch with a regexp_positions() instead returning setof record (start_pos integer[], end_pos integer[]). I wonder if a 2-D integer array would

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Joel Jacobson
On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote: > "Joel Jacobson" writes: > > Having abandoned the cute idea that didn't work, > > here comes a new patch with a regexp_positions() instead returning > > setof record (start_pos integer[], end_pos integer[]).

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Chapman Flack
gex regexp_replace translate_regex The regexp_positions proposal highlights an interesting apparent gap in both the de facto and the ISO specs: the provided functions allow you to specify which occurrence you're talking about, and get the corresponding positions or the cor

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Tom Lane
"Joel Jacobson" writes: > Having abandoned the cute idea that didn't work, > here comes a new patch with a regexp_positions() instead returning > setof record (start_pos integer[], end_pos integer[]). I wonder if a 2-D integer array wouldn't be a better idea, ie {{

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Joel Jacobson
On Tue, Mar 2, 2021, at 06:31, Tom Lane wrote: > "Joel Jacobson" writes: > > Unless fixed, then the way I see it, I don't think we can use int4range[] > > for regexp_positions(), > > Yeah. It's a cute idea, but the semantics aren't quite right. H

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Mark Dilger
> On Mar 2, 2021, at 5:34 AM, Isaac Morland wrote: > > Returning to the RE result issue, I wonder how much it actually matters where > any empty matches are. Certainly the actual contents of the match don’t > matter; you don’t need to be able to index into the string to extract the > substr

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Joel Jacobson
se, but I think I would still prefer a complete solution by just returning setof two integer[] values, instead of the cuter-but-only-partial solution of using the existing int4range[]. Even better would be if we could fix the range type so it could actually be used in this and other similar situat

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Isaac Morland
On Tue, 2 Mar 2021 at 08:58, Joel Jacobson wrote: > If we stipulate that the result array will be in order, then you still > don’t have the exact location of empty matches but you do at least have > where they are relative to non-empty matches. > > > This part I didn't fully understand. Can you p

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Joel Jacobson
Hi Isaac, Many thanks for the comments. On Tue, Mar 2, 2021, at 14:34, Isaac Morland wrote: > One question I would have is whether empty ranges are all equal to each > other. If they are, you have an equality that isn’t really equality; if they > aren’t then you would have ranges that are unequ

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Isaac Morland
On Tue, 2 Mar 2021 at 00:52, Joel Jacobson wrote: > Ranges are treated as sets. As such equality is defined by membership. > > That being said, I agree that there may be situations in which it would be > convenient to have empty ranges at specific locations. Doing this would > introduce numerous

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-02 Thread Joel Jacobson
On Tue, Mar 2, 2021, at 06:31, Tom Lane wrote: >"Joel Jacobson" writes: >> Unless fixed, then the way I see it, I don't think we can use int4range[] >> for regexp_positions(), > >Yeah. It's a cute idea, but the semantics aren't quite right. I

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Joel Jacobson
range; >> ?column? >> -- >> t >> (1 row) >> >> This seems like a bug to me. What am I missing here? >> >> Unless fixed, then the way I see it, I don't think we can use int4range[] >> for regexp_positions(), >> if we want to all

Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Tom Lane
"Joel Jacobson" writes: > Unless fixed, then the way I see it, I don't think we can use int4range[] for > regexp_positions(), Yeah. It's a cute idea, but the semantics aren't quite right. regards, tom lane

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Isaac Morland
his seems like a bug to me. What am I missing here? > > Unless fixed, then the way I see it, I don't think we can use int4range[] > for regexp_positions(), > if we want to allow returning the positions for zero-length matches, which > would be nice. > Ranges are treated as sets.

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Joel Jacobson
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote: > I like the idea so I did a bit of testing. I think the following should not > error, but does: > > +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i'); > +ERROR: range lower bound must be less

Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Mark Dilger
so I did a bit of testing. I think the following should not error, but does: +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i'); +ERROR: range lower bound must be less than or equal to range upper bound — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

[PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-01 Thread Joel Jacobson
Hi, I suggest adding a new function, regexp_positions(), which works exactly like regexp_matches(), except it returns int4range[] start/end positions for *where* the matches occurs. I first thought I could live without this function, and just get the positions using strpos(), but as Andreas

Re: regexp_positions()

2021-02-28 Thread Joel Jacobson
t;+" for addition, "-" for deletion. Example: =-===+=-+== We could then find the hunks using this regex: (={0,3}[-+]+={0,3})+ using regexp_positions() to find the start and end positions for each hunk: SELECT * FROM regexp_positions('=-===+

Re: regexp_positions()

2021-02-27 Thread Joel Jacobson
Hi, On Sun, Feb 28, 2021, at 03:13, David Fetter wrote: > Maybe an int4multirange, which would fit unless I'm misunderstanding > g's meaning with respect to non-overlapping patterns, but that might > be a little too cute and not easy ever to extend. > > Come to that, would a row structure that lo

Re: regexp_positions()

2021-02-27 Thread David Fetter
On Sat, Feb 27, 2021 at 08:51:27PM +0100, Joel Jacobson wrote: > Hi, > > Finding all matches in a string is convenient using regexp_matches() with the > 'g' flag. > > But if instead wanting to know the start and end positions of the occurrences, > one would have to first call regexp_matches(...,

regexp_positions()

2021-02-27 Thread Joel Jacobson
Hi, Finding all matches in a string is convenient using regexp_matches() with the 'g' flag. But if instead wanting to know the start and end positions of the occurrences, one would have to first call regexp_matches(...,'g') to get all matches, and then iterate through the results and search usin