be better to leave that off. But
> it deserves explicit consideration.
I intentionally made it a close analog of regexp_matches(),
to make it easy for existing users of regexp_matches() to
understand how regexp_positions() works.
> 2. It looks like you modeled this on regexp_matches() to the
This review has gone unanswered for two months, so I'm marking this patch
Returned with Feedback. Please feel free to resubmit when a new version of the
patch is available.
--
Daniel Gustafsson https://vmware.com/
se. I think
if we're going to do this, we should learn from history, and provide an
analog to regexp_match() as well as regexp_matches() right off the bat.
3. The API convention you chose (separate start and length arrays)
is perhaps still confusing. When I first looked at the test case
+SE
he problem due to the recent change in setup_regexp_matches(),
which added a new int parameter "start_search".
I pass 0 as start_search, which I think should give the same behaviour as
before.
I also changed the assigned oid values in pg_proc.dat for the two new
regexp_positions() ca
> On 1 Sep 2021, at 16:02, Tom Lane wrote:
>
> Daniel Gustafsson writes:
>>> On 9 Mar 2021, at 20:30, Joel Jacobson wrote:
>>> Attached is a patch implementing it this way.
>
>> This patch no longer applies, can you please submit a rebased version?
On a brief skim, this patch includes the doc
Daniel Gustafsson writes:
>> On 9 Mar 2021, at 20:30, Joel Jacobson wrote:
>> Attached is a patch implementing it this way.
> This patch no longer applies, can you please submit a rebased version?
Also, since 642433707 ("This patch adds new functions regexp_count(),
regexp_instr(), regexp_like(
> On 9 Mar 2021, at 20:30, Joel Jacobson wrote:
> Attached is a patch implementing it this way.
This patch no longer applies, can you please submit a rebased version?
--
Daniel Gustafsson https://vmware.com/
to iterate through parallel arrays. There are ways to do
> that in SQL, certainly, but they all constrain how you write the query,
> and usually force ugly stuff like splitting it into sub-selects.
I see now what you mean, many thanks for explaining.
>
> As an example, presuming that
ly, but they all constrain how you write the query,
and usually force ugly stuff like splitting it into sub-selects.
As an example, presuming that regexp_positions is defined along the
lines of
regexp_positions(str text, pat text, out starts int[], out lengths int[])
returns setof record
then to
"Joel Jacobson" writes:
> On Tue, Mar 9, 2021, at 10:18, Pavel Stehule wrote:
>> you can do unnest(array1, array2, ...)
> Right, I had forgotten about that variant.
> But isn't this a bit surprising then:
> ...
> Should there be an entry there showing the VARIADIC anyelement version as
> well?
On Tue, Mar 9, 2021, at 13:16, Pavel Stehule wrote:
> út 9. 3. 2021 v 11:32 odesílatel Joel Jacobson napsal:
>> __On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote:
>>> My experience with working with parallel arrays in SQL has been unpleasant.
>>
>> Could you please give an example on such an unplea
plex application with 3D data of some points in 2D array.
Everywhere was a[d, 0], a[d, 1], a[d, 2], instead a[d] or instead a[d].x,
...
> I can see a problem if the arrays could possibly have difference
> dimensionality/cardinality,
> but regexp_positions() could guarantee they won'
but regexp_positions() could guarantee they won't, so I don't see a problem
here,
but there is probably something I'm missing here?
/Joel
On Tue, Mar 9, 2021, at 10:18, Pavel Stehule wrote:
> What do you mean?
>> More than one unnest() in the same query, e.g. SELECT unnest(..), unnest(..)?
>
> you can do unnest(array1, array2, ...)
Right, I had forgotten about that variant.
But isn't this a bit surprising then:
\df unnest
út 9. 3. 2021 v 10:01 odesílatel Joel Jacobson napsal:
> On Tue, Mar 9, 2021, at 09:29, Pavel Stehule wrote:
>
>
>
> https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
>
> but the SLICE argument should be constant. But this limit is artificial,
> just fo
On Tue, Mar 9, 2021, at 09:29, Pavel Stehule wrote:
>
> https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
>
> but the SLICE argument should be constant. But this limit is artificial, just
> for implementation simplicity. Important is behaviour.
I see
e preferred.
> But this doesn't change the fact you sometimes have to deal with 2D
> arrays, in which case the proposed unnest(anyarray,int) would improve the
> user-experience a lot, when wanting to unnest just one level (or N levels).
>
> Sounds like you are suggesting some oth
proposed unnest(anyarray,int) would improve the user-experience
a lot, when wanting to unnest just one level (or N levels).
Sounds like you are suggesting some other improvements, in addition to the
proposed unnest(anyarray,int)? Correct?
A regexp_positions() returning setof 2-D array[] would not
út 9. 3. 2021 v 7:57 odesílatel Joel Jacobson napsal:
> On Mon, Mar 8, 2021, at 21:46, Pavel Stehule wrote:
>
> so what about?
>
> CREATE OR REPLACE FUNCTION unnest_slice(anyarray, int)
> RETURNS SETOF anyarray AS $$
> DECLARE r $1%type;
> BEGIN
> FOREACH r SLICE $2 IN ARRAY $1 --- now $2 shoul
On Mon, Mar 8, 2021, at 21:46, Pavel Stehule wrote:
> so what about?
>
> CREATE OR REPLACE FUNCTION unnest_slice(anyarray, int)
> RETURNS SETOF anyarray AS $$
> DECLARE r $1%type;
> BEGIN
> FOREACH r SLICE $2 IN ARRAY $1 --- now $2 should be constant
> LOOP
> RETURN NEXT r;
> END LOOP;
>
po 8. 3. 2021 v 21:12 odesílatel Tom Lane napsal:
> "Joel Jacobson" writes:
> > If I understand it correctly, we don't need to run genbki.pl to compile
> PostgreSQL,
> > so someone wanting to compile PostgreSQL without having a running
> PostgreSQL-instance
> > could do so without problems.
> >
"Joel Jacobson" writes:
> If I understand it correctly, we don't need to run genbki.pl to compile
> PostgreSQL,
> so someone wanting to compile PostgreSQL without having a running
> PostgreSQL-instance
> could do so without problems.
> A dependency on having a PostgreSQL instance running,
> is p
On Mon, Mar 8, 2021, at 19:46, Joel Jacobson wrote:
> However, for certain tasks, when a high-level language is preferred,
> and when the raw performance of C isn't necessary, then maybe SQL/PLpgSQL
> could be a serious alternative to Perl?
Before we had jsonb, this would have been totally unreal
On Mon, Mar 8, 2021, at 18:30, Tom Lane wrote:
> FWIW, I personally think that returning a start position and a length
> would be the most understandable way to operate.
Very good point. I agree. (And then ranges cannot be used, regardless of
canonical form.)
> Yeah: it's hard. The amount of c
;> > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and
>> > not controversial.
>>
>> How about proposing some array functions to go along with the
>> regexp_positions, and then do it that way?
>
> Sounds like a nice solution. That woul
On 03/08/21 13:29, Chapman Flack wrote:
> I think the s-free version is exactly the regexp_instr included in
> the other concurrent proposal [1]
sorry.
[1]
https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net
to the
ISO position_regex() except for the ISO one using XQuery regex syntax.
I gather from [1] that the name regexp_instr is chosen in solidarity
with other DBMSs that de facto have it. Would it be weirder to have the
singular form be regexp_instr and the plural be regexp_positions?
Or to diver
need for that,
since a function can be declared to return an anonymous record type
by giving it some OUT parameters. However, if I'm understanding
things correctly "regexp_positions(IN ..., OUT match_start integer,
OUT match_length integer) RETURNS SETOF record" wouldn't be enou
On Mon, Mar 8, 2021, at 18:11, Mark Dilger wrote:
> > On Mar 8, 2021, at 9:05 AM, Joel Jacobson wrote:
> >
> > If a N+1 dimension array could easily be unnested to a N dimension array,
> > I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and
&g
> On Mar 8, 2021, at 9:05 AM, Joel Jacobson wrote:
>
> If a N+1 dimension array could easily be unnested to a N dimension array,
> I would prefer Tom's idea of a 2-D regexp_positions(), since it simple and
> not controversial.
How about proposing some array functions
On Mon, Mar 8, 2021, at 17:20, Mark Dilger wrote:
> > On Mar 5, 2021, at 11:46 AM, Joel Jacobson wrote:
> > <0003-regexp-positions.patch>
>
> I did a bit more testing:
>
> +SELECT regexp_positions('foobarbequebaz', 'b', 'g');
> + r
> On Mar 5, 2021, at 11:46 AM, Joel Jacobson wrote:
>
>
> /Joel
> <0003-regexp-positions.patch>
I did a bit more testing:
+SELECT regexp_positions('foobarbequebaz', 'b', 'g');
+ regexp_positions
+--
+ {"[3,5)"}
+
On Fri, Mar 5, 2021, at 20:46, Joel Jacobson wrote:
> My conclusion is that we should use setof int4range[] as the return value for
> regexp_positions().
If acceptable by the project, it be even nicer if we could just return the
suggested composite type.
I don't see any exist
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not
> error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR: range lower bound must be less
On 3/4/21 4:40 PM, Tom Lane wrote:
I wonder if a 2-D integer array wouldn't be a better idea,
ie {{startpos1,length1},{startpos2,length2},...}. My experience
with working with parallel arrays in SQL has been unpleasant.
Hm, I can see your point but on the other hand I can't say my
experiences
Hi
pá 5. 3. 2021 v 13:44 odesílatel Joel Jacobson napsal:
> Idea #5:
>
> Allow disabling canonicalization via optional parameter to range
> constructor functions.
>
I think so rules describing ranges and multirages are long enough, so
increasing functionality doesn't look like a practical idea.
Idea #5:
Allow disabling canonicalization via optional parameter to range constructor
functions.
This would then allow using the range type,
to create inclusive/inclusive integer ranges,
where lower() and upper() would return what you expect.
/Joel
On Thu, Mar 4, 2021, at 17:55, Gilles Darold wrote:
> I also think that it should return a setof 2-D integer array, an other
> solution is to return all start/end positions of an occurrence chained
> in an integer array {start1,end1,start2,end2,..}.
Hmm. Seems like we've in total managed to come
Le 04/03/2021 à 16:40, Tom Lane a écrit :
"Joel Jacobson" writes:
Having abandoned the cute idea that didn't work,
here comes a new patch with a regexp_positions() instead returning
setof record (start_pos integer[], end_pos integer[]).
I wonder if a 2-D integer array would
On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote:
> "Joel Jacobson" writes:
> > Having abandoned the cute idea that didn't work,
> > here comes a new patch with a regexp_positions() instead returning
> > setof record (start_pos integer[], end_pos integer[]).
gex
regexp_replace translate_regex
The regexp_positions proposal highlights an interesting apparent gap in
both the de facto and the ISO specs: the provided functions allow you
to specify which occurrence you're talking about, and get the corresponding
positions or the cor
"Joel Jacobson" writes:
> Having abandoned the cute idea that didn't work,
> here comes a new patch with a regexp_positions() instead returning
> setof record (start_pos integer[], end_pos integer[]).
I wonder if a 2-D integer array wouldn't be a better idea,
ie {{
On Tue, Mar 2, 2021, at 06:31, Tom Lane wrote:
> "Joel Jacobson" writes:
> > Unless fixed, then the way I see it, I don't think we can use int4range[]
> > for regexp_positions(),
>
> Yeah. It's a cute idea, but the semantics aren't quite right.
H
> On Mar 2, 2021, at 5:34 AM, Isaac Morland wrote:
>
> Returning to the RE result issue, I wonder how much it actually matters where
> any empty matches are. Certainly the actual contents of the match don’t
> matter; you don’t need to be able to index into the string to extract the
> substr
se,
but I think I would still prefer a complete solution by just returning setof
two integer[] values,
instead of the cuter-but-only-partial solution of using the existing
int4range[].
Even better would be if we could fix the range type so it could actually be
used in this and other similar situat
On Tue, 2 Mar 2021 at 08:58, Joel Jacobson wrote:
> If we stipulate that the result array will be in order, then you still
> don’t have the exact location of empty matches but you do at least have
> where they are relative to non-empty matches.
>
>
> This part I didn't fully understand. Can you p
Hi Isaac,
Many thanks for the comments.
On Tue, Mar 2, 2021, at 14:34, Isaac Morland wrote:
> One question I would have is whether empty ranges are all equal to each
> other. If they are, you have an equality that isn’t really equality; if they
> aren’t then you would have ranges that are unequ
On Tue, 2 Mar 2021 at 00:52, Joel Jacobson wrote:
> Ranges are treated as sets. As such equality is defined by membership.
>
> That being said, I agree that there may be situations in which it would be
> convenient to have empty ranges at specific locations. Doing this would
> introduce numerous
On Tue, Mar 2, 2021, at 06:31, Tom Lane wrote:
>"Joel Jacobson" writes:
>> Unless fixed, then the way I see it, I don't think we can use int4range[]
>> for regexp_positions(),
>
>Yeah. It's a cute idea, but the semantics aren't quite right.
I
range;
>> ?column?
>> --
>> t
>> (1 row)
>>
>> This seems like a bug to me. What am I missing here?
>>
>> Unless fixed, then the way I see it, I don't think we can use int4range[]
>> for regexp_positions(),
>> if we want to all
"Joel Jacobson" writes:
> Unless fixed, then the way I see it, I don't think we can use int4range[] for
> regexp_positions(),
Yeah. It's a cute idea, but the semantics aren't quite right.
regards, tom lane
his seems like a bug to me. What am I missing here?
>
> Unless fixed, then the way I see it, I don't think we can use int4range[]
> for regexp_positions(),
> if we want to allow returning the positions for zero-length matches, which
> would be nice.
>
Ranges are treated as sets.
On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not
> error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR: range lower bound must be less
so I did a bit of testing. I think the following should not
error, but does:
+SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
+ERROR: range lower bound must be less than or equal to range upper bound
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
I suggest adding a new function, regexp_positions(),
which works exactly like regexp_matches(),
except it returns int4range[] start/end positions for *where* the matches
occurs.
I first thought I could live without this function,
and just get the positions using strpos(),
but as Andreas
t;+" for addition, "-" for deletion.
Example: =-===+=-+==
We could then find the hunks using this regex:
(={0,3}[-+]+={0,3})+
using regexp_positions() to find the start and end positions for each hunk:
SELECT * FROM
regexp_positions('=-===+
Hi,
On Sun, Feb 28, 2021, at 03:13, David Fetter wrote:
> Maybe an int4multirange, which would fit unless I'm misunderstanding
> g's meaning with respect to non-overlapping patterns, but that might
> be a little too cute and not easy ever to extend.
>
> Come to that, would a row structure that lo
On Sat, Feb 27, 2021 at 08:51:27PM +0100, Joel Jacobson wrote:
> Hi,
>
> Finding all matches in a string is convenient using regexp_matches() with the
> 'g' flag.
>
> But if instead wanting to know the start and end positions of the occurrences,
> one would have to first call regexp_matches(...,
Hi,
Finding all matches in a string is convenient using regexp_matches() with the
'g' flag.
But if instead wanting to know the start and end positions of the occurrences,
one would have to first call regexp_matches(...,'g') to get all matches,
and then iterate through the results and search usin
59 matches
Mail list logo