On Sun, Aug 15, 2021 at 11:34:22PM +0200, Martijn van Duren wrote:
> Andreas Kähäri gave a nice example on misc@ on how our sed addressing
> implementation differs from gsed[0][1]. While writing my reply I noticed
> that POSIX doesn't state how "next cycle" should be interpreted when it
> comes to address ranges. So I can't state that our implementation is
> wrong per se. However, I do think that gsed's interpretation is more
> intuitive, since a numeric address is not dependent on the context of
> the pattern space and thus should register as "in range".
But note that this comes out of a discussion on how to do '0,/re/'
addressing with OpenBSD sed. Your change appears to remove one way of
handling a match of '/re/' on the first line without giving us another.
It would be better to have a clean way of doing the equivalent of
'0,/re/' than to remove a way to do it. Interestingly (?), the sed in
plan9port works the same as our native sed.
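
For reference, a minimal sketch of one way to get the effect of
'0,/re/d' without relying on how a range behaves after 'd'. It uses only
'n', 'p' and a branch, so it should behave the same in both seds. The
function name is made up for illustration, and the sketch assumes the
regular expression contains no '/'.

```shell
# Hypothetical helper: drop every line up to and including the first
# line matching the basic regular expression $1, even when that match
# is on line 1. Everything after the first match is printed unchanged.
delete_through_first_match() {
	# -n suppresses autoprint, so nothing is printed before the match.
	# After the first match, loop: fetch the next line, print it, repeat.
	sed -n -e "/$1/{" -e ':a' -e 'n' -e 'p' -e 'ba' -e '}'
}

printf 'test1\nbla1\ntest2\nbla2\n' | delete_through_first_match '^test'
```

Because the loop never returns to the top of the script, a second match
of the expression further down is printed like any other line.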
Andreas
>
> Diff below changes program parsing to more closely match gsed in this
> regard:
> $ printf 'test1\nbla1\ntest2\nbla2\n' | sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | ./obj/sed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
> $ printf 'test1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla2
> $ printf 'bla0\ntest1\nbla1\ntest2\nbla2\n' | gsed -e '1 { /^test/d; }' -e '1,/^test/d'
> bla1
> test2
> bla2
>
> The diff passes regress, but hasn't had a lot of scrutiny. Just checking
> for general interest in changing this functionality. As soon as I
> know that it's something we might want I'll spend more braincycles on
> it.
>
> martijn@
>
> [0] https://marc.info/?l=openbsd-misc&m=162896537001890&w=2
> [1] https://marc.info/?l=openbsd-misc&m=162905748428954&w=2
>
> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.34
> diff -u -p -r1.34 process.c
> --- process.c 14 Nov 2018 10:59:33 -0000 1.34
> +++ process.c 15 Aug 2021 21:30:22 -0000
> @@ -89,14 +89,16 @@ process(void)
>  	SPACE tspace;
>  	size_t len, oldpsl;
>  	char *p;
> +	int nextcycle;
>
>  	for (linenum = 0; mf_fgets(&PS, REPLACE);) {
>  		pd = 0;
> +		nextcycle = 0;
>  top:
>  		cp = prog;
>  redirect:
>  		while (cp != NULL) {
> -			if (!applies(cp)) {
> +			if (!applies(cp) || nextcycle) {
>  				cp = cp->next;
>  				continue;
>  			}
> @@ -127,14 +129,16 @@ redirect:
>  				break;
>  			case 'd':
>  				pd = 1;
> -				goto new;
> +				nextcycle = 1;
> +				break;
>  			case 'D':
>  				if (pd)
>  					goto new;
>  				if (psl == 0 ||
>  				    (p = memchr(ps, '\n', psl)) == NULL) {
>  					pd = 1;
> -					goto new;
> +					nextcycle = 1;
> +					break;
>  				} else {
>  					psl -= (p + 1) - ps;
>  					memmove(ps, p + 1, psl);
> @@ -267,8 +271,9 @@ new: if (!nflag && !pd)
>   * (lastline, linenumber, ps).
>   */
>  #define MATCH(a) \
> -	(a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, 0, psl) : \
> -	    (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()
> +	(a)->type == AT_LINE ? linenum == (a)->u.l : \
> +	    (a)->type == AT_LAST ? lastline() : \
> +	    pd ? 0 : regexec_e((a)->u.r, ps, 0, 1, 0, psl)
>
>  /*
>   * Return TRUE if the command applies to the current line. Sets the inrange
>
--
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden