Re: sed(1): missing NUL in pattern space

kshe Sat, 01 Jul 2017 06:21:08 -0700

On Tue, 27 Jun 2017 09:29:10 +0000, Otto Moerbeek wrote:
 >
 > at last a followup, for the original problem.
 >
 > This diff incorporates your later comment. It does not cause the newly
 > added regress test to fail, though.
 >
 > So that poses the question if this is what you meant.
 >
 >     -Otto
 >
 > Index: process.c
 > ===================================================================
 > RCS file: /cvs/src/usr.bin/sed/process.c,v
 > retrieving revision 1.32
 > diff -u -p -r1.32 process.c
 > --- process.c    22 Feb 2017 14:09:09 -0000    1.32
 > +++ process.c    27 Jun 2017 09:16:33 -0000
 > @@ -120,8 +120,10 @@ redirect:
 >                  cp = cp->u.c;
 >                  goto redirect;
 >              case 'c':
 > +                if (pd)
 > +                    break;
 >                  pd = 1;
 > -                psl = 0;
 > +                ps[psl = 0] = '\0';
 >                  if (cp->a2 == NULL || lastaddr || lastline())
 >                      (void)fprintf(outfile, "%s", cp->t);
 >                  break;
 > @@ -138,6 +140,7 @@ redirect:
 >                  } else {
 >                      psl -= (p + 1) - ps;
 >                      memmove(ps, p + 1, psl);
 > +                    ps[psl] = '\0';
 >                      goto top;
 >                  }
 >              case 'g':
 >


Indeed, there was a wording error in my second paragraph: I meant to
align the behaviour of `l' (not `c') with that of `p', such that using
`l' after `c' would not print anything (not even `$'), just like `p' in
this case, which does not print anything either (not even a newline).

Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.32
diff -up -r1.32 process.c
--- process.c   22 Feb 2017 14:09:09 -0000      1.32
+++ process.c   1 Jul 2017 10:24:21 -0000
@@ -121,7 +121,7 @@ redirect:
                                goto redirect;
                        case 'c':
                                pd = 1;
-                               psl = 0;
+                               ps[psl = 0] = '\0';
                                if (cp->a2 == NULL || lastaddr || lastline())
                                        (void)fprintf(outfile, "%s", cp->t);
                                break;
@@ -138,6 +138,7 @@ redirect:
                                } else {
                                        psl -= (p + 1) - ps;
                                        memmove(ps, p + 1, psl);
+                                       ps[psl] = '\0';
                                        goto top;
                                }
                        case 'g':
@@ -158,6 +159,8 @@ redirect:
                                (void)fprintf(outfile, "%s", cp->t);
                                break;
                        case 'l':
+                               if (pd)
+                                       break;
                                lputs(ps);
                                break;
                        case 'n':

That being said, there is more to this issue than what such a simple
patch could fix.  For example, in a way your diff also makes sense: if
it is believed that one ought not to use `l' nor `p' to print an
inexistant pattern space, then surely deleting it again with `c' should
be disallowed too.  However, as partly demonstrated by my last message,
there is absolutely no coherence anywhere about this, so in the end one
might wonder which command is correctly behaved and which isn't.

For instance, it is a matter of course that the sequence `N;s/.*\n//;p'
should be just the same as `n' when normal output has not been disabled;
but, in OpenBSD's sed, because that `pd' flag is implemented in such an
erratic fashion, this isn't always the case:

        $ jot 6 | sed 'c\
        > text
        > :loop
        > N;s/.*\n//;p
        > bloop'
        text
        2
        4
        6

        $ jot 6 | sed 'c\
        > text
        > :loop
        > n
        > bloop'
        text
        2
        3
        4
        5
        6

For the same reason, nothing is printed by the following script, even if
all lines are read:

        $ jot 6 | sed 'c\
        > text
        > :loop
        > $q;N
        > bloop'
        text

(As a side note, amusingly, replacing the `q' above with `s/$//p' would
actually produce the expected output, when in contrast using something
like `s/^//p' would not...  But this is an unrelated bug.)

Likewise, one would believe the command `y/a/b/' to be functionally
equivalent to `s/a/b/g'; but, again, many examples of the opposite
behaviour can be constructed; for instance:

        $ echo | sed 'c\
        > text
        > s/^/a/;s/^//
        > y/a/b/
        > s/^//p;d'
        text
        a

        $ echo | sed 'c\
        > text
        > s/^/a/;s/^//
        > s/a/b/g
        > s/^//p;d'
        text
        b

Because, of course, thanks to that `pd' magic, `y' is only effective
after `c' when an odd number of substitutions have affected the pattern
space.  Although it is true that the above two bugs are primarily due to
the incorrect handling of the `deleted' flag by the `s' command, they
nonetheless help to expose further the inconsistency of the code, making
its overall fragility even more obvious than it would have otherwise
been; in this case in particular, they allow one to notice that:

1.  Even though `n' resets the `deleted' flag, `N' does not.
2.  Even though `s' ignores the `deleted' flag, `y' does not.

And so, similarily, even if `p', `P', and `w' check for it, `l' ignores
`pd' completely; and, indeed, so does `c', in contrast with other
commands such as `D'.

I hope that these examples make it clear that the problem is much worse
than what the above diff actually solves.  Of course, I could have
written a third detailed report appropriately addressing all of this,
but I figured no reasonable amount of patching endeavour could ever
truly and satisfyingly improve the overall condition of such inherently
flawed code.  Therefore, I no longer believe any of my patches (nor any
other patches, really) to be relevant to OpenBSD's current sed.

This might sound a little extreme, but, in fact, as my previous messages
illustrate, that `pd' madness is far from being the only incongruity
lurking within sed's source.  I should also add that, since then, I
actually found more bugs, some of them even affecting the parsing
algorithm...  To take only one example out of many, the implementation
of the `y' command is incorrect:

        $ sed 'y/[/]/'
        sed: 1: "y/[/]/": unterminated transform target string

(The parser thinks that an opening bracket in an argument to the `y'
command is meant to be the start of a character class, as if it were
part of a regular expression.)

Considering that sed is my favourite Unix utility (dc comes close; vi
and sh don't count as they are more than mere utilities), I would be
very sad if all my beautiful sed scripts were forever to be handled by
such a mediocre implementation, since I don't really plan to ever use
anything else than OpenBSD.  Also, in light of all these observations,
instead of devising endless diffs for a piece of software that clearly
doesn't deserve any, I am willing to rewrite it from scratch.  This
might take a few weeks as I do not have enough time to work on it
regularly, but there is apparently no hurry: after all, I seem to be the
only one out there to actually need such features as empty match
replacement to work reliably, no matter if the last command was `D' or
if there happens to be an embed NUL or newline in the pattern space.

Besides, and perhaps even more importantly, a clean rewrite could yield
many other improvements (like a more restrictive pledge(2) depending on
the outcome of the scripts' compilation), without the hassle of having
to untangle such old and messy code.  As a consequence, especially since
no one at OpenBSD appears to have time to deal with all the deficiencies
of the existing source, it might be simpler to wait until I am done with
my own, as my code shall be much more readable than the current one, and
therefore quite easy to review.

Kind regards,

kshe

Re: sed(1): missing NUL in pattern space

Reply via email to