Hi, There is an ongoing confusion in sed's source about the concept of EOL: the code contradicts itself as to whether it should be determined by a trailing NUL or by looking up the `len' field of the SPACE structure.
At least two commands (`l' and `s') assume that the pattern space is always NUL-terminated. However, at least two other commands (`c' and `D') do not comply with that assumption, modifying it by merely updating its length attribute. As a result, using `l' or `s' after `c' or `D' produces incorrect output. For instance, in the following excerpt, `l' should print `bb$': $ printf 'a\nbb\n' | sed '${l;q;};N;D' bbbb$ bb For the same underlying reason, it also disregards the deletion of the pattern space that `c' is supposed to entail: $ echo abc | sed 'c\ > text > l' text abc$ The substitute command can likewise get confused and add an extra character after a replacement: $ printf 'a\nbb\n' | sed '${s/$/x/;q;};N;D' bbxb Note how this does not happen when the substitution pattern matches more than the empty string: $ printf 'a\nbb\n' | sed '${s/.$/&x/;q;};N;D' bbx After an empty match, the character that `s' adds to the pattern space depends on how memmove(3) modified it beforehand. Here's a very simplified version of the original sequence of commands from which I discovered these bugs, where `i' appear instead of `b': $ printf 'a\nbb\n' | sed '${s/$/x/;q;};N;s/a/ZZi/;D' bbxi Similarily, when one believes the pattern space to be empty after a `c' command, something like `s/.*//' will magically revive one character from that emptiness: $ echo abc | sed 'c\ > text > s/.*//' text a As I came across these in an actual script that I was writing, I was quite amazed to find out that they went unnoticed for so long, without anyone ever trying to see what `l' does after `c' or `D'. Indeed, the `l' command is broken since revision 1.1 of process.c, making that bug older than I am. It has also been present in the other BSDs, but: o FreeBSD fixed it by accident in 2004 by adding support for multibyte encodings; o NetBSD fixed it by accident in 2014 by importing FreeBSD's sed (merging their changes afterwards, but that didn't reintroduce the bug, since they hadn't touched that part of the code). The more serious bug regarding empty matches in the substitute command was introduced in OpenBSD in 2011, when schwarze@ decided to rewrite its main loop, intending to "fix multiple issues regarding the replacement of zero-length strings", but apparently with the unfortunate side effect of introducing a new issue regarding the replacement of zero-length strings. Finally, this commit was adopted by FreeBSD in 2016 when they synchronised some of their code with OpenBSD's to ease importing other fixes from it. Thus, OpenBSD still has the old `l' bug as well as the more recent `s' one, FreeBSD only has the latter, and since 2014 NetBSD is not longer affected by any of these, although the fact that `c' and `D' omit to NUL-terminate the pattern space could be considered a bug in itself, since at least one piece of code present for 20 years in their tree was making the contrary assumption. So who knows where else it was made? Actually, with all due respect, probably not even the original developers: many of their choices suggest that the end of the pattern space was not always meant to be encoded as a trailing NUL, but they nevertheless designed the function `lputs' to only take a simple string parameter, by which no separate length information could ever be accessible. I therefore have the strong feeling that this is not the last bug of BSD's sed. The remaining ones may be hidden in dark corners, but I do expect them to show up at some point in the future, considering how fragile the existing code looks. The overall situation also prevents any significant improvement to be successfully conducted, as changing more than two lines will most likely lead to subtle regressions; see the last few commits at https://svnweb.freebsd.org/base/head/usr.bin/sed/process.c and http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/usr.bin/sed/process.c for recent examples of such failed attempts. Anyway, until I digress even more, here is a patch for that one, forcing `c' and `D' to put a NUL where `l' and `s' expect one to be. By the way, whether the fault is on one side or the other is actually unclear. It does seem redundant to enforce such a convention when one already has access to the length of the corresponding string. As a matter of fact, the overwhelming majority of the code does not appear to ever rely on a NUL being there at all; for instance, `D' and `P' use memchr(3), not strchr(3), which, besides, allows for the manipulation of NUL bytes originally present in the input. Hence a more complete patch could instead try to unify the whole source to follow one convention consistently; however, that would probably take as much time as a full rewrite, because that is essentially what would need to be done, so, for now, I kept it as simple as it could be, to avoid messing it up even more than it already is. --- a/src/usr.bin/sed/process.c Wed Feb 22 14:09:09 2017 +++ b/src/usr.bin/sed/process.c Wed Jun 7 09:56:20 2017 @@ -121,7 +121,7 @@ redirect: goto redirect; case 'c': pd = 1; - psl = 0; + ps[psl = 0] = '\0'; if (cp->a2 == NULL || lastaddr || lastline()) (void)fprintf(outfile, "%s", cp->t); break; @@ -138,6 +138,7 @@ redirect: } else { psl -= (p + 1) - ps; memmove(ps, p + 1, psl); + ps[psl] = '\0'; goto top; } case 'g': Also note how one could have thought of fixing `D' by making its memmove(3) go up to `psl + 1' instead of `psl', since there now should always be a NUL at the end; however, a separate assignment makes the intent clearer, and I read somewhere that explicit was better than implicit. That being said, the more I think about this, the more I am convinced that the problematic commands were in fact `l' and `s'. Since the code already uses fgetln(3) to read lines, it would have been nicer (and optimal) to never rely on NULs. As stated above, this is already the case almost everywhere; therefore, if any of the OpenBSD developers wanted to elaborate on that idea after committing such an unsatisfying patch, I would of course be very happy to help. Kind regards, kshe