Re: ksh [emacs.c] -- simplify isu8cont()

ropers Sat, 25 Jul 2020 20:17:07 -0700

On 25/07/2020, Martijn van Duren <openbsd+t...@list.imperialat.at> wrote:
> This function is used throughout the OpenBSD tree and I think it's
> fine as it is. This way it's clearer to me that it's about byte
> 7 and 8

You mean bits 7 and 8 when counting from 1 from the right?

> and not have to do the math in my head to check if we
> might have botched it.
>
> Also compilers should be smart enough to optimize this out at
> compile-time anyway.
>
> martijn@

So the (0x80 | 0x40) was supposed to be *for* legibility?

IMHO it hurts legibility, but admittedly, it depends on who you are
and what you have memorised:
Finding (0x80 | 0x40) easier than 0xC0 assumes that people have the nibble
translation between binary and hexadecimal memorised all the way up to
8 but *NOT* up to C, and that they find dealing with two values and
an extra bitwise OR still less hassle than remembering that C is 1100.
Of course if that's your decision, that's your decision, but speaking
as a novice C programmer, I think 'C' is easier. ;D [0]
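
For anyone who would rather not do the math in their head either way,
here is a minimal sketch that checks both spellings against every
possible byte value.  The names isu8cont_or(), isu8cont_hex() and
count_mismatches() are made up for this illustration and are not in
the tree; only the two return expressions come from the diff below.

```c
/* Original spelling, as in emacs.c rev 1.87. */
static int
isu8cont_or(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/* Proposed spelling: 0x80 | 0x40 is 1000 0000 | 0100 0000 = 1100 0000 = 0xC0. */
static int
isu8cont_hex(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: count the byte values (0x00..0xFF) for which
 * the two spellings give different answers.
 */
static int
count_mismatches(void)
{
	int c, n = 0;

	for (c = 0; c <= 0xFF; c++)
		if (isu8cont_or((unsigned char)c) !=
		    isu8cont_hex((unsigned char)c))
			n++;
	return n;
}
```

If count_mismatches() returns 0, the change is purely cosmetic, so
the choice really is only about which form reads better.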

But while you're reading this, would you at least consider committing
the explanatory comment?  Not everybody is already familiar with how
UTF-8 works, and I think this comment above the function is still some
extra hand-holding beginners might find useful:
/* is octet a UTF-8 continuation byte? */
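
As a rough illustration of why the test matters: in UTF-8, every byte
of a multi-byte sequence after the first has the bit pattern 10xxxxxx,
so skipping those bytes lets you step over whole characters.  The
sketch below assumes valid UTF-8 input; u8_count() is a hypothetical
helper written for this mail, not something in emacs.c.

```c
#include <stddef.h>

/* is octet a UTF-8 continuation byte? */
static int
isu8cont(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: count the characters (code points) in a
 * NUL-terminated, valid UTF-8 string by counting every byte that is
 * NOT a continuation byte.
 */
static size_t
u8_count(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (!isu8cont((unsigned char)*s))
			n++;
	return n;
}
```

For example, "\xC3\xA9" (e with an acute accent) is two bytes, but
only the first one is not a continuation byte, so it counts as one
character.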

Also, while you're here, can anyone tell me what the zot in x_zots() /
x_zotc() actually stands for?  I thought it was "zero out the
{string|character}" when I looked at x_zots(), but then I doubted
myself once I saw that that's not strictly what x_zotc() actually
does.  Does anyone know?

Ian

footnote:
[0] Which latter thought, incidentally, is also why someone invented BCHS. ;)

> On Sat, 2020-07-25 at 17:40 +0100, ropers wrote:
>> Index: emacs.c
>> ===================================================================
>> RCS file: /cvs/src/bin/ksh/emacs.c,v
>> retrieving revision 1.87
>> diff -u -r1.87 emacs.c
>> --- emacs.c  8 May 2020 14:30:42 -0000       1.87
>> +++ emacs.c  25 Jul 2020 16:31:22 -0000
>> @@ -269,10 +269,11 @@
>>      { 0, 0, 0 },
>>  };
>>
>> +/* is octet a UTF-8 continuation byte? */
>>  int
>>  isu8cont(unsigned char c)
>>  {
>> -    return (c & (0x80 | 0x40)) == 0x80;
>> +    return (c & 0xC0) == 0x80;
>>  }
>>
>>  int
>>
>
>
