On 25/07/2020, Martijn van Duren <openbsd+t...@list.imperialat.at> wrote:
> This function is used throughout the OpenBSD tree and I think it's
> fine as it is. This way it's clearer to me that it's about byte
> 7 and 8

You mean bits 7 and 8 when counting from 1 from the right?

> and not have to do the math in my head to check if we
> might have botched it.
>
> Also compilers should be smart enough to optimize this out at
> compile-time anyway.
>
> martijn@

So the (0x80 | 0x40) was supposed to be *for* legibility?
IMHO it hurts legibility, but admittedly, it depends on who you are
and what you have memorised:
Finding (0x80 | 0x40) easier than 0xC0 assumes that people have nibble
translation between binary and hexadecimal memorised all the way up to
8 but *NOT* up to C. And that they find dealing with two values and an
extra binary OR still less hassle than remembering that C is 1100.

Of course if that's your decision, that's your decision, but speaking
as a novice C programmer, I think 'C' is easier. ;D [0]

But while you're reading this, would you at least consider committing
the explanatory comment? Not everybody is already familiar with how
UTF-8 works, and I think this comment above the function is still some
extra hand-holding beginners might find useful:

/* is octet a UTF-8 continuation byte? */

Also, while you're here, can anyone tell me what the zot in x_zots() /
x_zotc() actually stands for? I thought it was "zero out the
{string|character}" when I looked at x_zots(), but then I doubted
myself once I saw that that's not strictly what x_zotc() actually
does. Does anyone know?

Ian

footnote:
[0] Which latter thought, incidentally, is also why someone invented
BCHS. ;)

> On Sat, 2020-07-25 at 17:40 +0100, ropers wrote:
>> Index: emacs.c
>> ===================================================================
>> RCS file: /cvs/src/bin/ksh/emacs.c,v
>> retrieving revision 1.87
>> diff -u -r1.87 emacs.c
>> --- emacs.c	8 May 2020 14:30:42 -0000	1.87
>> +++ emacs.c	25 Jul 2020 16:31:22 -0000
>> @@ -269,10 +269,11 @@
>>  	{ 0, 0, 0 },
>>  };
>>
>> +/* is octet a UTF-8 continuation byte? */
>>  int
>>  isu8cont(unsigned char c)
>>  {
>> -	return (c & (0x80 | 0x40)) == 0x80;
>> +	return (c & 0xC0) == 0x80;
>>  }
>>
>>  int
>>
>
>
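
For anyone following along who wants to see the two spellings side by
side: 0x80 is 1000 0000 and 0x40 is 0100 0000, so (0x80 | 0x40) is
1100 0000, which is 0xC0. A UTF-8 continuation byte has the bit
pattern 10xxxxxx, so masking with 0xC0 and comparing against 0x80
tests exactly the top two bits. Below is a minimal standalone sketch,
not the ksh source itself. isu8cont() is copied from the diff above;
isu8cont_old() and u8_count() are hypothetical names made up for this
illustration only.

```c
#include <stddef.h>

/* is octet a UTF-8 continuation byte? (form proposed in the diff) */
int
isu8cont(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/* hypothetical name: the pre-diff form, kept only for comparison */
int
isu8cont_old(unsigned char c)
{
	return (c & (0x80 | 0x40)) == 0x80;
}

/*
 * hypothetical helper: count code points in a NUL-terminated string
 * by skipping continuation bytes; assumes the input is valid UTF-8
 */
size_t
u8_count(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (!isu8cont((unsigned char)*s))
			n++;
	return n;
}
```

Both predicates should agree for every value from 0x00 to 0xFF, since
the two masks are the same constant. The counting helper is only there
to show a typical use: lead bytes and ASCII bytes are counted,
continuation bytes are skipped.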