Re: [systemd-devel] [PATCH v2 1/2] utf8: introduce utf8_escape_non_printable

Lennart Poettering Wed, 03 Dec 2014 17:38:42 -0800

On Wed, 19.11.14 12:35, David Herrmann (dh.herrm...@gmail.com) wrote:

> > +                        } else {
> > +                                if ((*str < ' ') || (*str >= 127)) {
> > +                                        *(s++) = '\\';
> > +                                        *(s++) = 'x';
> > +                                        *(s++) = hexchar((int) *str >> 4);
> > +                                        *(s++) = hexchar((int) *str);
> > +                                } else
> > +                                        *(s++) = *str;
> > +
> > +                                str += 1;
> 
> This part is wrong. You cannot rely on ``*str'' being the correct
> Unicode value for the character. utf8_is_printable() also returns false
> for multi-byte UTF8 characters. If you take the byte unmodified, the
> output will include the UTF8 management bits, which we really don't want here.
> 
> If you really want this, I'd prefer if you decode each UTF8 character,
> and if it is non-printable you print "\uABCD" or "\UABCDWXYZ" (like
> C++ does) as a 6-byte or 10-byte sequence. Other characters are just
> printed normally.
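A minimal sketch of the decode-then-escape approach suggested in the quoted review, for illustration only. The function names escape_non_printable_unicode(), decode_utf8() and is_printable_cp() are hypothetical, the decoder is simplified (it does not reject overlong forms or surrogates), and the printability test is an assumed stand-in that only treats C0/C1 controls and DEL as non-printable; it is not systemd's utf8_is_printable().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified UTF-8 decoder (hypothetical helper): stores the code point in
 * *cp and returns the sequence length (1-4), or -1 if malformed. */
static int decode_utf8(const unsigned char *s, uint32_t *cp) {
        static const unsigned char lead_mask[] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
        int len = s[0] < 0x80 ? 1 :
                  (s[0] & 0xE0) == 0xC0 ? 2 :
                  (s[0] & 0xF0) == 0xE0 ? 3 :
                  (s[0] & 0xF8) == 0xF0 ? 4 : -1;
        if (len < 0)
                return -1;

        uint32_t c = s[0] & lead_mask[len];
        for (int i = 1; i < len; i++) {
                if ((s[i] & 0xC0) != 0x80)   /* also stops at a premature NUL */
                        return -1;
                c = (c << 6) | (s[i] & 0x3F);
        }
        *cp = c;
        return len;
}

/* Assumed printability rule: C0/C1 controls and DEL are non-printable. */
static int is_printable_cp(uint32_t cp) {
        return !(cp < 0x20 || (cp >= 0x7F && cp <= 0x9F));
}

/* Returns a malloc()ed copy of str in which non-printable code points are
 * written as \uXXXX (up to U+FFFF) or \UXXXXXXXX, and malformed bytes as \xNN. */
char *escape_non_printable_unicode(const char *str) {
        const unsigned char *p = (const unsigned char *) str;
        char *buf, *out;

        assert(str);
        /* Worst case per input byte: a 1-byte control becomes a 6-byte "\uXXXX". */
        buf = out = malloc(strlen(str) * 6 + 1);
        if (!buf)
                return NULL;

        while (*p) {
                uint32_t cp;
                int n = decode_utf8(p, &cp);

                if (n < 0) {
                        out += sprintf(out, "\\x%02x", (unsigned) *p);
                        n = 1;
                } else if (!is_printable_cp(cp)) {
                        /* The \U form is only reached if the predicate flags
                         * code points above U+FFFF. */
                        if (cp <= 0xFFFF)
                                out += sprintf(out, "\\u%04x", (unsigned) cp);
                        else
                                out += sprintf(out, "\\U%08x", (unsigned) cp);
                } else {
                        memcpy(out, p, n);   /* printable: copy the whole sequence */
                        out += n;
                }
                p += n;
        }
        *out = '\0';
        return buf;
}
```

With this sketch, a tab becomes \u0009, the two-byte C1 control U+0085 becomes \u0085 instead of exposing its lead byte 0xC2, and printable multi-byte characters such as "é" pass through untouched.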


I have now committed the proposed patch, but then changed the code to
iterate through all bytes of the unichar and escape each of them
individually. This form of escaping should be safe and compatible
with C-style escaping (which \u isn't, really). Hope this makes sense.
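A minimal sketch of the per-byte scheme described above, under stated assumptions: it is not the committed systemd code, and escape_bytes_of_non_printable(), utf8_decode() and printable() are hypothetical names. The decoder is simplified, and printable() is an assumed rule that only flags C0/C1 controls and DEL. Each character is decoded only to find its boundaries and decide printability; if it is non-printable, every byte of its UTF-8 sequence is written as \xNN.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified UTF-8 decoder (hypothetical helper): returns the sequence
 * length (1-4) and stores the code point, or returns -1 if malformed. */
static int utf8_decode(const unsigned char *s, uint32_t *cp) {
        static const unsigned char lead_mask[] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
        int len = s[0] < 0x80 ? 1 :
                  (s[0] & 0xE0) == 0xC0 ? 2 :
                  (s[0] & 0xF0) == 0xE0 ? 3 :
                  (s[0] & 0xF8) == 0xF0 ? 4 : -1;
        if (len < 0)
                return -1;

        uint32_t c = s[0] & lead_mask[len];
        for (int i = 1; i < len; i++) {
                if ((s[i] & 0xC0) != 0x80)
                        return -1;
                c = (c << 6) | (s[i] & 0x3F);
        }
        *cp = c;
        return len;
}

/* Assumed printability rule: C0/C1 controls and DEL are non-printable. */
static int printable(uint32_t cp) {
        return !(cp < 0x20 || (cp >= 0x7F && cp <= 0x9F));
}

/* Copies printable characters unchanged; for a non-printable (or malformed)
 * character, emits every one of its bytes as a C-style \xNN escape. */
char *escape_bytes_of_non_printable(const char *str) {
        const unsigned char *p = (const unsigned char *) str;
        char *buf, *out;

        assert(str);
        /* Each input byte expands to at most 4 output bytes ("\xNN"). */
        buf = out = malloc(strlen(str) * 4 + 1);
        if (!buf)
                return NULL;

        while (*p) {
                uint32_t cp;
                int n = utf8_decode(p, &cp);

                if (n > 0 && printable(cp)) {
                        memcpy(out, p, n);
                        out += n;
                } else {
                        if (n < 0)
                                n = 1;   /* malformed: escape just this byte */
                        for (int i = 0; i < n; i++)
                                out += sprintf(out, "\\x%02x", (unsigned) p[i]);
                }
                p += n;
        }
        *out = '\0';
        return buf;
}
```

Here U+0085 comes out as \xc2\x85, which a C compiler reads back as the same two bytes. By contrast, C99 and C11 do not allow a universal character name such as \u0085 for values below U+00A0 other than $, @ and `, which is one concrete reason \u escapes are not a drop-in fit for C-style output.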

Lennart

-- 
Lennart Poettering, Red Hat
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
