On Sat, Sep 05, 2015 at 04:38:30PM +0300, pizdel...@gmail.com wrote:
> On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote:
> > I can't see where you're checking for overlong UTF-8 sequences, for example.
>
> It is somewhere in there
>
> +	} else if ((e & 0xe0) == 0xc0) { /* 11 bit code point */
> +		state = 1;
> +		c = (e & 0x1f) << 6;
> [snip]
> +		/*
> +		 * Check that the header byte has some non-zero data
> +		 * after masking off the length marker. If not it is
> +		 * an invalid encoding.
> +		 */
> +		if (c == 0) {
>
> + bad_encoding:
>
> That being said, I find that state variable danse in utf8_decode() very ugly
> and confusing -- but then I'm not a developer so I better shut up.

Yes, utf8_decode() does some checks and reports errors via the had_error
pointer. But its caller utf8_stringprep() ignores any such errors, doesn't it?

My question is whether that's a problem. I believe it is. Do you agree?
If not, why not?
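
To make the concern concrete, here is a rough sketch of what I would expect from a decoder/caller pair. This is not the code from the patch: sketch_decode() and sketch_validate() are hypothetical names, and the overlong check here compares the decoded code point against the minimum value for its sequence length, which may differ from how the patch does it. The point is only that the caller looks at the decoder's error result and stops, rather than carrying on with the string.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoder (not the patch's utf8_decode()): decode one code
 * point from s[0..len) into *cp. Returns the number of bytes consumed,
 * or 0 if the sequence is malformed.
 */
static size_t
sketch_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* Smallest code point that legitimately needs n bytes. */
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t n, i;
	uint32_t c;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4;
		c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation or invalid header byte */

	if (len < n)
		return 0;	/* truncated sequence */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < min_cp[n])
		return 0;	/* overlong encoding */
	if (c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return 0;	/* out of range or UTF-16 surrogate */
	*cp = c;
	return n;
}

/*
 * Hypothetical caller: returns 0 if the whole buffer is well-formed,
 * -1 at the first bad sequence. The decoder's error is propagated
 * instead of being ignored.
 */
static int
sketch_validate(const unsigned char *s, size_t len)
{
	uint32_t cp;
	size_t n;

	while (len > 0) {
		n = sketch_decode(s, len, &cp);
		if (n == 0)
			return -1;
		s += n;
		len -= n;
	}
	return 0;
}
```

With something along these lines, an overlong input such as the two bytes 0xc0 0x80 makes the caller fail outright. If the caller drops the error, the same input passes through as if it were fine, which is the case I am asking about.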