Re: UTF-8 string filtering

Stefan Sperling Sat, 05 Sep 2015 08:12:36 -0700

On Sat, Sep 05, 2015 at 04:38:30PM +0300, pizdel...@gmail.com wrote:
> On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote:
> > I can't see where you're checking for overlong UTF-8 sequences, for example.
> 
> It is somewhere in there:
> 
> +                       } else if ((e & 0xe0) == 0xc0) { /* 11 bit code point */
> +                               state = 1;
> +                               c = (e & 0x1f) << 6;
> [snip]
> +                       /*
> +                        * Check that the header byte has some non-zero data
> +                        * after masking off the length marker. If not it is
> +                        * an invalid encoding.
> +                        */
> +                       if (c == 0) {
> + bad_encoding:
> 
> That being said, I find that state variable dance in utf8_decode() very ugly
> and confusing, but then I'm not a developer, so I'd better shut up.
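
For comparison, a minimal sketch of another common way to reject overlong sequences: decode the whole sequence first, then require the result to be at least the smallest code point that actually needs that many bytes (U+0080 for two bytes, U+0800 for three, U+10000 for four). This is not the patch's code; utf8_decode_one() and its return convention are hypothetical, and the quote above does not show how the snipped parts handle these cases.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the function from the patch: decode one UTF-8
 * sequence from s (n bytes available). On success, store the code point
 * in *cp and return the sequence length (1-4). Return -1 on any invalid
 * encoding, including overlong forms, surrogates and values > U+10FFFF.
 */
static int
utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	int len, i;

	if (n == 0)
		return -1;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		len = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		len = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		len = 4;
		c = s[0] & 0x07;
	} else {
		return -1;	/* stray continuation byte or 0xf8-0xff */
	}
	if ((size_t)len > n)
		return -1;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	/*
	 * Overlong check on the decoded value. Testing only the header
	 * byte's payload would catch 0xc0 but not 0xc1 (payload 1, yet
	 * 0xc1 only ever starts an overlong form), and for three-byte
	 * forms a zero payload (0xe0) is valid when the next byte is
	 * 0xa0 or higher.
	 */
	if (c < min_cp[len])
		return -1;
	if (c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return -1;
	*cp = c;
	return len;
}
```

With this approach, 0xc0 0x80 and 0xc1 0xbf are both rejected, while 0xe0 0xa0 0x80 decodes to U+0800.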


Yes, utf8_decode() does some checks and reports errors via the
had_error pointer.

But its caller utf8_stringprep() ignores any such errors, doesn't it?
My question is whether that's a problem. I believe it is. Do you agree?
If not, why not?
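
A minimal sketch of the alternative raised above: a caller that acts on the decoder's error flag instead of ignoring it. decode_next() and utf8_check() are hypothetical stand-ins, not the utf8_decode() and utf8_stringprep() from the patch, whose full signatures are not shown in this thread; only the idea of reporting errors through a had_error pointer is taken from the discussion.

```c
#include <stdint.h>

/*
 * Hypothetical decoder: decode one sequence at *sp, advance *sp, and set
 * *had_error on invalid input (bad lead or continuation byte, truncation,
 * overlong form, surrogate, > U+10FFFF). Invalid input consumes one byte
 * and yields U+FFFD.
 */
static uint32_t
decode_next(const unsigned char **sp, int *had_error)
{
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	const unsigned char *s = *sp;
	uint32_t c = 0;
	int len = 0, i;

	if (s[0] < 0x80) {
		*sp = s + 1;
		return s[0];
	}
	if ((s[0] & 0xe0) == 0xc0) {
		len = 2; c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		len = 3; c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		len = 4; c = s[0] & 0x07;
	} else
		goto bad;
	for (i = 1; i < len; i++) {
		/* A NUL terminator fails this test, so we never read past it. */
		if ((s[i] & 0xc0) != 0x80)
			goto bad;
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < min_cp[len] || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		goto bad;
	*sp = s + len;
	return c;
bad:
	*had_error = 1;
	*sp = s + 1;
	return 0xfffd;	/* U+FFFD REPLACEMENT CHARACTER */
}

/*
 * Hypothetical caller: return -1 as soon as the decoder reports a
 * problem, so whoever called us can reject the string instead of
 * using it. Otherwise return the number of code points.
 */
static int
utf8_check(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	int had_error = 0, n = 0;

	while (*s != '\0') {
		(void)decode_next(&s, &had_error);
		if (had_error)
			return -1;
		n++;
	}
	return n;
}
```

Rejecting the whole string is one design choice; a filter could instead keep going and emit the U+FFFD that decode_next() returns. Either way the error is acted on rather than silently dropped.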
