CVSROOT:        /cvs
Module name:    src
Changes by:     schwa...@cvs.openbsd.org        2014/12/18 21:57:11

Modified files:
        usr.bin/mandoc : preconv.c 
        regress/usr.bin/mandoc/char/unicode: Makefile 
Added files:
        regress/usr.bin/mandoc/char/unicode: input.in input.out_ascii 
                                             input.out_lint 
                                             input.out_utf8 

Log message:
Rewrite the low-level UTF-8 parser from scratch.
It accepted invalid byte sequences like 0xc080-c1bf, 0xe08080-e09fbf,
0xeda080-edbfbf, and 0xf0808080-f08fbfbf, produced valid roff Unicode
escape sequences from them, and the algorithm contained strong
defenses against any attempt to fix it.

This cures an assertion failure in the terminal formatter caused
by sneaking in ASCII 0x08 (backspace) by "encoding" it as an (invalid)
multibyte UTF-8 sequence, found by jsg@ with afl.

As a bonus, the new algorithm also reduces the code in the function
by about 20%.
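
For illustration only, here is a minimal sketch of a strict UTF-8 decoder
that rejects the four classes of invalid sequences listed above. It is not
the code committed to preconv.c; the function name utf8_decode_strict and
its interface are hypothetical, chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT the parser from preconv.c.
 * Decode one UTF-8 sequence from buf[0..len-1].
 * On success, store the code point in *cp and return the number
 * of bytes consumed (1 to 4).  Return 0 for any invalid sequence.
 */
static size_t
utf8_decode_strict(const unsigned char *buf, size_t len, unsigned int *cp)
{
	unsigned int	 c, min;
	size_t		 i, n;

	assert(buf != NULL && cp != NULL);
	if (len == 0)
		return 0;

	c = buf[0];
	if (c < 0x80) {
		/* Plain ASCII, including 0x08, is a single byte. */
		*cp = c;
		return 1;
	} else if (c >= 0xc2 && c <= 0xdf) {
		/* 0xc0 and 0xc1 could only start overlong forms. */
		n = 2; c &= 0x1f; min = 0x80;
	} else if (c >= 0xe0 && c <= 0xef) {
		n = 3; c &= 0x0f; min = 0x800;
	} else if (c >= 0xf0 && c <= 0xf4) {
		n = 4; c &= 0x07; min = 0x10000;
	} else
		return 0;	/* stray continuation or invalid start byte */

	if (len < n)
		return 0;	/* truncated sequence */

	for (i = 1; i < n; i++) {
		if ((buf[i] & 0xc0) != 0x80)
			return 0;	/* not a continuation byte */
		c = (c << 6) | (buf[i] & 0x3f);
	}

	if (c < min)
		return 0;	/* overlong, e.g. 0xe08080-e09fbf */
	if (c >= 0xd800 && c <= 0xdfff)
		return 0;	/* surrogates, 0xeda080-edbfbf */
	if (c > 0x10ffff)
		return 0;	/* beyond the Unicode range */

	*cp = c;
	return n;
}
```

With this sketch, the two-byte input 0xc0 0x88 (an overlong "encoding" of
backspace) returns 0 instead of yielding code point 0x08, while a valid
sequence such as 0xc3 0xa9 returns 2 and stores 0xe9.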
