https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes

Even the control codes within Unicode aren't FF.
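A quick way to check this is to encode the code points and see which byte values show up. Below is a rough Python sketch, not anything from SQLite itself; the helper name `utf8_bytes_used` is made up for illustration, and it relies only on Python's built-in UTF-8 encoder.

```python
# Sketch: check which byte values ever appear in valid UTF-8 output.
# utf8_bytes_used is a hypothetical helper name, not a library function.

def utf8_bytes_used(code_points):
    """Return the set of byte values seen when encoding the given code points."""
    seen = set()
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        seen.update(chr(cp).encode("utf-8"))
    return seen

# C0 controls (U+0000-U+001F), DEL (U+007F), and C1 controls (U+0080-U+009F).
control_codes = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))
control_bytes = utf8_bytes_used(control_codes)

# Every encodable code point, U+0000 through U+10FFFF.
all_bytes = utf8_bytes_used(range(0x110000))

print(chr(0x9C).encode("utf-8").hex())  # ST (U+009C) -> c29c
print(0xFF in control_bytes)            # False
print(0xFF in all_bytes)                # False
print(hex(max(all_bytes)))              # 0xf4
```

So the C1 controls all encode as C2 followed by 80-9F, and the highest byte the encoder ever emits is F4.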
U+009C (156), String Terminator (ST): the literal bytes \xC2\x9C are the string terminator ...

I was thinking that codes like APC and ST were higher than that, more in the range of 0xF8-0xFF.

On Thu, Jan 25, 2018 at 7:57 PM, J Decker <d3c...@gmail.com> wrote:

> NUL is a valid UTF-8 character, but FF is never valid (it would be like a
> 36 bit length specification), and practically anything more than F8 is an
> invalid UTF-8 character.
>
> Other than BOM
> https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
> EF BB BF    239 187 191
>
> // EF - E0 = 0F | BB - 80 = 3B | BF - 80 = 3F
> ( 0xfeff )
>
>
> Many Windows <https://en.wikipedia.org/wiki/Microsoft_Windows> programs
> (including Windows Notepad
> <https://en.wikipedia.org/wiki/Notepad_(Windows)>) add the bytes 0xEF,
> 0xBB, 0xBF at the start of any document saved as UTF-8. Th
>
> (Not that a BOM is even required, because the bytes are already ordered.)
> ----------
> But anyway, FF could be used as a string terminator instead of 00. It is
> never legal in any UTF-8 sequence.
> (F8,F9,FA,FB,FC,FD,FE,FF)
> F8 would be a 5 byte encoding, but that is more code points than Unicode
> has allocated. It could be potentially useful to permit a little extra
> space in sequences, so I would avoid F8 (F9,FA,FB) and stick to FC-FF for
> possible control characters.
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users