Scott wrote:
>It depends on what you call a character. If you consider a "character" the
>same way most people do (one typographical unit), then you have to deal
>with varying numbers of code points per character, even in a "fixed width"
>encoding like UTF-32. There is no hard limit on how many combining marks
>can be appended to a base code point.
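
To make the quoted point concrete, here is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The helper name `count_combining` and the sample strings are made up for illustration only. It suggests that one visible character can span several code points, and that UTF-32 does not change that count.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one visible character, two code points.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def count_combining(text):
    """Count code points with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


# A base letter with ten stacked marks (illustrative sample) is still
# one typographical unit, and has no precomposed form to fall back on.
stacked = "a" + "\u0301" * 10

print(len(decomposed), len(composed))          # 2 1
print(len(stacked), count_combining(stacked))  # 11 10
# UTF-32 is fixed width per code point, not per visible character.
print(len(stacked.encode("utf-32-le")) // 4)   # 11
```

Counting user-perceived characters properly needs grapheme cluster segmentation, which the standard library does not provide; the combining-class check above is only a rough approximation.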
>See
>http://stackoverflow.com/questions/10414864/whats-up-with-these-unicode-combining-characters-and-how-can-we-filter-them
>for a stupid / extreme example.

The UTF-* encodings sometimes use several 8-bit units for a single character. In the past, character codes used as few as five bits (e.g. https://en.wikipedia.org/wiki/Baudot_code). Around the seventies it was six bits on old Bull machines (e.g. the GE 400-series' GBCD) and eight bits on IBM (EBCDIC). ASCII started with seven bits plus one bit for parity checking. As always: YMMV. One may use this knowledge for encrypting, I guess.

Kind regards/Vriendelijke groeten.
Klaas `Z4us` Van B., CEO/CIO LI#437429414