BOCU-1 might solve this problem, but multiplying and dividing by 243 doesn't sound faster than UTF-8 bit-shifting. (I'm still amazed by the claim in UTN #6 that converting Hindi text between UTF-16 and BOCU-1 took only 45% as long as converting it between UTF-16 and UTF-8.)
"claim"? That hurts...
I did measure these things, and the numbers in the table are all from my measurements. I also included the type of machine I used, etc. (http://www.unicode.org/notes/tn6/#Performance)
BOCU-1 (like SCSU) is often faster than UTF-8 because it goes into single-byte mode for small scripts like Hindi. Single-byte mode performs only a subtraction, with no div/mod or even bit-shifting, and writes/reads only one byte per character. The ICU implementation is also optimized with a tight inner loop.
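To make the single-byte idea concrete, here is a toy sketch in Python. It is not the real BOCU-1 algorithm or byte layout (the actual encoding derives its state from the previous character and avoids certain byte values). The function names and the fixed `base` are assumptions made up for illustration. It only shows why one subtraction and one byte per character is cheap work.

```python
# Toy illustration only: NOT the real BOCU-1 algorithm or byte layout.
# Assumption: every character lies in one 128-code-point block starting at `base`.

def toy_single_byte_encode(text: str, base: int) -> bytes:
    out = bytearray()
    for ch in text:
        delta = ord(ch) - base  # one subtraction per character
        if not 0 <= delta < 128:
            raise ValueError("character outside the assumed block")
        out.append(delta)       # one byte written per character
    return bytes(out)

def toy_single_byte_decode(data: bytes, base: int) -> str:
    # Reverse step: one addition per byte read.
    return "".join(chr(base + b) for b in data)

DEVANAGARI_BASE = 0x0900  # start of the Devanagari block
hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"  # six Devanagari code points
encoded = toy_single_byte_encode(hindi, DEVANAGARI_BASE)
```

Here `encoded` holds one byte per code point, while `hindi.encode("utf-8")` needs three bytes per code point for the same text.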
UTF-8, on the other hand, encodes Hindi with 3 bytes per character, so it has to perform the bit-shifting and write to/read from more memory locations.
It's the same for Greek/Russian/Arabic etc., although to a lesser degree because it's single bytes with BOCU-1 vs. only 2 bytes per character with UTF-8.
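For comparison, the UTF-8 side can be sketched by hand. This is a minimal Python sketch, not ICU's code: the helper name `utf8_encode_bmp` is an assumption for illustration, it handles only code points up to U+FFFF, and it does not reject surrogates. It shows the shifts and masks that each 2- or 3-byte character costs.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Hand-rolled UTF-8 for one code point up to U+FFFF (surrogates not checked)."""
    if cp < 0x80:
        return bytes([cp])                      # 1 byte: ASCII
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),         # 2 bytes: Greek, Cyrillic, Arabic, ...
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),        # 3 bytes: Devanagari and the rest of the BMP
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point above U+FFFF not handled in this sketch")

# One sample letter per script mentioned above.
samples = {
    "Hindi": "\u0928",    # DEVANAGARI LETTER NA
    "Greek": "\u03b1",    # GREEK SMALL LETTER ALPHA
    "Russian": "\u0436",  # CYRILLIC SMALL LETTER ZHE
    "Arabic": "\u0628",   # ARABIC LETTER BEH
}
byte_counts = {name: len(utf8_encode_bmp(ord(ch))) for name, ch in samples.items()}
```

`byte_counts` comes out as 3 for the Hindi sample and 2 for the Greek, Russian, and Arabic samples, which is why the gap against a one-byte-per-character mode is smaller for those scripts.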
The fact that BOCU-1 not only achieves good compression (and binary order and MIME text/ compatibility) but also reasonable conversion performance encouraged Mark and me to publish it.
UTF-8 is useful because it's simple, and supported just about everywhere - but it's otherwise hardly optimal for anything.
If you want high-speed, compact encoding, use SCSU. If you want good speed, compact encoding, and binary order and/or MIME compatibility, use BOCU-1. Make sure that both sides of the wire know what's going across.
markus