Costello, Roger L. wrote:

> Suppose an application splits a UTF-8 multi-octet sequence. The
> application then sends the split sequence to a client. The client must
> restore the original sequence.
>
> Question: is it possible to split a UTF-8 multi-octet sequence in such
> a way that the client cannot unambiguously restore the original
> sequence?

1. (Bug) The folding process inserts CRLF plus white space characters, and the unfolding process doesn't properly delete all of them.

2. (Non-conformant behavior) Some process, after folding and before unfolding, attempts to interpret the partial UTF-8 sequences and converts them into replacement characters or worse.

In a minimally decent implementation, splitting and reassembling a UTF-8 sequence should always yield the correct result; there should be no ambiguity. A good implementation, of course, would know the character encoding of the data, and would not split multi-byte sequences in that encoding to begin with.

--
Doug Ewell | Thornton, CO, US | ewellic.org
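
As a rough illustration of the points above, here is a minimal Python sketch. It assumes the data is known to be UTF-8; the sample string and the helper name `safe_split_point` are hypothetical, chosen only for this example. It shows that a split at any byte offset is restored by plain concatenation, that decoding the fragments separately produces replacement characters, and one way a splitter might avoid cutting a sequence in the first place.

```python
def safe_split_point(data: bytes, limit: int) -> int:
    """Return the largest index <= limit that is not inside a UTF-8 sequence.

    Hypothetical helper: assumes `data` is well-formed UTF-8.
    """
    if limit >= len(data):
        return len(data)
    i = limit
    # Continuation bytes look like 10xxxxxx. If the byte at the split
    # point is one, back up until we reach the lead byte of the sequence.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


text = "naïve €uro"            # sample text; "ï" is 2 bytes, "€" is 3 bytes
encoded = text.encode("utf-8")

# Split in the middle of the 3-byte "€" sequence (after its lead byte).
head, tail = encoded[:8], encoded[8:]

# Reassembling the raw bytes before decoding restores the original text.
restored = (head + tail).decode("utf-8")

# Interpreting each fragment on its own (failure mode 2) loses data:
# the partial sequences become U+FFFD replacement characters.
damaged = (head.decode("utf-8", errors="replace")
           + tail.decode("utf-8", errors="replace"))

# A splitter that knows the encoding can move the cut to a boundary.
cut = safe_split_point(encoded, 8)

print(restored == text)
print("\ufffd" in damaged)
print(cut)
```

The boundary check relies only on the bit pattern of continuation bytes, which is why no lookup of the full sequence length is needed when backing up.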