Another good choice for c would be U+001A, preserving the original meaning of the old ASCII SUB character. My understanding is that, back in the days of teletypes, SUB originally caused the following character to be displayed in red ink instead of black ink, until smarter printers came along, after which time SUB caused the following character to be selected from an alternative character set. Of course, all that changed when the 8th bit started to be used. Now the C0 control codepoints (apart from TAB, CR, LF and FF) are nothing but an ancient historical legacy which (in my opinion) could be re-used for something else. (That won't happen, of course, because of stability guarantees.)
But it's the "knowing" part that's the problem. Can you really "know" that any given byte sequence won't appear in plain text? That's the only reason I thought of pushing the probability of incorrect identification down to an astronomically low level.
Jill
-----Original Message-----
From: Peter Kirk [mailto:[EMAIL PROTECTED]
Sent: 15 December 2004 12:54
To: Arcane Jill
Cc: Unicode
Subject: Re: Roundtripping Solved
But would it not work just as well for Lars' purposes to use, instead of your string of random characters, just ONE reserved code point followed by U+0xx?
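
To make the trade-off concrete, here is a minimal Python sketch of the single-escape idea, under some assumptions of mine rather than anything fixed in this thread: the reserved code point c is taken to be U+001A, "U+0xx" is read as the code point whose value equals the offending byte (U+0080 to U+00FF), and the function names are hypothetical. It is an illustration only, not Lars' actual scheme.

```python
ESCAPE = "\u001a"  # assumed choice for the reserved code point c (SUB)


def decode_with_escape(data: bytes) -> str:
    """Decode UTF-8, turning each invalid byte 0xNN into ESCAPE + U+00NN."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(ESCAPE + chr(cp - 0xDC00))  # escaped raw byte
        else:
            out.append(ch)
    return "".join(out)


def encode_with_escape(text: str) -> bytes:
    """Reverse the mapping: ESCAPE + U+00NN (NN >= 0x80) becomes the raw byte 0xNN."""
    out = bytearray()
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == ESCAPE and nxt and 0x80 <= ord(nxt) <= 0xFF:
            out.append(ord(nxt))  # restore the original invalid byte
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


# Invalid bytes survive a round trip through a Python string.
broken = b"abc\xff\xfedef"
assert encode_with_escape(decode_with_escape(broken)) == broken

# But valid text that happens to contain c followed by U+00E9 does not:
# the pair is mistaken for an escape on the way back out.
plain = "\u001a\u00e9".encode("utf-8")
assert encode_with_escape(decode_with_escape(plain)) != plain
```

The second assertion is the "knowing" problem in miniature: with a single reserved code point the scheme is only safe if c never occurs before U+0080..U+00FF in genuine text, whereas a long random marker makes such a collision far less likely without ruling it out.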