Peter Kirk <[EMAIL PROTECTED]> writes:

> Jill, again your solution is ingenious. But would it not work just
> as well to for Lars' purposes to use, instead of your string of
> random characters, just ONE reserved code point followed by U+0xx?
> Instead of asking the UTC to allocate a specific code point for this
> (which it probably will not do), he can use either U+FFFE or U+FFFF,
> which "are intended for process internal uses, but are not permitted
> for interchange." Let's call the one non-character chosen INVALID.

Perhaps what is needed is a shift of viewpoint, not a big technical
change. Don't call it a UTF. Call it escaping. Don't reserve 128 code
points. Use an existing but rare code point as a prefix for a byte
escaped among code points, and escape the escape if it's found in the
original. Perhaps the character could be ESC (27) or SUB (26), followed
by U+00nn.

Well, a viewpoint shift doesn't solve all problems: it's still dangerous
for interoperability. If the programmer doesn't do anything special when
writing filenames to a file, then instead of an error which indicates
that the goal doesn't have a natural solution, he gets an escaped string
which will not be understood by other applications which don't use this
convention. If the filename is passed to a part of the program which
doesn't use this convention, then it will break too.

If something cannot be done reliably, it's better to signal the problem
immediately than to hide it and misbehave later.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/
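
For illustration only, here is a rough Python sketch of the escaping
idea described above. It assumes SUB (U+001A) as the escape character
and UTF-8 as the byte encoding. The names decode_escaped, encode_escaped
and the "byte-escape" error handler are made up for this example and are
not part of any existing proposal or library.

```python
import codecs

ESC = "\x1a"  # SUB (26), assumed here as the rare escape character


def _escape_invalid(exc):
    """Error handler: turn each undecodable byte into ESC + U+00nn."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(ESC + chr(b) for b in bad), exc.end


# "byte-escape" is a hypothetical handler name chosen for this sketch.
codecs.register_error("byte-escape", _escape_invalid)


def decode_escaped(raw: bytes) -> str:
    """Decode UTF-8, escaping invalid bytes and doubling a literal ESC."""
    # 0x1A is ASCII, so it never occurs inside a multi-byte sequence and
    # can be doubled at the byte level before decoding.
    doubled = raw.replace(b"\x1a", b"\x1a\x1a")
    return doubled.decode("utf-8", errors="byte-escape")


def encode_escaped(text: str) -> bytes:
    """Inverse of decode_escaped; rejects malformed escape pairs."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESC:
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling escape character at end of string")
        nxt = text[i + 1]
        if nxt == ESC:
            out.append(0x1A)          # escaped escape: a literal SUB
        elif 0x80 <= ord(nxt) <= 0xFF:
            out.append(ord(nxt))      # escaped raw byte U+00nn
        else:
            raise ValueError("invalid character after escape")
        i += 2
    return bytes(out)
```

Under these assumptions the round trip is lossless, but only for code
that knows the convention. A plain text.encode("utf-8") on the escaped
string yields different bytes than the original filename, which is
exactly the interoperability problem mentioned above.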