Dominikus Scherkl wrote:
My other suggestion (and the main reason to call the proposed
character "source failure indicator symbol" (SFIS)) was intended
especially for malformed UTF-8 input that has overlong encodings.

In this special case a converter knows exactly which character is
intended, but needs to signal an error to avoid ambiguities.
At present it MUST replace the overlong character with U+FFFD
(or even cancel the conversion!).
But I think SFIS + intended-char is a far better approach,
because it
1) warns the reader AND keeps the text readable
2) distinguishes overlong encodings from illegal character sequences.

This is a special, custom form of error handling - why assign a character for it?

You could just use an existing character or non-character for this, e.g., U+303E or U+FFFF or U+FDEF or similar.
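
To make the idea concrete, here is a rough sketch of such a custom error handler in Python. It is only an illustration, not a conformant decoder: the function name decode_lenient, the marker parameter, and the choice of U+303E as the default marker are all assumptions made for this example. Overlong forms come out as marker + intended character; every other malformed sequence becomes U+FFFD.

```python
REPLACEMENT = "\uFFFD"
MARKER = "\u303E"  # assumption: stand-in marker, one of the code points named above

# Smallest code point that legitimately needs a sequence of the given length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_lenient(data: bytes, marker: str = MARKER) -> str:
    """Decode UTF-8, flagging overlong forms with `marker` + intended char.

    Hypothetical helper: a conformant decoder must treat overlong forms
    as ill-formed instead of recovering the intended character.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII byte
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:             # lead byte of a 2-byte form
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte form
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:           # lead byte of a 4-byte form
            length, cp = 4, b & 0x07
        else:                             # stray continuation or invalid lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1 : i + length]
        if len(tail) != length - 1 or any(t & 0xC0 != 0x80 for t in tail):
            out.append(REPLACEMENT)       # truncated or broken sequence
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)       # out of range or surrogate
        elif cp < MIN_FOR_LENGTH[length]:
            out.append(marker + chr(cp))  # overlong: flag it, keep the text readable
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)


# The overlong two-byte form of "/" (C0 AF) decodes to marker + "/".
flagged = decode_lenient(b"path\xC0\xAFname")
```

Passing marker="\uFFFF" (or any other code point you prefer) switches the indicator without touching the decoder, which is the point: the marker is a private convention of the converter, not something the receiving side has to interpret.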

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.

