Re: New Charakter Proposal

Markus Scherer Wed, 30 Oct 2002 10:17:37 -0800

Dominikus Scherkl wrote:

My other suggestion (and the main reason to call the proposed
character "source failure indicator symbol" (SFIS)) was intended
especially for malformed UTF-8 input that has overlong encodings.


In this special case a converter knows exactly which char is
intended, but needs to signal an error to avoid ambiguities.
As things stand, it MUST replace the overlong char with U+FFFD
(or even cancel the conversion!).
But I think SFIS + intended-char is a far better approach,
because it
1) warns the reader AND keeps the text readable
2) distinguishes overlong encodings from illegal char sequences.

This is a special, custom form of error handling - why assign a character for it?

You could just use an existing character or non-character for this, e.g., U+303E or U+FFFF or U+FDEF or similar.
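
To make the idea concrete, here is a minimal sketch of such custom error handling in Python. It is only an illustration, not any existing converter's behaviour: the function name decode_flagging_overlong, the choice of U+FDEF as the private marker, and the policy of emitting one U+FFFD per bad byte are all assumptions made for the example. Overlong sequences come out as marker + intended char; every other ill-formed sequence becomes U+FFFD.

```python
MARKER = "\uFDEF"       # noncharacter used as a private flag (assumed choice)
REPLACEMENT = "\uFFFD"  # standard replacement character

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_flagging_overlong(data: bytes) -> str:
    """Decode UTF-8, flagging overlong forms instead of discarding them."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:
            length, cp = 2, b & 0x1F
        elif 0xE0 <= b <= 0xEF:
            length, cp = 3, b & 0x0F
        elif 0xF0 <= b <= 0xF7:
            length, cp = 4, b & 0x07
        else:                             # stray continuation or invalid lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(t & 0xC0 != 0x80 for t in tail):
            out.append(REPLACEMENT)       # truncated or broken sequence
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)       # out of range or surrogate
        elif cp < MIN_FOR_LENGTH[length]:
            out.append(MARKER + chr(cp))  # overlong: flag it, keep intended char
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)


if __name__ == "__main__":
    # 0xC0 0xAF is the classic overlong encoding of "/".
    print(ascii(decode_flagging_overlong(b"A\xC0\xAFB")))
```

Since the marker is a noncharacter, it is meant for internal use only; a process using this would strip or translate it before passing the text on to anyone else.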

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.
