David Starner wrote:
Chances are nearly 100% that overlong UTF-8 was a spoofing attempt, or the
result of something other than a UTF-8 encoder.
With the exception of overlong sequences for null (C0 80?), which Java
generates in an attempt to avoid true nulls.
I am aware of this one. This encod
William,
Note the smiley. Ken's suggestion was a tongue in the hollow-skulls
cheek.
Yes, a 2 character sequence is less likely to occur, but is still a
possibility, so your proposal doesn't actually fix the problem. The
usual workaround is for a convention that uses characters with special
semant
Kenneth Whistler wrote the following.
>I think Markus's suggestion is correct. If you want to do
>something like this internally to a process, use a noncharacter
>code point for it. If you want to have visible display of this
>kind of error handling for conversion, then simply declare a
>convention
Hello.
Markus Scherer wrote:
> Chances are nearly 100% that overlong UTF-8 was a
> spoofing attempt, or the result of something other than a
> UTF-8 encoder.
Correct. This is exactly my topic.
Wouldn't it be nice to have a standardized way to indicate
that an attack on the message has occurred wi
Dominikus Scherkl replied to Markus:
> > > My other suggestion (and the main reason to call the proposed
> > > character "source failure indicator symbol" (SFIS)) was intended
> > > especially for malformed UTF-8 input that has overlong encodings.
> > This is a special, custom form of error handli
On Wed, Oct 30, 2002 at 03:13:53PM -0800, Markus Scherer wrote:
> Chances are nearly 100% that overlong UTF-8 was a spoofing attempt, or the
> result of something other than a UTF-8 encoder.
With the exception of overlong sequences for null (C0 80?), which Java
generates in an attempt to avoid tr
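The C0 80 case mentioned here can be checked mechanically. The following is a minimal Python sketch, not taken from the thread: it computes the code point that a two-byte sequence would carry and flags the sequence as overlong when that value would have fit in a single byte. The function names are hypothetical and chosen only for illustration.

```python
def two_byte_value(seq: bytes) -> int:
    """Return the code point carried by a two-byte lead/continuation pair.

    No well-formedness check is applied; this is the value a lenient
    decoder would compute from the payload bits.
    """
    if len(seq) != 2 or (seq[0] & 0xE0) != 0xC0 or (seq[1] & 0xC0) != 0x80:
        raise ValueError("not a two-byte lead/continuation pair")
    return ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F)


def is_overlong_two_byte(seq: bytes) -> bool:
    # Code points below U+0080 fit in one byte, so a two-byte form is overlong.
    return two_byte_value(seq) < 0x80


def strict_decode_ok(seq: bytes) -> bool:
    # Python's built-in UTF-8 decoder is strict and rejects overlong forms.
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With these helpers, `is_overlong_two_byte(b"\xc0\x80")` is true and `strict_decode_ok(b"\xc0\x80")` is false, while a shortest-form sequence such as `b"\xc3\xa9"` (U+00E9) passes both checks.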
Dominikus Scherkl wrote:
> Converting from and to UTF-8 is an everyday topic, very important
> for all applications dealing with Unicode. So it is a special
Converting text to/from UTF-8 is indeed common and important.
Converting text that claims to be UTF-8 - but isn't - is different: It may be a
Markus Scherer wrote:
> Dominikus Scherkl wrote:
> > My other suggestion (and the main reason to call the proposed
> > character "source failure indicator symbol" (SFIS)) was intended
> > especially for malformed UTF-8 input that has overlong encodings.
> This is a special, custom form of error han
Dominikus Scherkl wrote:
My other suggestion (and the main reason to call the proposed
character "source failure indicator symbol" (SFIS)) was intended
especially for malformed UTF-8 input that has overlong encodings.
In this special case a converter exactly knows which char is
intended, but need
John Cowan wrote:
> This sounds basically like an extension of U+303E IDEOGRAPHIC
> VARIATION INDICATOR (whose semantic is: "The following character
> is not what I want, but it's the best approximation I can get")
> to non-ideographs.
>
> I have no problem with this idea.
So you mean: use U+303
Dominikus Scherkl wrote:
> I would like to have a "source failure indicator symbol" (SFIS)
> character in Unicode, which a charset-conversion unit may
> insert into a text (Suggested position: U+FFF8).
>
> [...]
>
> Of course a converter can still use U+FFFD if it has no
> idea which charact
We had thought of something similar, but which would provide more
information in interfaces.
Reserve a space of 256 code points, with names:
UNCONVERTIBLE BYTE-00
UNCONVERTIBLE BYTE-01
...
UNCONVERTIBLE BYTE-FF
During a conversion process, if some bytes (say from corrupt UTF-8) cannot
be correct
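As a rough illustration of the idea of one code point per unconvertible byte, here is a minimal Python sketch using a custom decoding error handler. The snippet above is cut off, so mapping each offending byte to the code point with the matching number is only one reading of the proposed names. The base `UNCONVERTIBLE_BASE`, the handler name `unconvertible-byte`, and both helper functions are hypothetical: the proposal gives no code point positions, and a private-use range is assumed here.

```python
import codecs

# Hypothetical base: the proposal names 256 code points but gives no positions,
# so a Supplementary Private Use Area-A range is assumed for illustration only.
UNCONVERTIBLE_BASE = 0xF0000


def _unconvertible_handler(exc):
    # Only decoding errors are handled; anything else is re-raised.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Emit one marker code point per offending byte, then resume after it.
    return "".join(chr(UNCONVERTIBLE_BASE + b) for b in bad), exc.end


codecs.register_error("unconvertible-byte", _unconvertible_handler)


def decode_marking_bad_bytes(data: bytes) -> str:
    """Decode UTF-8, replacing each undecodable byte with its marker."""
    return data.decode("utf-8", errors="unconvertible-byte")


def flagged_bytes(text: str) -> bytes:
    """Recover the original byte values of all markers found in the text."""
    return bytes(
        ord(ch) - UNCONVERTIBLE_BASE
        for ch in text
        if UNCONVERTIBLE_BASE <= ord(ch) <= UNCONVERTIBLE_BASE + 0xFF
    )
```

Unlike decoding with `errors="replace"`, which turns every bad byte into U+FFFD, this keeps the byte values available to an interface that wants to display or log them.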
12 matches