Peter Kirk <[EMAIL PROTECTED]> writes:
> Jill, again your solution is ingenious. But would it not work just
> as well for Lars' purposes to use, instead of your string of
> random characters, just ONE reserved code point followed by U+0xx?
> Instead of asking the UTC to allocate a specific code
On 15/12/2004 14:36, Arcane Jill wrote:
Yes, but only if you can have some reasonable assurance that the byte
sequence emitted by UTF(c,x) (where c is the single reserved codepoint
you suggest, and x is U+00xx, the value to be escaped expressed as a
character) will not occur in plain text. This
> Nope. No data corruption. You just get the odd bytes back. And achieve
I see more of what you are trying to do; let me try to be more clear.
Suppose that the conversion is defined in the following way, between Unicode
strings (D29a-d, page 74) and UTFs using your proposed new characters, for
now
Title: RE: Roundtripping Solved
Arcane Jill wrote:
> solution, again without breaking the Unicode model. If I have
> It is for reasons of requirement (4) that Lars proposes the introduction of
> 128 BMP codepoints. His intention is that they be marked as "reserved - do
> not use"
Arcane Jill wrote:
> DEFINITION - "f" is a function which maps an arbitrary octet stream to
> a sequence of Unicode characters, such that (1) any substring which
> happens to be valid UTF-8 is mapped to the sequence of Unicode
> characters which would have been produced by UTF-8, and (2) all
> re
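For readers who want something concrete to compare against a function of this kind: Python's "surrogateescape" error handler is one existing mechanism that decodes valid UTF-8 normally and keeps every undecodable octet recoverable. It is not the scheme proposed in this thread (it maps stray octets to lone surrogates U+DC80..U+DCFF, not to reserved BMP code points or a marker sequence), and it makes no attempt to reconstruct the truncated clause (2) above. The helper names decode_lossless and encode_lossless are made up for this sketch.

```python
def decode_lossless(octets: bytes) -> str:
    """Decode as UTF-8; each undecodable octet 0xNN becomes U+DCNN."""
    return octets.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: escaped surrogates turn back into octets."""
    return text.encode("utf-8", errors="surrogateescape")


# A "filename" that is mostly UTF-8 but contains two stray octets.
raw = b"caf\xc3\xa9 \xff\xfe name"
text = decode_lossless(raw)

assert text.startswith("café ")      # the valid part decodes as usual
assert text[5] == "\udcff"           # the stray octet 0xFF was escaped
assert encode_lossless(text) == raw  # and the round trip is exact
```

Note that the resulting string contains lone surrogates, so a strict UTF-8 encoder will refuse it; that is roughly the interoperability concern raised elsewhere in this thread.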
Title: RE: Roundtripping in Unicode
> From: Peter Kirk [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, December 15, 2004 3:52 AM
> But surely octets 0x80 to 0x9f are (at least mostly) invalid
> in ISO 8859?
They are in fact valid. However, because they are control characters, the
Yes, but only if you can have some reasonable assurance that the byte
sequence emitted by UTF(c,x) (where c is the single reserved codepoint you
suggest, and x is U+00xx, the value to be escaped expressed as a character)
will not occur in plain text. This is theoretically checkable - the total
Title: RE: Roundtripping in Unicode
Marcin 'Qrczak' Kowalczyk wrote:
> If one application switches from standard UTF-8 to your modification,
> and another application continues to use standard UTF-8, then the
> ability to pass arbitrary Unicode strings between them by serializing
> them to UTF
Marcin 'Qrczak' Kowalczyk wrote:
>> OBSERVATION - Requirement (4) is not met absolutely, however,
>> the probability of the UTF-8 encoding of this sequence occurring
>> "accidentally" at an arbitrary offset in an arbitrary octet stream
>> is approximately one in 2^384;
>
> Assuming that the distribu
On 15/12/2004 11:11, Arcane Jill wrote:
I followed (and understood) Lars' explanation as to why the NOT-
solution wouldn't work for him. Shame really - but here's another bash
at a solution, again without breaking the Unicode model. If I have
understood this correctly, these are Lars' requir
Lars Kristan <[EMAIL PROTECTED]> writes:
> OK, strcpy does not need to interpret UTF-8. But strchr probably should.
No. Its argument is a byte, even though it's passed as type int.
By "byte" here I mean "C char value, which is an octet in virtually
all modern C implementations; the C standard doe
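The point that a byte-oriented search needs no UTF-8 awareness can be illustrated for the ASCII case. In UTF-8 every octet of a multi-octet sequence is 0x80 or above, so looking for an ASCII octet can never match in the middle of a character. The sketch below uses Python's bytes.find as a stand-in for strchr; find_ascii_byte is a hypothetical helper, not anything from the C library.

```python
def find_ascii_byte(haystack: bytes, ch: int) -> int:
    """Return the index of the first octet equal to ch, or -1 if absent.

    Only ASCII values are accepted: for those, a match in UTF-8 data is
    always a whole character, never part of a multi-octet sequence.
    """
    if not 0 <= ch < 0x80:
        raise ValueError("this sketch only handles ASCII octets")
    return haystack.find(bytes([ch]))


path = "zażółć/gęślą".encode("utf-8")
idx = find_ascii_byte(path, ord("/"))

# Splitting at the match leaves two pieces that are still valid UTF-8.
assert path[:idx].decode("utf-8") == "zażółć"
assert path[idx + 1:].decode("utf-8") == "gęślą"
```

Searching for a non-ASCII character is a different matter, since it occupies several octets and a single-octet search cannot express it.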
Title: RE: Roundtripping in Unicode
Kenneth Whistler wrote:
> Lars said:
>
> > According to UTC, you need to keep processing
> > the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8
> > function is allowed to reject invalid sequences. Basically, you are not
> > suppo
Title: RE: Roundtripping in Unicode
D. Starner wrote:
> The only solution is (a) to use ASCII or (b) to make the switch over as quick
> and clean as possible. Anyone who wants to create new files in UTF-8 and leave
> their old files in the old encoding is asking for trouble.
> There's
Title: RE: Roundtripping in Unicode
Oops, correction:
In response to Marcin 'Qrczak' Kowalczyk
>> Question: should a new programming language which uses Unicode for
>> string representation allow non-characters in strings? Argument for
>> allowing them: otherwise they are completely use
Title: RE: Roundtripping in Unicode
Arcane Jill wrote:
> The obvious solution is for all Unix machines everywhere to be using the
> same locale - and it had better be UTF-8. But an instantaneous global
> switch-over is never going to happen, so we see this gradual switch-over ...
> an
Title: RE: Roundtripping in Unicode
Philippe Verdy wrote:
> I have not found a solution to this problem, and I don't know if such
> solution even exists; if such solution exists, it should be quite complex...).
I think it should be possible to mathematically prove that it doesn't exi
"Arcane Jill" <[EMAIL PROTECTED]> writes:
> Unix makes it possible for /you/ to change /your/ locale - but by
> your reasoning, this is an error, unless all other users do so
> simultaneously.
Not necessarily: you can change the locale as long as it uses the same
default encoding.
By "error" I m
Title: RE: Roundtripping in Unicode
Marcin 'Qrczak' Kowalczyk replied:
> "Arcane Jill" <[EMAIL PROTECTED]> writes:
>
> > If so, Marcin, what exactly is the error, and whose fault is it?
>
> It's an error to use locales with different encodings on the same
> system.
U, and whose fault i
Lars Kristan <[EMAIL PROTECTED]> writes:
> Now, it is true that data from two applications using this technique can
> become intermixed. But this is not something we should fear. On the
> contrary, this is why I do want to standardize the approach. Because in most
> cases what will happen is exact
"Arcane Jill" <[EMAIL PROTECTED]> writes:
> OBSERVATION - Requirement (4) is not met absolutely, however,
> the probability of the UTF-8 encoding of this sequence occurring
> "accidentally" at an arbitrary offset in an arbitrary octet stream
> is approximately one in 2^384;
Assuming that the distrib
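As a rough check of the quoted figure: if the octets of a stream were independent and uniformly distributed, the chance of one specific n-octet sequence starting at a given offset would be 1 in 256^n. The figure 2^384 corresponds to n = 48; that length is inferred here from 384 / 8 and is not stated in the snippet. The function name chance_at_offset is hypothetical.

```python
from fractions import Fraction


def chance_at_offset(marker_len_octets: int) -> Fraction:
    """Probability that a fixed marker of the given length starts at one
    particular offset, ASSUMING independent, uniformly random octets."""
    return Fraction(1, 256 ** marker_len_octets)


# 48 octets * 8 bits = 384 bits, matching the "one in 2^384" figure.
assert chance_at_offset(48) == Fraction(1, 2 ** 384)
```

The uniformity assumption is doing all the work in this calculation. Real octet streams are not uniformly random, and data that was produced by copying such a marker contains it with certainty.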
-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Philippe Verdy
Sent: 14 December 2004 22:47
To: Marcin 'Qrczak' Kowalczyk
Cc: [EMAIL PROTECTED]
Subject: Re: Roundtripping in Unicode
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
"Arcane Jill" <[EMAIL PROTECTED]> writes:
If so
On 15/12/2004 00:22, Mike Ayers wrote:
> From: Peter Kirk [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, December 14, 2004 3:37 PM
> Thanks for the clarification. Perhaps the bifurcation could
> be better expressed as into "strings of characters as defined
> by the locale" and "strings of non-null octe
Title: RE: Roundtripping in Unicode
Marcin 'Qrczak' Kowalczyk wrote:
> But it's not possible in the direction NOT-UTF-16 -> NOT-UTF-8 ->
> NOT-UTF-16, unless you define valid sequences of NOT-UTF-16 in an
> awkward way which would happen to exclude those subsequences of
> non-characters which
"Arcane Jill" writes:
> The obvious solution is for all Unix machines everywhere to be using the same
> locale - and it had better be UTF-8. But an instantaneous global switch-over
> is never going to happen, so we see this gradual switch-over ... and it is
> during this transition phase tha
Title: RE: UTF-8 vs. Non-UTF-8 Locales and File Names (WAS: Re: Roundtripping in Unicode)
Edward H. Trager wrote:
> UTF-8's home directory). So both users could probably guess the filename
> they were looking at.
Which, BTW, is true for most of Europe but is not true for some other combina
I followed (and understood) Lars' explanation as to why the NOT-
solution wouldn't work for him. Shame really - but here's another bash at a
solution, again without breaking the Unicode model. If I have understood
this correctly, these are Lars' requirements:
1) There exists a function, f()