On Thu, Jul 17, 2008 at 5:20 AM, Allison Randal [EMAIL PROTECTED] wrote:
The thing is, there's a tendency for the data of a particular program or
application to all come from the same character set (if, for example, you're
parsing a series of files, munging the data in some way, and writing out a
On Wed, Jul 16, 2008 at 1:13 AM, Moritz Lenz
[EMAIL PROTECTED] wrote:
NotFound wrote:
* Unicode isn't necessarily universal, or might stop being so in the future.
If a character is not representable in Unicode, and you chose to use
Unicode for everything, you're screwed.
There are provisions for
Moritz Lenz wrote:
NotFound wrote:
To open another can of worms, I think that we can live without a
character set specification. We can establish that the character set is
always Unicode, and deal only with encodings.
We had that discussion already, and the answer was no for several reasons:
Hi,
I just saw this (too late) in #parrotsketch:
21:52 NotFound So unicode:\xab and utf8:unicode:\xab also give the same result?
In my opinion (and, AFAIK, still in the implementation), the encoding part
of a PIR string literal specifies how the possibly escaped bytes encode the codepoint in
On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
21:51 pmichaud so unicode:« and unicode:\xab would produce exactly the same result.
21:51 pmichaud even down to being the same .pbc output.
21:51 allison pmichaud: exactly
The former is a valid char in an
unicode:\ab is illegal
No way. Unicode \ab should represent U+00AB. I don't care what
the byte-level representation is. In UTF-8, that's 0xc2 0xab; in
UTF-16BE it's 0x00 0xab; in UTF-32LE it's 0xab 0x00 0x00 0x00.
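The point that the codepoint stays fixed while only its byte-level representation changes can be checked with a short Python sketch. Python stands in here purely for illustration (this is not how Parrot stores strings); the variable names are arbitrary.

```python
ch = "\xab"  # U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, i.e. «

# One codepoint, three different byte sequences depending on the encoding.
utf8 = ch.encode("utf-8")
utf16be = ch.encode("utf-16-be")
utf32le = ch.encode("utf-32-le")

for name, data in [("UTF-8", utf8), ("UTF-16BE", utf16be), ("UTF-32LE", utf32le)]:
    # Print each byte as 0xNN, then confirm it decodes back to the same codepoint.
    print(name, " ".join(f"0x{b:02x}" for b in data))

decoded = [utf8.decode("utf-8"), utf16be.decode("utf-16-be"), utf32le.decode("utf-32-le")]
```

All three decodings give back the single character U+00AB; only the stored bytes differ.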
I think that there is still some confusion between the encoding of source
On Tuesday, 15 July 2008 23:35, Patrick R. Michaud wrote:
On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
21:51 pmichaud so unicode:« and unicode:\xab would produce
exactly the same result.
21:51 pmichaud even down to being the same .pbc output.
21:51 allison
On Tue, Jul 15, 2008 at 11:45 PM, Mark J. Reed [EMAIL PROTECTED] wrote:
IMESHO, the encoding of the source code should have no bearing on the
interpretation of string literal escape sequences within that source
code. \ab should mean U+00AB no matter whether the surrounding
source code is
Uhm, by the fact that they didn't type \ab65?
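As an analogy for the argument that source encoding should not change what an escape means, Python behaves this way: a `\xab` escape always denotes U+00AB, and a literal « also becomes U+00AB once the source bytes are decoded according to the declared encoding. This is a Python illustration under that assumption, not PIR; `run_source` and `PROGRAM` are hypothetical names invented for the sketch.

```python
def run_source(source: bytes) -> dict:
    """Compile and execute Python source given as raw bytes; return its globals."""
    namespace: dict = {}
    # compile() honours the PEP 263 coding declaration when given bytes.
    exec(compile(source, "<src>", "exec"), namespace)
    return namespace

# The same program text: a literal « and the escape sequence \xab.
PROGRAM = '# -*- coding: {enc} -*-\nlit = "«"\nesc = "\\xab"\n'

results = {}
for enc in ("utf-8", "latin-1"):
    source_bytes = PROGRAM.format(enc=enc).encode(enc)
    ns = run_source(source_bytes)
    results[enc] = (source_bytes, ns["lit"], ns["esc"])

for enc, (source_bytes, lit, esc) in results.items():
    # The source bytes differ per encoding, but both strings hold U+00AB.
    print(enc, hex(ord(lit)), hex(ord(esc)))
```

The UTF-8 source stores « as the two bytes 0xc2 0xab and the Latin-1 source as the single byte 0xab, yet both programs end up with identical strings.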
On 7/15/08, Leopold Toetsch [EMAIL PROTECTED] wrote:
On Tuesday, 15 July 2008 23:35, Patrick R. Michaud wrote:
On Tue, Jul 15, 2008 at 11:17:23PM +0200, Leopold Toetsch wrote:
21:51 pmichaud so unicode:« and unicode:\xab would produce
To open another can of worms, I think that we can live without
a character set specification. We can establish that the character set is
always Unicode, and deal only with encodings. ASCII is an encoding
that maps directly to codepoints and only allows the values 0-127.
ISO-8859-1 is the same with
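To make the claim about ASCII concrete, here is a small Python sketch (Python is used only for illustration; Parrot's own encoding code is not shown). It checks that each byte of ASCII data decodes to the codepoint with the same number and that a byte above 127 is rejected. ISO-8859-1 (Latin-1) is included for comparison, since its 256 byte values coincide with U+0000 to U+00FF. The helper name `maps_bytes_to_same_codepoints` is made up for this example.

```python
def maps_bytes_to_same_codepoints(data: bytes, encoding: str) -> bool:
    """True if every byte decodes to the codepoint with the same numeric value."""
    return [ord(c) for c in data.decode(encoding)] == list(data)

ascii_ok = maps_bytes_to_same_codepoints(bytes(range(128)), "ascii")
latin1_ok = maps_bytes_to_same_codepoints(bytes(range(256)), "latin-1")

# ASCII only allows 0-127, so byte 0x80 must be rejected.
try:
    b"\x80".decode("ascii")
    ascii_rejects_high = False
except UnicodeDecodeError:
    ascii_rejects_high = True

print(ascii_ok, latin1_ok, ascii_rejects_high)
```

In this view an encoding like ASCII is just a restricted, identity mapping onto Unicode codepoints, which is what makes the "Unicode plus encodings" model attractive.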
NotFound wrote:
To open another can of worms, I think that we can live without
a character set specification. We can establish that the character set is
always Unicode, and deal only with encodings.
We had that discussion already, and the answer was no for several reasons:
* Strings might
* Unicode isn't necessarily universal, or might stop being so in the future.
If a character is not representable in Unicode, and you chose to use
Unicode for everything, you're screwed.
There are provisions for private-use codepoints.
* Related to the previous point, some other character
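The private-use provision mentioned above can be illustrated in Python: Unicode reserves three Private Use Areas whose codepoints have general category `Co` and no standard meaning, so an application may assign them to characters Unicode lacks by private agreement. This is a sketch only; `CUSTOM_GLYPHS` and `"my-glyph"` are hypothetical, and nothing here reflects how Parrot would use these ranges.

```python
import unicodedata

# The Private Use Area ranges defined by the Unicode Standard.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]

def is_private_use(cp: int) -> bool:
    """True if the codepoint's general category is Co (Private Use)."""
    return unicodedata.category(chr(cp)) == "Co"

# Hypothetical application-private mapping for a character Unicode lacks.
CUSTOM_GLYPHS = {"my-glyph": 0xE000}
glyph = chr(CUSTOM_GLYPHS["my-glyph"])

print(is_private_use(ord(glyph)), is_private_use(0x00AB))
```

The catch is that these meanings are only private conventions: two components that both claim the same private-use codepoints for different purposes will conflict.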
NotFound wrote:
* Unicode isn't necessarily universal, or might stop being so in the future.
If a character is not representable in Unicode, and you chose to use
Unicode for everything, you're screwed.
There are provisions for private-use codepoints.
If we use them in Parrot, we can't use them