On 2012/11/17 12:54, Buck Golemon wrote:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
U+0081 (there are always at least four digits in this notation).
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Masatoshi Kimura
Sent: Wednesday, November 21, 2012 12:28 PM
To: unicode@unicode.org
Subject: Re: cp1252 decoder implementation
(2012/11/22 1:58), Shawn Steele wrote:
> We aren’t going to change names (since that’ll break anyone already using them).
> No-one would be more happy than me if we could just ditch all the legacy
> encodings and all switch to Unicode everywhere, but that will never happen.
> There is enough legacy content out there that will never be converted.
That's sort of exactly the point:
*NEW* content should be UTF-8 (or
Buck Golemon wrote:
The status of these 5 characters is already in the best fit mappings
document pointed to by the IANA registry entry for windows-1252,
which is as strong as I’m willing to go for them.
I don't understand the relation between bestfit1252 and cp1252. Could
you clarify it for me?
>
> SSDE,
>
> Microsoft
>
> *From:* unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] *On
> Behalf Of *Murray Sargent
> *Sent:* Tuesday, November 20, 2012 8:55 PM
> *To:* verd...@wanadoo.fr; Doug Ewell
>
Netscape 1.0 RC 1 is available here:
http://www.oldversion.com/Netscape.html
On 2012-11-21 19:30:50, Doug Ewell wrote:
My problem is with the double standard. In some people's minds, if IE
does it, it's called "moronic" or "brain-dead."
If the software with the biggest market share does it, then everyone else
will have to follow it, no matter what you call it, unfortunately.
Philippe Verdy wrote:
> Maybe you've forgotten FrontPage, a product acquired by Microsoft and
> then developed by Microsoft and widely promoted as part of Office,
> that insisted on declaring webpages as "ISO 8859-1" even if they
> contained characters that are only in "windows-1252". Even if w
(2012/11/22 1:58), Shawn Steele wrote:
> We aren’t going to change names (since that’ll break
> anyone already using them), we probably won’t recognize new names (since
> anyone trying to use a new name wouldn’t work on millions of existing
> computers, so no one would add it).
Hey, why Microsoft chan
Maybe you've forgotten FrontPage, a product acquired by Microsoft and then
developed by Microsoft and widely promoted as part of Office, that
insisted on declaring webpages as "ISO 8859-1" even if they contained
characters that are only in "windows-1252". Even if we edited the page
externally to
"Peter Krefting" wrote:
>> Somewhat off-topic, I find it amusing that tolerance of "poorly
>> encoded" input is considered justification for changing the
>> underlying standards, when Internet Explorer has been flamed for
>> years and years for tolerating bad input.
>
> It's called adapting to reality, unfortunately.
Philippe Verdy wrote:
> But maybe we could ask Microsoft to officially map the C1 controls onto
> the remaining holes of windows-1252, to help improve the
> interoperability in HTML5 with a predictable and stable behavior
> across HTML5 applications. In that case the W3C need not do
> anything
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Murray Sargent
Sent: Tuesday, November 20, 2012 8:55 PM
To: verd...@wanadoo.fr; Doug Ewell
Cc: Unicode Mailing List; Buck Golemon
Subject: RE: cp1252 decoder implementation
Philippe commented: “(even if later Microsoft decides to map some other
characters in its own "windows-1252" charset, like it did several times and
notably when the Euro symbol was mapped)”.
On 2012/11/21 16:23, Peter Krefting wrote:
Doug Ewell :
Somewhat off-topic, I find it amusing that tolerance of "poorly
encoded" input is considered justification for changing the underlying
standards,
The encoding work at W3C, at least as far as I see it, is not an attempt
to redefine e.g.
Hi
On 21 November 2012 16:42, Philippe Verdy wrote:
> But maybe we could ask Microsoft to officially map the C1 controls onto the
> remaining holes of windows-1252, to help improve the interoperability in
> HTML5 with a predictable and stable behavior across HTML5 applications. In
> that case the
Doug Ewell :
Somewhat off-topic, I find it amusing that tolerance of "poorly encoded"
input is considered justification for changing the underlying standards,
when Internet Explorer has been flamed for years and years for
tolerating bad input.
It's called adapting to reality, unfortunately.
But maybe we could ask Microsoft to officially map the C1 controls onto the
remaining holes of windows-1252, to help improve the interoperability in
HTML5 with a predictable and stable behavior across HTML5 applications. In
that case the W3C need not do anything else and there's no need to
update
Philippe commented: "(even if later Microsoft decides to map some other
characters in its own "windows-1252" charset, like it did several times and
notably when the Euro symbol was mapped)".
Personal opinion, but I'd be very surprised if Microsoft ever changed the 1252
charset. The euro was added
To solve the situation, it would be smarter if the W3C was not referencing
the Microsoft standard itself but a standardized version of it, explaining
explicitly how to handle the unassigned code positions. The W3C could
describe the expected mapping of these positions explicitly in its own
standard,
Buck Golemon wrote:
> What effort has been spent? This is not an either/or type of
> proposition. If we can agree that it's an improvement (albeit small),
> let's update the mapping.
> Is it much harder than I believe it is?
ISO/IEC 8859-1 is, uh, an ISO/IEC standard. CP1252 is a Microsoft
corpo
> What effort has been spent? This is not an either/or type of proposition.
> If we can agree that it's an improvement (albeit small), let's update the
> mapping.
> Is it much harder than I believe it is?
What if some application's treating it as undefined? And now the code page
gets updated to
On Sat, Nov 17, 2012 at 10:52 AM, Shawn Steele
wrote:
> IMO this isn’t worth the effort being spent on it. MOST encodings have
> all sorts of interesting quirks, variations, OEM or App specific behavior,
> etc. These are a few code points that haven’t really caused much
> confusion, and other c
I find these to be true statements, but I don't see how they support or
refute that which came before.
On Sun, Nov 18, 2012 at 3:58 PM, Philippe Verdy wrote:
> The same chapter makes a normative reference to ISO/IEC 2022 for C0
> controls, it does not say that this concerns ISO/IEC 8859 (which
The same chapter makes a normative reference to ISO/IEC 2022 for C0
controls; it does not say that this concerns ISO/IEC 8859 (which does not
itself reference ISO/IEC 2022 as normative, but only informational,
just to say that it is compatible with it, as well as with ISO 6429, and a
wide range
into ? at best.
-Shawn
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Buck Golemon
Sent: Saturday, November 17, 2012 8:35 AM
To: verd...@wanadoo.fr
Cc: Doug Ewell; unicode
Subject: Re: cp1252 decoder implementation
> So don't say that there are one-for-one equivalences.
I was just quoting this section of the standard:
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
> There is a simple, one-to-one mapping between 7-bit (and 8-bit) control
codes and the Unicode control codes: every 7-bit (or 8-bit) cont
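The passage quoted above can be sanity-checked against Python's stock latin-1 codec, which implements exactly that identity mapping between byte values and code points (a quick illustration, not anything proposed in the thread):

```python
# latin-1 decodes every byte 0x00-0xFF to the code point of the same
# value, including the C0 (0x00-0x1F) and C1 (0x80-0x9F) control ranges.
assert all(bytes([b]).decode('latin-1') == chr(b) for b in range(256))
assert b'\x81'.decode('latin-1') == '\u0081'
```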
If you are thinking about "byte values" you are working at the encoding
scheme level (in fact another lower level which defines a protocol
presentation layer, e.g. "transport syntaxes" in MIME). Unicode codepoints
are conceptually not an encoding scheme, just a coded character set
(independent of t
Buck Golemon wrote:
This isn't quite as black-and-white as the question about Latin-1. If
you are targeting HTML5, you are probably safe in treating an
incoming 0x81 (for example) as either U+0081 or U+FFFD, or throwing
some kind of error.
Why do you make this conditional on targeting html5?
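For what it's worth, each of the three treatments Doug lists (U+0081, U+FFFD, or an error) can already be had from Python's built-in codecs; the snippet below is only an illustration of those options, not something proposed in the thread:

```python
# U+FFFD: cp1252 with the 'replace' error handler substitutes the
# replacement character for the undefined byte 0x81.
assert b'\x81'.decode('cp1252', errors='replace') == '\ufffd'

# U+0081: latin-1 passes the byte through as the same-valued code point.
assert b'\x81'.decode('latin-1') == '\u0081'

# Error: cp1252 in strict mode raises on the undefined byte.
try:
    b'\x81'.decode('cp1252')
    raise AssertionError('expected UnicodeDecodeError')
except UnicodeDecodeError:
    pass
```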
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell wrote:
> Buck Golemon wrote:
>
> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
>> to map it to the equally-non-semantic U+81 ?
>>
>> This would allow systems that follow the html5 standard and use cp1252
>> in place of latin1 to continue to be binary-faithful and reversible.
1)
I did this and was criticized for inventing my own "frankensteined"
encoding, although I believe it's conceptually consistent with the idea
that cp1252 is to be used as a superset of latin1.
It's true that what I wrote is not consistent with the unicode.org definition:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Golemon; unicode
Subject: Re: cp1252 decoder implementation
Buck Golemon wrote:
> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
> to map it to the equally-non-semantic U+81 ?
>
> This would allow systems that follow the html5 standard and use cp1252
> in pla
Quoting Buck Golemon:
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these unknown bytes to
round-trip through the codec.
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
This would allow systems that follow the html5 standard and use cp1252
in place of latin1 to continue to be binary-faithful and reversible.
This isn't quite as black-and-white as the question about Latin-1.
So I find that the unicode.org cp1252 file leaves those bytes undefined as
well, so the issue stems from there.
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and to
map it to the equally-non-semantic U+81
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these unknown bytes to
round-trip through the codec
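Assuming CPython's stock cp1252 codec, the missing facility Buck describes can be sketched with a custom error handler. The handler name `c1passthrough` is invented here (it is not an existing stdlib handler); it maps the five undefined bytes to the same-valued C1 code points on decode and back to bytes on encode, so arbitrary byte strings round-trip:

```python
import codecs

# First, confirm which C1-range bytes Python's cp1252 codec leaves
# undefined (decoding with errors='ignore' yields '' for them).
undefined = [b for b in range(0x80, 0xA0)
             if not bytes([b]).decode('cp1252', errors='ignore')]
assert undefined == [0x81, 0x8D, 0x8F, 0x90, 0x9D]

def c1_passthrough(exc):
    """Pass undefined cp1252 bytes through as same-valued code points."""
    if isinstance(exc, UnicodeDecodeError):
        chunk = exc.object[exc.start:exc.end]          # bytes
        return ''.join(chr(b) for b in chunk), exc.end
    if isinstance(exc, UnicodeEncodeError):
        chunk = exc.object[exc.start:exc.end]          # str (C1 chars)
        return bytes(ord(c) for c in chunk), exc.end
    raise exc

codecs.register_error('c1passthrough', c1_passthrough)

# Every possible byte now round-trips through the codec.
raw = bytes(range(256))
text = raw.decode('cp1252', errors='c1passthrough')
assert text.encode('cp1252', errors='c1passthrough') == raw
```

This leans on the documented error-handler protocol: a decode handler returns a replacement string, and an encode handler may return bytes that are copied directly to the output.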