On 2012/11/17 12:54, Buck Golemon wrote:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
U+0081 (there are always at least four digits in this notation).
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Masatoshi Kimura
Sent: Wednesday, November 21, 2012 12:28 PM
To: unicode@unicode.org
Subject: Re: cp1252 decoder implementation
(2012/11/22 1:58), Shawn Steele wrote:
> We aren’t going to change names (since that’ll break anyone already using them).
> No-one would be more happy than me if we could just ditch all the legacy
> encodings and all switch to Unicode everywhere, but that will never happen.
> There is enough legacy content out there that will never be converted.
That's sort of exactly the point:
*NEW* content should be UTF-8 (or
Buck Golemon wrote:
The status of these 5 characters is already in the best fit mappings
document pointed to by the IANA registry entry for windows-1252,
which is as strong as I’m willing to go for them.
I don't understand the relation between bestfit1252 and cp1252. Could
you clarify it for me?
>
> SSDE,
>
> Microsoft
>
> *From:* unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] *On
> Behalf Of *Murray Sargent
> *Sent:* Tuesday, November 20, 2012 8:55 PM
> *To:* verd...@wanadoo.fr; Doug Ewell
>
Netscape 1.0 RC 1 is available here:
http://www.oldversion.com/Netscape.html
On 2012-11-21 19:30:50, Doug Ewell wrote:
My problem is with the double standard. In some people's minds, if IE
does it, it's called "moronic" or "brain-dead."
If the software with the biggest market share does it, then everyone else
will have to follow it, no matter what you call it, unfortunately.
Philippe Verdy wrote:
> Maybe you've forgotten FrontPage, a product acquired by Microsoft and
> then developed by Microsoft and widely promoted as part of Office,
> that insisted on declaring webpages as "ISO 8859-1" even if they
> contained characters that are only in "windows-1252". Even if w
(2012/11/22 1:58), Shawn Steele wrote:
> We aren’t going to change names (since that’ll break
> anyone already using them), we probably won’t recognize new names (since
> anyone trying to use a new name wouldn’t work on millions of existing
> computers, so no one would add it).
Hey, why Microsoft chan
Maybe you've forgotten FrontPage, a product acquired by Microsoft and then
developed by Microsoft and widely promoted as part of Office, that
insisted on declaring webpages as "ISO 8859-1" even if they contained
characters that are only in "windows-1252". Even if we edited the page
externally to
"Peter Krefting" wrote:
>> Somewhat off-topic, I find it amusing that tolerance of "poorly
>> encoded" input is considered justification for changing the
>> underlying standards, when Internet Explorer has been flamed for
>> years and years for tolerating bad input.
>
> It's called adapting to reality, unfortunately.
Philippe Verdy wrote:
> But maybe we could ask Microsoft to officially map the C1 controls onto
> the remaining holes of windows-1252, to help improve the
> interoperability in HTML5 with a predictable and stable behavior
> across HTML5 applications. In that case the W3C need not do
> anything
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Murray Sargent
Sent: Tuesday, November 20, 2012 8:55 PM
To: verd...@wanadoo.fr; Doug Ewell
Cc: Unicode Mailing List; Buck Golemon
Subject: RE: cp1252 decoder implementation
Philippe commented: “(even if later Microsoft decides to map some other
characters in its own "windows-1252" charset, like it did several times and
notably when the Euro symbol was mapped)”.
On 2012/11/21 16:23, Peter Krefting wrote:
Doug Ewell :
Somewhat off-topic, I find it amusing that tolerance of "poorly
encoded" input is considered justification for changing the underlying
standards,
The encoding work at W3C, at least as far as I see it, is not an attempt
to redefine e.g.
Hi
On 21 November 2012 16:42, Philippe Verdy wrote:
> But maybe we could ask Microsoft to officially map the C1 controls onto the
> remaining holes of windows-1252, to help improve the interoperability in
> HTML5 with a predictable and stable behavior across HTML5 applications. In
> that case the
Doug Ewell :
Somewhat off-topic, I find it amusing that tolerance of "poorly encoded"
input is considered justification for changing the underlying standards,
when Internet Explorer has been flamed for years and years for
tolerating bad input.
It's called adapting to reality, unfortunately.
But maybe we could ask Microsoft to officially map the C1 controls onto the
remaining holes of windows-1252, to help improve the interoperability in
HTML5 with a predictable and stable behavior across HTML5 applications. In
that case the W3C need not do anything else and there's no need to
update
Philippe commented: "(even if later Microsoft decides to map some other
characters in its own "windows-1252" charset, like it did several times and
notably when the Euro symbol was mapped)".
Personal opinion, but I'd be very surprised if Microsoft ever changed the 1252
charset. The euro was added
To solve the situation, it would be smarter if the W3C was not referencing
the Microsoft standard itself but a standardized version of it, explaining
explicitly how to handle the unassigned code positions. The W3C could
describe the expected mapping of these positions explicitly in its own
standard,
Buck Golemon wrote:
> What effort has been spent? This is not an either/or type of
> proposition. If we can agree that it's an improvement (albeit small),
> let's update the mapping.
> Is it much harder than I believe it is?
ISO/IEC 8859-1 is, uh, an ISO/IEC standard. CP1252 is a Microsoft
corpo
> What effort has been spent? This is not an either/or type of proposition.
> If we can agree that it's an improvement (albeit small), let's update the
> mapping.
> Is it much harder than I believe it is?
What if some application's treating it as undefined? And now the code page
gets updated to
On Sat, Nov 17, 2012 at 10:52 AM, Shawn Steele
wrote:
> IMO this isn’t worth the effort being spent on it. MOST encodings have
> all sorts of interesting quirks, variations, OEM or App specific behavior,
> etc. These are a few code points that haven’t really caused much
> confusion, and other c
I find these to be true statements, but I don't see how they support or
refute that which came before.
On Sun, Nov 18, 2012 at 3:58 PM, Philippe Verdy wrote:
> The same chapter makes a normative reference to ISO/IEC 2022 for C0
> controls, it does not say that this concerns ISO/IEC 8859 (which
The same chapter makes a normative reference to ISO/IEC 2022 for C0
controls; it does not say that this concerns ISO/IEC 8859 (which does not
itself reference ISO/IEC 2022 as normative, but only informational,
just to say that it is compatible with it, as well as with ISO 6429, and a
wide range
into ? at best.
-Shawn
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Buck Golemon
Sent: Saturday, November 17, 2012 8:35 AM
To: verd...@wanadoo.fr
Cc: Doug Ewell; unicode
Subject: Re: cp1252 decoder implementation
> So don't say that there are one-for-one equivalences.
I was just quoting this section of the standard:
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
> There is a simple, one-to-one mapping between 7-bit (and 8-bit) control
codes and the Unicode control codes: every 7-bit (or 8-bit) cont
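The passage quoted above can be sanity-checked against Python's stock latin-1 codec, which implements exactly that identity mapping between byte values and code points (a quick illustration, not anything proposed in the thread):

```python
# latin-1 decodes every byte 0x00-0xFF to the code point of the same
# value, including the C0 (0x00-0x1F) and C1 (0x80-0x9F) control ranges.
assert all(bytes([b]).decode('latin-1') == chr(b) for b in range(256))
assert b'\x81'.decode('latin-1') == '\u0081'
```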
If you are thinking about "byte values" you are working at the encoding
scheme level (in fact another lower level which defines a protocol
presentation layer, e.g. "transport syntaxes" in MIME). Unicode codepoints
are conceptually not an encoding scheme, just a coded character set
(independent of t
Buck Golemon wrote:
This isn't quite as black-and-white as the question about Latin-1. If
you are targeting HTML5, you are probably safe in treating an
incoming 0x81 (for example) as either U+0081 or U+FFFD, or throwing
some kind of error.
Why do you make this conditional on targeting html5?
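For what it's worth, each of the three treatments Doug lists (U+0081, U+FFFD, or an error) can already be had from Python's built-in codecs; the snippet below is only an illustration of those options, not something proposed in the thread:

```python
# U+FFFD: cp1252 with the 'replace' error handler substitutes the
# replacement character for the undefined byte 0x81.
assert b'\x81'.decode('cp1252', errors='replace') == '\ufffd'

# U+0081: latin-1 passes the byte through as the same-valued code point.
assert b'\x81'.decode('latin-1') == '\u0081'

# Error: cp1252 in strict mode raises on the undefined byte.
try:
    b'\x81'.decode('cp1252')
    raise AssertionError('expected UnicodeDecodeError')
except UnicodeDecodeError:
    pass
```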
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell wrote:
> Buck Golemon wrote:
>
> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
>> to map it to the equally-non-semantic U+81 ?
>>
>> This would allow systems that follow the html5 standard and use cp1252
>> in place of latin1 to continue to be binary-faithful and reversible.
1)
I did this and was criticized for inventing my own "frankensteined"
encoding, although I believe it's conceptually consistent with the idea
that cp1252 is to be used as a superset of latin1.
It's true that what I wrote is not consistent with the unicode.org definition:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Golemon; unicode
Subject: Re: cp1252 decoder implementation
Buck Golemon wrote:
> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
> to map it to the equally-non-semantic U+81 ?
>
> This would allow systems that follow the html5 standard and use cp1252
> in pla
Quoting Buck Golemon:
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these unknown bytes to
round-trip through the codec.
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
This would allow systems that follow the html5 standard and use cp1252
in place of latin1 to continue to be binary-faithful and reversible.
This isn't quite as black-and-white as the question about Latin-1.
So I find that the unicode.org cp1252 file leaves those bytes undefined as
well, so the issue stems from there.
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and to
map it to the equally-non-semantic U+81
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these unknown bytes to
round-trip through the codec
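Assuming CPython's stock cp1252 codec, the missing facility Buck describes can be sketched with a custom error handler. The handler name `c1passthrough` is invented here (it is not an existing stdlib handler); it maps the five undefined bytes to the same-valued C1 code points on decode and back to bytes on encode, so arbitrary byte strings round-trip:

```python
import codecs

# First, confirm which C1-range bytes Python's cp1252 codec leaves
# undefined (decoding with errors='ignore' yields '' for them).
undefined = [b for b in range(0x80, 0xA0)
             if not bytes([b]).decode('cp1252', errors='ignore')]
assert undefined == [0x81, 0x8D, 0x8F, 0x90, 0x9D]

def c1_passthrough(exc):
    """Pass undefined cp1252 bytes through as same-valued code points."""
    if isinstance(exc, UnicodeDecodeError):
        chunk = exc.object[exc.start:exc.end]          # bytes
        return ''.join(chr(b) for b in chunk), exc.end
    if isinstance(exc, UnicodeEncodeError):
        chunk = exc.object[exc.start:exc.end]          # str (C1 chars)
        return bytes(ord(c) for c in chunk), exc.end
    raise exc

codecs.register_error('c1passthrough', c1_passthrough)

# Every possible byte now round-trips through the codec.
raw = bytes(range(256))
text = raw.decode('cp1252', errors='c1passthrough')
assert text.encode('cp1252', errors='c1passthrough') == raw
```

This leans on the documented error-handler protocol: a decode handler returns a replacement string, and an encode handler may return bytes that are copied directly to the output.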