Re: 16-bit wchar_t on Windows and Cygwin

2011-02-04 Thread Warren Young
On 2/2/2011 9:35 AM, Corinna Vinschen wrote: If only the ones who decided that wchar_t in Cygwin should have the same size as WCHAR_T in the underlying Windows would have thought twice about the implications... Cygwin 1.9? Or maybe 2.0, if it breaks ABIs?

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-03 Thread Corinna Vinschen
On Feb 3 01:12, Bruno Haible wrote: Hi Eric, I was asking: should wwchar_t (or xwchar_t, but not xchar_t) be 2 bytes on Cygwin, but unlike the POSIX definition of wchar_t being always 1 character per unit, the new type is explicitly documented as being multi-unit on some platforms

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-03 Thread Bruno Haible
Corinna Vinschen wrote: isn't wwchar_t equivalent to wint_t on all platforms? On UCS-4 platforms sizeof(wint_t) == sizeof(wchar_t) == 4 because there's no reason to make it bigger. On UCS-2 and UTF-16 platforms sizeof(wint_t) == 4 because it must be able to hold EOF as well. So, why not

Re: bug#7948: 16-bit wchar_t on Windows and Cygwin

2011-02-03 Thread Ulf Zibis
Hi, I think there is a somewhat similar bug under discussion on GNU: bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372) -Ulf On 02.02.2011 18:51, Paul Eggert wrote: On 02/02/11 03:29, Bruno Haible wrote: - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Bruno Haible
Hello Eric, ... POSIX requires that 1 wchar_t corresponds to 1 character ... What consequences does this have? 1) All code that uses the functions from wctype.h (wide character classification and mapping) or wcwidth() malfunctions on strings that contain Unicode

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Corinna Vinschen
On Feb 2 12:29, Bruno Haible wrote: Hello Eric, ... POSIX requires that 1 wchar_t corresponds to 1 character ... What consequences does this have? 1) All code that uses the functions from wctype.h (wide character classification and mapping) or wcwidth() malfunctions on

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Corinna Vinschen
On Feb 2 13:14, Corinna Vinschen wrote: On Feb 2 12:29, Bruno Haible wrote: Hello Eric, ... POSIX requires that 1 wchar_t corresponds to 1 character ... What consequences does this have? 1) All code that uses the functions from wctype.h (wide character

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Bruno Haible
Hello Corinna, And, please note the wording in SUSv4, for instance in http://calimero.vinschen.de/susv4/functions/iswalpha.html Likewise in POSIX:2008, at the URL http://www.opengroup.org/onlinepubs/9699919799/functions/iswalpha.html The wc argument is a wint_t, the value of which the

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Corinna Vinschen
Hi Bruno, On Feb 2 17:02, Bruno Haible wrote: Hello Corinna, And, please note the wording in SUSv4, for instance in http://calimero.vinschen.de/susv4/functions/iswalpha.html Likewise in POSIX:2008, at the URL http://www.opengroup.org/onlinepubs/9699919799/functions/iswalpha.html

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Corinna Vinschen
On Feb 2 17:28, Corinna Vinschen wrote: On Feb 2 17:02, Bruno Haible wrote: But if you say that the application should convert UTF-16 surrogates to UTF-32 before calling iswalpha: That's certainly a requirement for Cygwin 1.7.x applications that want to support the entire Unicode
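
The surrogate-to-UTF-32 conversion discussed in this message is plain arithmetic. A minimal sketch in C, assuming the caller already holds both halves of the pair; the function name utf16_pair_to_utf32 and the UINT32_MAX error value are illustrative choices, not part of Cygwin, newlib, or gnulib:

```c
#include <stdint.h>

/* Hypothetical helper (not a Cygwin or gnulib API): combine a UTF-16
   surrogate pair into one UTF-32 code point.
   Returns UINT32_MAX if the inputs are not a valid high/low pair. */
uint32_t utf16_pair_to_utf32(uint16_t hi, uint16_t lo)
{
    /* High surrogates occupy 0xD800..0xDBFF, low surrogates 0xDC00..0xDFFF. */
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return UINT32_MAX;

    /* Each half carries 10 payload bits; the result is offset by 0x10000. */
    return 0x10000u
           + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

For instance, the pair 0xD83D 0xDE00 yields the code point 0x1F600. Which classification function the result should then be passed to is the open question of this thread and is not settled by the sketch.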

Re: bug#7948: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Paul Eggert
On 02/02/11 03:29, Bruno Haible wrote: - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t on Windows platforms and to 'wchar_t' otherwise. As a minor point, would it be OK to call this type 'xchar_t' instead? 'x' is the successor to 'w', after all, and it can be thought

Re: bug#7948: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Bruno Haible
Hi Paul, - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t on Windows platforms and to 'wchar_t' otherwise. As a minor point, would it be OK to call this type 'xchar_t' instead? 'x' is the successor to 'w', after all, and it can be thought of as an abbreviation

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Andy Koppe
On 2 February 2011 16:35, Corinna Vinschen wrote: On Feb 2 17:28, Corinna Vinschen wrote: On Feb 2 17:02, Bruno Haible wrote: But if you say that the application should convert UTF-16 surrogates to UTF-32 before calling iswalpha: That's certainly a requirement for Cygwin 1.7.x

Re: bug#7948: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Andy Koppe
On 2 February 2011 18:57, Bruno Haible wrote: Hi Paul, - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t on Windows platforms and to 'wchar_t' otherwise. As a minor point, would it be OK to call this type 'xchar_t' instead? 'x' is the successor to 'w', after all,

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Eric Blake
[dropping coreutils at this point] On 02/02/2011 04:29 AM, Bruno Haible wrote: Good point. I agree then that overriding wchar_t should better not be done. Here's a new proposal: - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t on Windows platforms and to 'wchar_t'

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Corinna Vinschen
On Feb 2 14:24, Eric Blake wrote: [dropping coreutils at this point] On 02/02/2011 04:29 AM, Bruno Haible wrote: Good point. I agree then that overriding wchar_t should better not be done. Here's a new proposal: - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Bruno Haible
Hello Eric, Here's a new proposal: - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t on Windows platforms and to 'wchar_t' otherwise. - Define functions 'mbrtowwc', 'iswwalpha', 'wwcwidth', and similar. Their definition will be a trivial redirection to
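
The first point of the proposal can be sketched as a conditional typedef. This is only an illustration of the wording above; the choice of the predefined macros _WIN32 and __CYGWIN__ to detect "Windows platforms" is an assumption, and the companion functions ('mbrtowwc', 'iswwalpha', 'wwcwidth') are left out because the snippet does not show their definitions:

```c
#include <stdint.h>
#include <wchar.h>

/* Sketch of the proposed type. The platform test is an assumption:
   _WIN32 and __CYGWIN__ are used here to mean "Windows platforms". */
#if defined _WIN32 || defined __CYGWIN__
typedef uint32_t wwchar_t;   /* wchar_t is 16-bit here, so use 32 bits */
#else
typedef wchar_t wwchar_t;    /* wchar_t already holds a full character */
#endif
```

On a platform where wchar_t is 32 bits wide, a wwchar_t can then hold any Unicode code point in a single unit, including those outside the Basic Multilingual Plane.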

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Eric Blake
On 02/02/2011 04:03 PM, Bruno Haible wrote: Are you thinking of making a sane wrapping around either 4-byte wchar_t or which maps to 2-byte wchar_t but sanely handles UTF-16 (which makes it a thin wrapper on both Linux and Cygwin, but needing more work on mingw), or are you thinking that it is

Re: 16-bit wchar_t on Windows and Cygwin

2011-02-02 Thread Bruno Haible
Hi Eric, I was asking: should wwchar_t (or xwchar_t, but not xchar_t) be 2 bytes on Cygwin, but unlike the POSIX definition of wchar_t being always 1 character per unit, the new type is explicitly documented as being multi-unit on some platforms but with sane semantics or should it

Re: 16-bit wchar_t on Windows and Cygwin

2011-01-31 Thread Eric Blake
[adding cygwin and coreutils for a wc issue] On 01/30/2011 07:04 PM, Bruno Haible wrote: Hi, It has been known for a long time that on native Windows, the wchar_t[] encoding of strings is UTF-16. [1] Now, Corinna Vinschen has confirmed that it is the same for Cygwin >= 1.7. [2] POSIX requires

Re: 16-bit wchar_t on Windows and Cygwin

2011-01-31 Thread Corinna Vinschen
On Jan 31 09:58, Eric Blake wrote: 2) Code that uses mbrtowc() or wcrtomb() is also likely to malfunction. On Cygwin >= 1.7 mbrtowc() and wcrtomb() are implemented in an intelligent but somewhat surprising way: wcrtomb() may return 0, that is, produce no output bytes