Re: UTF-8 locale & POSIX text model

2017-11-30 Thread keld
On Sun, Nov 26, 2017 at 04:16:36PM +, Stephane Chazelas wrote: > 2017-11-26 14:07:50 +0100, k...@keldix.com: > [...] > > > For instance, as currently specified, POSIX says that the > > > output of the "locale" utility be suitable for reinput to the > > > shell and requiring double-quote quoting

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Hans Åberg
> On 27 Nov 2017, at 22:51, Chet Ramey wrote: > > On 11/27/17 1:12 PM, Hans Åberg wrote: > On MacOS 10.13, one can set locale environment variables. The Terminal default login shell reads .profile; xterm reads .bashrc. There are other ways to set them system-wide, changing wit

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Chet Ramey
On 11/27/17 1:12 PM, Hans Åberg wrote: >>> On MacOS 10.13, one can set locale environment variables. The Terminal >>> default login shell reads .profile; xterm reads .bashrc. There are other >>> ways to set them system-wide, changing with the OS version. >> >> Terminal has been able to pass the

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Hans Åberg
> On 27 Nov 2017, at 22:04, Chet Ramey wrote: > > On 11/27/17 12:51 PM, Hans Åberg wrote: > >> On MacOS 10.13, one can set locale environment variables. The Terminal >> default login shell reads .profile; xterm reads .bashrc. There are other >> ways to set them system-wide, changing with the

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Chet Ramey
On 11/27/17 12:51 PM, Hans Åberg wrote: > On MacOS 10.13, one can set locale environment variables. The Terminal > default login shell reads .profile; xterm reads .bashrc. There are other ways > to set them system-wide, changing with the OS version. Terminal has been able to pass the locale env

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Hans Åberg
> On 27 Nov 2017, at 19:35, Chet Ramey wrote: > > On 11/27/17 1:19 AM, Hans Åberg wrote: > The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; thus with no additional qualifications like in LC_CTYPE=en_US.UTF-8. It would be interesting to know if it is POSIX conformi

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Hans Åberg
> On 27 Nov 2017, at 10:43, Stephane Chazelas > wrote: > > 2017-11-26 22:40:45 +0100, Hans Åberg: > [...] >> The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; >> thus with no additional qualifications like in >> LC_CTYPE=en_US.UTF-8. It would be interesting to know if it is >> POSI

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Chet Ramey
On 11/27/17 1:19 AM, Hans Åberg wrote: >>> The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; thus with no >>> additional qualifications like in LC_CTYPE=en_US.UTF-8. It would be >>> interesting to know if it is POSIX conforming, as it causes confusion with >>> some software. >> >>

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Joerg Schilling
Joseph Myers wrote: > On Sat, 25 Nov 2017, k...@keldix.com wrote: > > > systems, and also implementations that can conform using UTF-16 and > > different 8-bit codesets. For instance 'A' is coded x0041 (two bytes) in > > UTF-16 > > and x41 (only one byte) in cp850, and UTF-8. > > ISO C includes

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Joseph Myers
On Sat, 25 Nov 2017, k...@keldix.com wrote: > systems, and also implementations that can conform using UTF-16 and > different 8-bit codesets. For instance 'A' is coded x0041 (two bytes) in > UTF-16 > and x41 (only one byte) in cp850, and UTF-8. ISO C includes (C99 TC2 onwards) the requirement "
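A quick way to check the byte counts quoted above is to encode the character with each codec. This is a minimal Python sketch that uses only the standard codecs `utf-16-be`, `cp850` and `utf-8`. The big-endian UTF-16 variant is chosen so that no byte-order mark is prepended, because plain `utf-16` in Python would add one. The helper name `encoded_bytes` is illustrative only.

```python
# Encode one portable character under the charsets named in the message.
ENCODINGS = ("utf-16-be", "cp850", "utf-8")


def encoded_bytes(ch: str) -> dict:
    """Map each encoding name to the bytes that represent `ch`."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}


if __name__ == "__main__":
    for enc, data in encoded_bytes("A").items():
        # bytes.hex() shows the code units; len() is the byte count.
        print(f"{enc:<10} {data.hex()}  ({len(data)} byte(s))")
```

UTF-16 needs two bytes (0x00 0x41) for `A`, while cp850 and UTF-8 both use the single byte 0x41.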

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Stephane Chazelas
2017-11-26 22:40:45 +0100, Hans Åberg: [...] > The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; > thus with no additional qualifications like in > LC_CTYPE=en_US.UTF-8. It would be interesting to know if it is > POSIX conforming, as it causes confusion with some software. Yes, I've s

Re: UTF-8 locale & POSIX text model

2017-11-27 Thread Hans Åberg
> On 27 Nov 2017, at 03:16, Chet Ramey wrote: > > On 11/26/17 1:40 PM, Hans Åberg wrote: > >> The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; thus with no >> additional qualifications like in LC_CTYPE=en_US.UTF-8. It would be >> interesting to know if it is POSIX conforming, as

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread Chet Ramey
On 11/26/17 1:40 PM, Hans Åberg wrote: > The deprecated HFS uses UTF-16, but MacOS has LC_CTYPE=UTF-8; thus with no > additional qualifications like in LC_CTYPE=en_US.UTF-8. It would be > interesting to know if it is POSIX conforming, as it causes confusion with > some software. I don't see t

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread Hans Åberg
> On 26 Nov 2017, at 13:43, k...@keldix.com wrote: > > Well, the pathname processing should be a function of the filesystem. Eg if > you have a windows > filesystem, or an apple filesystem mounted on a linux operating system, then > the file names > of the foreign system should be interpreted a

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread Hans Åberg
> On 26 Nov 2017, at 13:43, k...@keldix.com wrote: > > I don't have Windows nor apple systems, but they run utf-16 natively, and > recent > Windows 10 systems have a full linux (ubuntu) subsystem. I could also see > problems > with utf-16 and posix, but at least apple should have solved that pro

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread Stephane Chazelas
2017-11-26 14:07:50 +0100, k...@keldix.com: [...] > > For instance, as currently specified, POSIX says that the > > output of the "locale" utility be suitable for reinput to the > > shell and requiring double-quote quoting in some cases. > > > > Using double-quote quoting is problematic because of
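The snippet is cut off before it says why double-quote quoting is problematic, so the following is only an assumed example of one well-known hazard, not a reconstruction of the message. In some multibyte charsets such as Shift_JIS, the second byte of certain characters is 0x5C, the same value as ASCII backslash, which is special inside shell double quotes. A reader that parses bytes rather than characters can then mistake part of a character for an escape. The helper `trail_bytes_hit_backslash` is hypothetical.

```python
# Assumed illustration: find characters whose encoding contains the
# byte 0x5C (ASCII backslash) although they are not a backslash.
BACKSLASH = 0x5C


def trail_bytes_hit_backslash(text: str, encoding: str) -> list:
    """Hypothetical helper: characters of `text` hiding a 0x5C byte."""
    hits = []
    for ch in text:
        # `int in bytes` tests whether that byte value occurs.
        if ch != "\\" and BACKSLASH in ch.encode(encoding):
            hits.append(ch)
    return hits


if __name__ == "__main__":
    sample = "ソ表A"
    print(trail_bytes_hit_backslash(sample, "shift_jis"))  # ['ソ', '表']
    print(trail_bytes_hit_backslash(sample, "utf-8"))      # []
```

Under UTF-8 the check finds nothing, because every byte of a multibyte UTF-8 sequence is 0x80 or above.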

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread keld
On Sun, Nov 26, 2017 at 11:04:39AM +, Stephane Chazelas wrote: > 2017-11-25 20:53:20 +0100, k...@keldix.com: > [...] > > > It just says those characters are the one constituting the > > > portable character set. It doesn't specify the encoding other > > > than it mandates the encoding of those

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread k...@keldix.com
On Sun, Nov 26, 2017 at 02:09:21AM +, Danny Niu wrote: > > > On 26 Nov 2017, at 3:53 AM, k...@keldix.com wrote: > > On Wed, Nov 22, 2017 at 05:43:51PM +, Stephane Chazelas wrote: > 2017-11-22 16:27:15 +0100, Martijn Dekker: > On 22-11-17 at 16:02, Geoff Cla

Re: UTF-8 locale & POSIX text model

2017-11-26 Thread Stephane Chazelas
2017-11-25 20:53:20 +0100, k...@keldix.com: [...] > > It just says those characters are the one constituting the > > portable character set. It doesn't specify the encoding other > > than it mandates the encoding of those characters to be > > invariant in the charsets in the system's supported loca

Re: UTF-8 locale & POSIX text model

2017-11-25 Thread Danny Niu
On 26 Nov 2017, at 3:53 AM, k...@keldix.com wrote: On Wed, Nov 22, 2017 at 05:43:51PM +, Stephane Chazelas wrote: 2017-11-22 16:27:15 +0100, Martijn Dekker: On 22-11-17 at 16:02, Geoff Clare wrote: Danny Niu <danny...@hotmail.com> wrote: Q1:

Re: UTF-8 locale & POSIX text model

2017-11-25 Thread keld
On Wed, Nov 22, 2017 at 05:43:51PM +, Stephane Chazelas wrote: > 2017-11-22 16:27:15 +0100, Martijn Dekker: > > On 22-11-17 at 16:02, Geoff Clare wrote: > > > Danny Niu wrote, on 22 Nov 2017: > > >> > > >> Q1: What is the rationale for not making POSIX an application of ASCII? > > > > > > S

Re: Re: UTF-8 locale & POSIX text model

2017-11-24 Thread Shware Systems
It's my understanding that column was added to provide the stock names for those code points in creating charmap files in the format supported by localedef. This is an informative reference for convenience as the standard also lists those names elsewhere. As to Q2, the general direction I see
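As an illustration of the charmap format that localedef reads, here is a hedged Python sketch that emits entries for the printable ASCII range, using glibc-style `<Uxxxx>` symbolic names with the Unicode character name as a trailing comment. The code set name `PORTABLE-SUBSET` and the function `charmap_lines` are made up, and the stock POSIX symbolic names referred to in the message (such as `<exclamation-mark>`) are not reproduced here.

```python
import unicodedata


def charmap_lines(first: int = 0x20, last: int = 0x7E) -> list:
    """Sketch of a localedef charmap for a single-byte ASCII subset.

    Uses glibc-style <Uxxxx> symbolic names; the code set name is made up.
    """
    lines = [
        "<code_set_name> PORTABLE-SUBSET",  # hypothetical name
        "<comment_char> %",
        "<escape_char> /",  # so byte values are written /xNN
        "<mb_cur_min> 1",
        "<mb_cur_max> 1",
        "CHARMAP",
    ]
    for cp in range(first, last + 1):
        # Text after the encoding is a comment: the Unicode character name.
        lines.append(f"<U{cp:04X}> /x{cp:02x} {unicodedata.name(chr(cp))}")
    lines.append("END CHARMAP")
    return lines


if __name__ == "__main__":
    print("\n".join(charmap_lines()))
```

Control characters are left out because `unicodedata.name` has no names for them; a full charmap would list them with explicit names.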

Re: UTF-8 locale & POSIX text model

2017-11-22 Thread Stephane Chazelas
2017-11-22 16:27:15 +0100, Martijn Dekker: > On 22-11-17 at 16:02, Geoff Clare wrote: > > Danny Niu wrote, on 22 Nov 2017: > >> > >> Q1: What is the rationale for not making POSIX an application of ASCII? > > > > So that systems which use other encodings (specifically EBCDIC) can > > be POSIX-c

Re: UTF-8 locale & POSIX text model

2017-11-22 Thread Geoff Clare
Martijn Dekker wrote, on 22 Nov 2017: > > On 22-11-17 at 16:02, Geoff Clare wrote: > > Danny Niu wrote, on 22 Nov 2017: > >> > >> Q1: What is the rationale for not making POSIX an application of ASCII? > > > > So that systems which use other encodings (specifically EBCDIC) can > > be POSIX-con

Re: UTF-8 locale & POSIX text model

2017-11-22 Thread Martijn Dekker
On 22-11-17 at 16:02, Geoff Clare wrote: > Danny Niu wrote, on 22 Nov 2017: >> >> Q1: What is the rationale for not making POSIX an application of ASCII? > > So that systems which use other encodings (specifically EBCDIC) can > be POSIX-conforming. IBM z/OS is certified UNIX 95 and uses EBCDIC

Re: UTF-8 locale & POSIX text model

2017-11-22 Thread Geoff Clare
Danny Niu wrote, on 22 Nov 2017: > > Q1: What is the rationale for not making POSIX an application of ASCII? So that systems which use other encodings (specifically EBCDIC) can be POSIX-conforming. IBM z/OS is certified UNIX 95 and uses EBCDIC. -- Geoff Clare The Open Group, Apex Plaza, Forb
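To see why an EBCDIC system cannot simply be treated as ASCII, compare where the lowercase letters sit. This is a minimal sketch using Python's `cp037` codec, one EBCDIC code page the standard library ships; z/OS installations may use other EBCDIC code pages, such as IBM-1047, which Python does not include. The function names are illustrative only.

```python
import string


def letter_codes(encoding: str) -> list:
    """Byte value of each lowercase letter a..z in a single-byte encoding."""
    return [c.encode(encoding)[0] for c in string.ascii_lowercase]


def letters_contiguous(encoding: str) -> bool:
    """True if a..z occupy 26 consecutive byte values."""
    codes = letter_codes(encoding)
    return codes == list(range(codes[0], codes[0] + 26))


if __name__ == "__main__":
    for enc in ("ascii", "cp037"):
        codes = letter_codes(enc)
        # Show contiguity and the byte values of 'i' and 'j'.
        print(enc, letters_contiguous(enc), hex(codes[8]), hex(codes[9]))
```

Under cp037 the run breaks after `i` (0x89) and resumes at `j` (0x91), so a C loop such as `for (c = 'a'; c <= 'z'; c++)` also visits bytes that are not a-z.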

Re: UTF-8 locale & POSIX text model

2017-11-22 Thread Martijn Dekker
On 22-11-17 at 13:58, Danny Niu wrote: > Q1: What is the rationale for not making POSIX an application of ASCII? Actually, it mostly is. POSIX mandates that all supported locales include the "portable character set", which is ASCII minus some control characters. http://pubs.opengroup.org/online
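The "ASCII minus some control characters" description can be made concrete. This sketch follows my reading of XBD Table 6-1: eight control characters (NUL, alert, backspace, tab, newline, vertical-tab, form-feed, carriage-return), space, and the 94 ASCII graphic characters, 103 characters in all, which leaves 25 ASCII control codes out. The names `PORTABLE_CHARSET` and `is_portable` are illustrative, and the check is on characters, not on their encoding.

```python
# Control characters in the portable character set, per my reading of
# XBD Table 6-1: NUL, alert, backspace, tab, newline, vertical-tab,
# form-feed, carriage-return.
PORTABLE_CONTROLS = {0x00, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D}
# Space (0x20) plus the 94 graphic characters 0x21..0x7E.
PORTABLE_PRINTABLE = set(range(0x20, 0x7F))

PORTABLE_CHARSET = frozenset(
    chr(c) for c in PORTABLE_CONTROLS | PORTABLE_PRINTABLE
)


def is_portable(text: str) -> bool:
    """True if every character of `text` is in the portable character set."""
    return all(ch in PORTABLE_CHARSET for ch in text)


if __name__ == "__main__":
    excluded = [c for c in range(128) if chr(c) not in PORTABLE_CHARSET]
    print(len(PORTABLE_CHARSET), len(excluded))  # 103 25
    print(is_portable("echo hi\n"), is_portable("Åberg"))  # True False
```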

UTF-8 locale & POSIX text model

2017-11-22 Thread Danny Niu
Hi all, I've a few questions about textual data in POSIX systems. It was said that the goal of the POSIX standards is to ensure source code level portability of application programs. However, some lunatic system may swap the upper and lower case columns of the ASCII table, to make vertical bar
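The worry here is code that hard-codes the ASCII layout, for instance converting case by subtracting 32. Real charsets already break that assumption without any hypothetical column swap. This sketch compares the distance from `A` to `a` in ASCII and in EBCDIC code page 037; `case_offset` and `naive_upper` are hypothetical helpers, the second deliberately non-portable.

```python
def case_offset(encoding: str) -> int:
    """Distance from 'A' to 'a' as byte values in a single-byte encoding."""
    return "a".encode(encoding)[0] - "A".encode(encoding)[0]


def naive_upper(ch: str, encoding: str) -> str:
    """Hypothetical non-portable helper: assumes lowercase = uppercase + 32."""
    return bytes([ch.encode(encoding)[0] - 32]).decode(encoding)


if __name__ == "__main__":
    for enc in ("ascii", "cp037"):
        # ASCII: offset 32, naive_upper works; cp037: offset -64, it fails.
        print(enc, case_offset(enc), naive_upper("a", enc) == "A")
```

Portable C code uses `toupper()` and `tolower()` from `<ctype.h>`, which consult the current locale, rather than arithmetic on character codes.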