Re: AUTHORS list and the C locale on Mac OS X

Reece Dunn Tue, 09 Nov 2010 14:30:00 -0800

On 9 November 2010 22:13, Charles Davis <cda...@mymail.mines.edu> wrote:
> On 11/9/10 1:58 PM, James Mckenzie wrote:
>> Charles Davis <cda...@mymail.mines.edu> wrote:
>>>
>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>> No, it is not a bug in GNU sed. Then shouldn't the erroneous characters
>>>> in the authors.c file be changed to something acceptable in the language
>>>> used by Mac OS X?
>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>> over the AUTHORS file.
>>
>> Don't shoot the messenger.
> Sorry.
>
> The problem with your first idea--removing the bad characters directly
> from the authors.c file--is that we'd need to use a utility like sed or
> awk to implement it automatically--which puts us right back where we
> started. (We could use diff/patch, but is it worth the effort to
> maintain a patch for this? And would AJ let us put the patch file in
> Wine? And if not, where would we put it?)
>>  Maybe we can force the use of sed if it exists in the /usr/bin directory 
>> then to get around the 'brokenness' of GNU sed on the Mac?
> Maybe. But that seems like a hack. A better way might be to detect if
> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
> That's less of a hack, but still a hack.
>>  If not, it is a real bear to set the language on a Mac per previous 
>> discussions on the Users list.
> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>
> It's not the language that's the problem. It's the encoding. The AUTHORS
> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
> it not to (i.e. we told it to use MacRoman because that's the default
> encoding for the C locale). If we tell it to use UTF-8 (by setting
> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file correctly.
>
> Unfortunately, I just remembered that the name of the UTF-8 encoding is
> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
> from setting LC_ALL differently. We might end up having to hack around
> this the way either you or I described.


You could use autoconf to detect:
  1/  whether sed mishandles UTF-8 characters;
  2/  the LC_ALL value (locale name) that selects a UTF-8 encoding.
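For 1/, one possible probe (a sketch, not an existing autoconf macro; the function name sed_handles_utf8 is made up for illustration) feeds sed the two UTF-8 bytes of "é" and checks how many characters its `.` matches. A sed that decodes UTF-8 in the given locale replaces them with a single X; a sed working byte by byte produces XX. A configure test could run the same shell fragment and cache the result.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wine's build): succeeds if sed, run
# under the locale named in $1, treats the UTF-8 bytes of "é" as one
# character rather than two separate bytes.
sed_handles_utf8 () {
    # \303\251 are the two UTF-8 bytes encoding U+00E9 ("é").
    out=$(printf '\303\251\n' | LC_ALL=$1 sed -e 's/./X/g' 2>/dev/null) || return 1
    # "X" means sed decoded one character; "XX" means it saw raw bytes.
    [ "$out" = "X" ]
}

# Example run; en_US.UTF-8 is only an example name and may not be installed.
for loc in C en_US.UTF-8; do
    if sed_handles_utf8 "$loc"; then
        echo "$loc: sed treats UTF-8 input as characters"
    else
        echo "$loc: sed does not treat UTF-8 input as characters"
    fi
done
```

If a locale is not installed, setlocale falls back to C, so the probe simply reports that sed does not decode UTF-8 there.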

NOTE: You will need to enumerate the available locales, as the user may
not have en_US installed with UTF-8 encoding (e.g. on a Spanish-only or
Chinese-only system).
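To enumerate candidates regardless of how the platform spells the codeset, `locale -a` output can be filtered case-insensitively for both "UTF-8" (as Mac OS X lists it) and "utf8" (as glibc's `locale -a` lists it). This is a sketch; filter_utf8 and utf8_locales are hypothetical names. On glibc, setlocale also accepts "en_US.UTF-8" because it normalizes codeset names, so the spelling mainly matters when matching `locale -a` output.

```shell
#!/bin/sh
# Keep only locale names whose codeset is UTF-8, in either spelling:
# "en_US.UTF-8" or "en_US.utf8". An optional "@modifier" suffix
# (e.g. "sr_RS.UTF-8@latin") is allowed.
filter_utf8 () {
    grep -i -E '\.utf-?8(@.*)?$'
}

# Hypothetical helper: candidate UTF-8 locales installed on this system,
# in whatever order `locale -a` reports them.
utf8_locales () {
    locale -a 2>/dev/null | filter_utf8
}

utf8_locales || echo "no UTF-8 locales found" >&2
```

Each name printed could then be passed to a sed check like the one in the script that follows, so no particular language (en_US or otherwise) has to be installed.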

Something like:

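cat > get_locale.sh << 'EOF'
#!/bin/sh
# Print the first installed locale in which sed can process AUTHORS.
# Assumes sed exits non-zero when it cannot handle the input; the no-op
# substitution makes it read every line as text.
locale -a | while read -r locale ; do
   if LC_ALL=$locale sed -e 's/.*/&/' AUTHORS > /dev/null 2>&1 ; then
      echo "$locale"
      exit 0
   fi
done
EOF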

This should print a locale that can process the UTF-8 file. It needs
cleaning up a bit, but that is the basis of it.

HTH,
- Reece
