On 11/9/10 8:02 PM, Charles Davis wrote:
On 11/9/10 7:58 PM, James McKenzie wrote:
On 11/9/10 3:29 PM, Reece Dunn wrote:
On 9 November 2010 22:13, Charles Davis <cda...@mymail.mines.edu> wrote:
On 11/9/10 1:58 PM, James Mckenzie wrote:
Charles Davis <cda...@mymail.mines.edu> wrote:
On 11/9/10 12:13 PM, James Mckenzie wrote:
No, it is not a bug in GNU sed.  Perhaps the erroneous characters in
authors.c need to be changed to ones acceptable for the language used
by Mac OS X?
That ain't gonna fly. I think we should explicitly use a UTF-8 locale
(like en_US.UTF-8 or some such) instead of the C locale when sed goes
over the AUTHORS file.
Don't shoot the messenger.
Sorry.

The problem with your first idea--removing the bad characters directly
from the authors.c file--is that we'd need to use a utility like sed or
awk to implement it automatically--which puts us right back where we
started. (We could use diff/patch, but is it worth the effort to
maintain a patch for this? And would AJ let us put the patch file in
Wine? And if not, where would we put it?)
Maybe we can force the use of /usr/bin/sed, if it exists, to get
around the 'brokenness' of GNU sed on the Mac?
Maybe. But that seems like a hack. A better way might be to detect if
we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
That's less of a hack, but still a hack.
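A minimal sketch of the detection described above, assuming a POSIX shell, and assuming that GNU sed accepts `--version` while the BSD sed shipped at /usr/bin/sed on Mac OS rejects that option. The function name pick_sed is hypothetical, not part of Wine's build:

```shell
#!/bin/sh
# Hypothetical helper: choose which sed to run over AUTHORS.
# Assumption: GNU sed answers --version successfully; the BSD sed in
# /usr/bin on Mac OS rejects the option and exits non-zero.
pick_sed() {
    sed_cmd=sed
    if [ "$(uname -s)" = Darwin ] && sed --version > /dev/null 2>&1 ; then
        # GNU sed is first in PATH on Mac OS: prefer the system sed if present.
        if [ -x /usr/bin/sed ] ; then
            sed_cmd=/usr/bin/sed
        fi
    fi
    echo "$sed_cmd"
}

SED=$(pick_sed)
echo "using $SED"
```

On any non-Mac system this simply keeps plain `sed`, so the workaround stays confined to the one platform where it is needed.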
If not, it is a real bear to set the language on a Mac, per
previous discussions on the Users list.
That was about setting LANG. Wine always obeys LC_*, and so does sed.

It's not the language that's the problem. It's the encoding. The AUTHORS
file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
it not to (i.e. we told it to use MacRoman because that's the default
encoding for the C locale). If we tell it to use UTF-8 (by setting
LC_ALL to, for example, 'en_US.UTF-8'), it will process the file
correctly.
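A minimal sketch of the idea, assuming the locale only needs to change for the single sed invocation. The sample file AUTHORS.sample and the quoting expression are hypothetical stand-ins, not the actual rule Wine uses to generate authors.c:

```shell
#!/bin/sh
# Hypothetical sample input: one author name, "José", stored as UTF-8 bytes.
printf 'Jos\303\251\n' > AUTHORS.sample

# Set LC_ALL only for this one sed command; the rest of the build keeps its
# normal environment. The expression wraps each line as a quoted C string.
OUT=$(LC_ALL=en_US.UTF-8 sed -e 's/^/  "/' -e 's/$/",/' AUTHORS.sample)
echo "$OUT"

rm -f AUTHORS.sample
```

Prefixing the assignment to the command, rather than exporting LC_ALL, keeps the change from leaking into the rest of the build.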

Unfortunately, I just remembered that the name of the UTF-8 encoding is
different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
from setting LC_ALL differently. We might end up having to hack around
this the way either you or I described.
You could use autoconf to detect:
    1/  broken handling of UTF-8 characters by sed;
    2/  name of the locale (the LC_ALL value) that handles UTF-8

NOTE: You will need to enumerate available locales as the user may not
have en_US present with UTF-8 encoding (e.g. a Spanish-only or
Chinese-only system).

Something like:

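cat > get_locale.sh << 'EOF'
# Print the first installed locale in which sed reads AUTHORS without error.
# The sed expression is only a placeholder for the one the build really uses.
locale -a | while read -r locale ; do
     if LC_ALL=$locale sed -e 's/.*/&/' AUTHORS > /dev/null 2>&1 ; then
        echo "$locale"
        exit
     fi
done
EOF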

This should print a locale that can process the UTF-8 file. It needs
cleaning up a bit, but that is the basis of it.
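One possible piece of that cleanup, sketched under the assumption that `locale -a` lists names ending in a codeset suffix: filter the list down to UTF-8 locales first, accepting both spellings mentioned earlier ("UTF-8" on Mac OS, "utf8" on Linux) and any language, per the note about Spanish-only or Chinese-only systems. The function name find_utf8_locale is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the first installed UTF-8 locale, whatever its
# language. grep -i -E '\.utf-?8$' matches both ".UTF-8" and ".utf8".
find_utf8_locale() {
    locale -a 2>/dev/null | grep -i -E '\.utf-?8$' | head -n 1
}

UTF8_LOCALE=$(find_utf8_locale)
if [ -n "$UTF8_LOCALE" ] ; then
    echo "UTF-8 locale: $UTF8_LOCALE"
else
    echo "no UTF-8 locale installed" >&2
fi
```

The candidates this produces could then be fed through the sed check in the script above instead of trying every installed locale.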

Thanks Reece.

Charles:  You want to do this?
I'm on it.

If you have a patch ready, though, go for it.

No, I'm stuck with a problem in richedit. Besides, you have more Mac-specific knowledge than I do, and I'm happy to say that. Although, if you need a test 'victim', I'm here for you.

James McKenzie


