On 9 November 2010 22:13, Charles Davis <cda...@mymail.mines.edu> wrote:
> On 11/9/10 1:58 PM, James Mckenzie wrote:
>> Charles Davis <cda...@mymail.mines.edu> wrote:
>>>
>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>> No, it is not a bug in GNU sed. The authors.c file needs to have the
>>>> erroneous characters for the language used by
>>>> MacOSX changed to be acceptable?
>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>> over the AUTHORS file.
>>
>> Don't shoot the messenger.
> Sorry.
>
> The problem with your first idea--removing the bad characters directly
> from the authors.c file--is that we'd need to use a utility like sed or
> awk to implement it automatically--which puts us right back where we
> started. (We could use diff/patch, but is it worth the effort to
> maintain a patch for this? And would AJ let us put the patch file in
> Wine? And if not, where would we put it?)
>> Maybe we can force the use of sed if it exists in the /usr/bin directory
>> then to get around the 'brokenness' of GNU sed on the Mac?
> Maybe. But that seems like a hack. A better way might be to detect if
> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
> That's less of a hack, but still a hack.
>> If not, it is a real bear to set the language on a Mac per previous
>> discussions on the Users list.
> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>
> It's not the language that's the problem. It's the encoding. The AUTHORS
> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
> it not to (i.e. we told it to use MacRoman because that's the default
> encoding for the C locale). If we tell it to use UTF-8 (by setting
> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file correctly.
>
> Unfortunately, I just remembered that the name of the UTF-8 encoding is
> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
> from setting LC_ALL differently. We might end up having to hack around
> this the way either you or I described.
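
On the naming difference: 'locale -a' reports whatever spelling the platform uses, so one option is to match either form instead of hard-coding one. A rough sketch, assuming a POSIX shell and a grep that supports -E; the function names are mine and nothing in Wine is called this:

```shell
# Keep only locale names whose codeset is UTF-8, however it is spelled
# (en_US.UTF-8, es_ES.utf8, ca_ES.utf8@valencia, ...).
# Reads one locale name per line on stdin.
# filter_utf8_locales is a hypothetical helper name.
filter_utf8_locales() {
    grep -i -E '\.utf-?8(@.*)?$'
}

# Print the first UTF-8 locale installed on this machine.
# Prints nothing if there is none.
first_utf8_locale() {
    locale -a 2>/dev/null | filter_utf8_locales | head -n 1
}
```

The result of first_utf8_locale could then be used as the LC_ALL value for the sed call. It still needs a fallback for the case where nothing is printed.
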

You could use autoconf to detect:
  1/ broken handling of UTF-8 characters by sed;
  2/ the name of the LC_ALL value that handles UTF-8.

NOTE: You will need to enumerate the available locales, as the user may
not have en_US present with UTF-8 encoding (e.g. a Spanish-only or
Chinese-only system).

Something like:

    cat > get_locale.sh << 'EOF'
    #!/bin/sh
    # Print the first locale under which sed gets through authors.c.
    locale -a | while read -r locale ; do
        # "-n p" is only a stand-in; use the real sed expression here.
        if LC_ALL=$locale sed -n p < authors.c > /dev/null 2>&1 ; then
            echo "$locale"
            exit
        fi
    done
    EOF

This should print a locale that can process the UTF-8 file. It needs
cleaning up a bit, but that is the basis of it.

HTH,
- Reece
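
P.S. For point 1/, a check could look roughly like the sketch below. It pushes a small UTF-8 sample through sed under a given locale and compares the bytes that come out. The function name, the sample text and the substitution are all my own placeholders. I have not checked that this particular expression triggers the problem seen on the Mac, so the expression the build really uses should go in its place:

```shell
# Succeeds if sed, run under the locale given in $1, copies a small
# UTF-8 sample through unchanged. $2 is the sed command to test and
# defaults to plain "sed". sed_keeps_utf8 is a hypothetical name.
sed_keeps_utf8() {
    # "André" followed by two CJK characters, written as octal escapes
    # so this script itself stays pure ASCII.
    sample=$(printf 'Andr\303\251 \346\227\245\346\234\254')
    # The substitution is a stand-in for the real sed expression.
    result=$(printf '%s\n' "$sample" |
        LC_ALL=$1 "${2:-sed}" -e 's/^\(.*\)$/\1/' 2>/dev/null) || return 1
    [ "$result" = "$sample" ]
}
```

For example, 'sed_keeps_utf8 C /usr/bin/sed' tests the system sed in the C locale, and the same call with a different second argument tests another sed binary.
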