On 11/9/10 7:58 PM, James McKenzie wrote:
> On 11/9/10 3:29 PM, Reece Dunn wrote:
>> On 9 November 2010 22:13, Charles Davis <cda...@mymail.mines.edu> wrote:
>>> On 11/9/10 1:58 PM, James Mckenzie wrote:
>>>> Charles Davis <cda...@mymail.mines.edu> wrote:
>>>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>>>> No, it is not a bug in GNU sed. The authors.c file needs to have
>>>>>> the erroneous characters for the language used by
>>>>>> MacOSX changed to be acceptable?
>>>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>>>> over the AUTHORS file.
>>>> Don't shoot the messenger.
>>> Sorry.
>>>
>>> The problem with your first idea--removing the bad characters directly
>>> from the authors.c file--is that we'd need to use a utility like sed or
>>> awk to implement it automatically--which puts us right back where we
>>> started. (We could use diff/patch, but is it worth the effort to
>>> maintain a patch for this? And would AJ let us put the patch file in
>>> Wine? And if not, where would we put it?)
>>>> Maybe we can force the use of sed if it exists in the /usr/bin
>>>> directory then to get around the 'brokenness' of GNU sed on the Mac?
>>> Maybe. But that seems like a hack. A better way might be to detect if
>>> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
>>> That's less of a hack, but still a hack.
>>>> If not, it is a real bear to set the language on a Mac per
>>>> previous discussions on the Users list.
>>> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>>>
>>> It's not the language that's the problem. It's the encoding. The AUTHORS
>>> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
>>> it not to (i.e. we told it to use MacRoman because that's the default
>>> encoding for the C locale).
>>> If we tell it to use UTF-8 (by setting
>>> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file
>>> correctly.
>>>
>>> Unfortunately, I just remembered that the name of the UTF-8 encoding is
>>> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
>>> from hard-coding one LC_ALL value for both. We might end up having to
>>> hack around this the way either you or I described.
>> You could use autoconf to detect:
>> 1/ broken handling of UTF-8 characters by sed;
>> 2/ the name of the locale to put in LC_ALL that handles UTF-8.
>>
>> NOTE: You will need to enumerate available locales as the user may not
>> have en_US present with UTF-8 encoding (e.g. a Spanish-only or
>> Chinese-only system).
>>
>> Something like:
>>
>> cat > get_locale.sh << 'EOF'
>> locale -a | while read locale ; do
>>     if LC_ALL=$locale sed -e 's/.*/&/' < authors.c > /dev/null ; then
>>         echo $locale
>>         exit
>>     fi
>> done
>> EOF
>>
>> This should print a locale that can process the UTF-8 file. It needs
>> cleaning up a bit, but that is the basis of it.
>>
> Thanks Reece.
>
> Charles: You want to do this?

I'm on it.
If you have a patch ready, though, go for it.

Chip
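Reece's two checks could be fleshed out along these lines: a POSIX shell sketch that (2) keeps only locales whose codeset name is some spelling of UTF-8, covering both the Mac OS X and Linux spellings mentioned above, and (1) accepts the first one in which sed copies a UTF-8 sample through byte-for-byte. This is a hedged sketch, not a configure test taken from Wine. The function names `sed_roundtrips_utf8` and `find_utf8_locale` are hypothetical, and the sed expression (wrapping each line in quotes plus a trailing comma) is an assumed stand-in for whatever actually generates authors.c from AUTHORS.

```shell
#!/bin/sh
# Sketch only: probe installed locales for one in which sed passes UTF-8
# text through unchanged. The sed expression is an ASSUMED stand-in for
# the rule that turns AUTHORS into authors.c, not Wine's actual command.

# Succeeds if sed, run under locale $1, wraps a UTF-8 sample line in
# quotes without altering any of its bytes.
sed_roundtrips_utf8() {
    # "José", with the é spelled as its two UTF-8 bytes 0xC3 0xA9.
    sample=$(printf 'Jos\303\251')
    got=$(printf '%s\n' "$sample" | LC_ALL=$1 sed 's/\(.*\)/"\1",/' 2>/dev/null)
    [ "$got" = "\"$sample\"," ]
}

# Prints the first locale whose codeset is UTF-8, however it is spelled
# (e.g. en_US.UTF-8 on Mac OS X, en_US.utf8 on glibc-based Linux), and in
# which sed_roundtrips_utf8 succeeds. Prints nothing if none qualifies.
find_utf8_locale() {
    locale -a 2>/dev/null | grep -i -E '\.utf-?8$' | while read -r loc; do
        if sed_roundtrips_utf8 "$loc"; then
            echo "$loc"
            break   # stop at the first matching locale
        fi
    done
}

find_utf8_locale
```

Because it compares sed's output rather than only its exit status, the probe also rejects a locale in which sed runs cleanly but rewrites the non-ASCII bytes.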