On 11/9/10 1:58 PM, James Mckenzie wrote:
> Charles Davis <cda...@mymail.mines.edu> wrote:
>>
>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>> No, it is not a bug in GNU sed. The authors.c file needs to have the
>>> erroneous characters for the language used by
>>> MacOSX changed to be acceptable?
>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>> over the AUTHORS file.
>
> Don't shoot the messenger. Sorry.

The problem with your first idea--removing the bad characters directly
from the authors.c file--is that we'd need to use a utility like sed or
awk to implement it automatically--which puts us right back where we
started. (We could use diff/patch, but is it worth the effort to
maintain a patch for this? And would AJ let us put the patch file in
Wine? And if not, where would we put it?)

> Maybe we can force the use of sed if it exists in the /usr/bin directory
> then to get around the 'brokenness' of GNU sed on the Mac?

Maybe. But that seems like a hack. A better way might be to detect if
we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
That's less of a hack, but still a hack.

> If not, it is a real bear to set the language on a Mac per previous
> discussions on the Users list.

That was about setting LANG. Wine always obeys LC_*, and so does sed.

It's not the language that's the problem. It's the encoding. The AUTHORS
file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
it not to (i.e. we told it to use MacRoman because that's the default
encoding for the C locale). If we tell it to use UTF-8 (by setting
LC_ALL to, for example, 'en_US.UTF-8'), it will process the file
correctly.

Unfortunately, I just remembered that the name of the UTF-8 encoding is
different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
from setting LC_ALL differently. We might end up having to hack around
this the way either you or I described.

Chip
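
P.S. A rough sketch of one way the spelling difference might be
sidestepped: instead of hard-coding a locale name, ask `locale -a` which
UTF-8 locale the system actually has and hand that to sed. The helper
name pick_utf8_locale is made up for illustration, and the
case-insensitive match on a ".UTF-8"/".utf8" suffix is only an
assumption about how the names vary. This is not what the Wine build
does.

```shell
#!/bin/sh
# Hypothetical helper: read locale names on stdin (one per line, as
# printed by `locale -a`) and print the first one whose encoding suffix
# is some spelling of UTF-8 (".UTF-8", ".utf8", ".utf-8", ...).
pick_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

# Ask the system which locales it has; stay quiet if `locale` is missing.
utf8_locale=$(locale -a 2>/dev/null | pick_utf8_locale)

if [ -n "$utf8_locale" ]; then
    # A real build rule would run something like:
    #   LC_ALL=$utf8_locale sed ... AUTHORS
    echo "would run sed with LC_ALL=$utf8_locale"
else
    echo "no UTF-8 locale found; one of the sed workarounds is still needed"
fi
```

Taking the first match is arbitrary. Any UTF-8 locale should do here,
since only the encoding matters for this step, not the language.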