Most of us know that, way back when Ken Thompson still had a black beard, 
there were three basic versions of "grep", with prefixes or flags that 
turned them on as such, but no fully integrated version of the tool that 
would work as quickly as the three separate ones.  This, I think, went by 
the board some time ago, what with faster processors, DFA-type algorithms 
and the like.  Now we seem to have mostly one, copied into its various 
destinations by the squanderers, or symlinked by the thrifty.  What the 
hell!  It's all gotten so much bigger and faster, so why bother: the toolbox 
approach was all right for tradesmen, who actually had toolboxes, but for 
the rest....

I discovered this, a decade or so ago, when "grep" in an out-of-the-box 
distribution ran very significantly more slowly than the equivalent 
pattern-matchers in "awk" and "perl".  The problem was easy enough to fix: 
it just involved resetting the "$LANG" variable in the shell to "C" or 
"POSIX".  The current "en_US" setting produces a much more attenuated 
version of the problem described above, and isn't worth worrying about 
unless, as I do (I'm a linguist), you use "*grep" repetitively, where it 
surges once more into prominence.  The actual culprit is the as-shipped 
"fgrep", which has a very curious conception of what a word is unless it is 
operating in the right locale.  I haven't bothered to pin this down exactly, 
but I know from "strace" that many processes do a fair bit of 
locale-checking on their way to execution.  Given that English as a mother 
tongue is the fourth-most spoken language on the planet, and as a second 
language (in many cases, in semi-bilingual settings) is spoken by more than 
1 billion people, a great many of whom do not speak or write American 
dialects of English, maybe the developers of "*grep" should take this into 
account.
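
For anyone who wants to check this on their own box, here is a rough sketch 
of the comparison in Python.  It is not a proper benchmark, just one timed 
run per locale.  The helper names "run_with_locale" and "compare_locales" 
are my own inventions, the pattern and corpus files are whatever you supply, 
and I am assuming "en_US.UTF-8" is actually installed on the machine.  It 
sets LC_ALL as well as LANG, since LC_ALL takes precedence over LANG.

```python
import os
import subprocess
import time


def run_with_locale(cmd, locale_name):
    """Run cmd with LC_ALL and LANG forced to locale_name.

    Returns (elapsed_seconds, stdout_text). Hypothetical helper.
    """
    env = dict(os.environ)
    env["LC_ALL"] = locale_name  # LC_ALL overrides LANG and the other LC_* settings
    env["LANG"] = locale_name
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return time.perf_counter() - start, result.stdout


def compare_locales(patterns_path, corpus_path, locales=("C", "en_US.UTF-8")):
    """Time a fixed-string search list (grep -F -f) once under each locale."""
    timings = {}
    for name in locales:
        # -c prints only the count of matching lines, so output stays small
        cmd = ["grep", "-c", "-F", "-f", patterns_path, corpus_path]
        seconds, _ = run_with_locale(cmd, name)
        timings[name] = seconds
    return timings


# Example (file names are placeholders):
#   for name, secs in compare_locales("patterns.txt", "corpus.txt").items():
#       print(name, round(secs, 3))
```

A single run is noisy, so repeat it a few times before drawing conclusions; 
the first pass also pays for reading the corpus off disk.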

I personally solved the problem by replacing my symlinked "fgrep" with a 
far older (yet fully functional) version.  Maybe I should forward this one 
as a "bugs" report, although it's been a bug for years.  Maybe we should all 
talk POSIX (I have certain professional doubts about that).  Search lists 
(the "-f" option) are not, I think, behaving nicely.

Cheers,
Malcolm Johhston
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
