The performance hit of -i hasn't changed with 12.04 LTS.  I'll have to
check with a newer grep, I guess.  I'm seeing e.g. 25 secs for grep -i
over the .c/.h files in a Linux source tree, vs. 0.5 secs for grep
without -i and 1.3 secs for a LANG=C grep -i.  No disk I/O is involved;
the files are cached.

  So that's a slowdown of about a factor of 20 (25 secs vs. 1.3 secs)
for en_CA.utf8 vs. POSIX case-insensitive grepping.

 Ubuntu 12.04 does set LANG=en_CA.utf8, and /usr/lib/locale now just
contains locale-archive.  So I'm not seeing any system calls trying to
open nonexistent files like ahendry was.

 Again, I haven't yet tried with the most recent Ubuntu.  This should be
trivially easy for most people to test, as it doesn't require grep to
actually match anything.  (I still used the volatile s3tc pattern from
my original report when searching the Linux tree.)  You just need a new
version of grep, and locale support for a UTF-8 English locale (e.g.
en_US.utf8).
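
 To check both prerequisites before timing anything, a rough sketch
along these lines should do.  The filter_en_utf8 helper name is made up
for illustration, and this assumes a system where locale -a lists the
installed locales (e.g. glibc):

```shell
# Hypothetical helper: keep only English UTF-8 locale names from stdin
# (matches spellings such as en_US.utf8 and en_CA.UTF-8).
filter_en_utf8() {
    grep -i -E '^en_.*\.utf-?8$'
}

# Show which English UTF-8 locales are installed, if any.
locale -a 2>/dev/null | filter_en_utf8 || echo "no English UTF-8 locale found"

# Show which grep is being tested.
grep --version | head -n 1
```

 If nothing is listed, setting LANG to an English UTF-8 locale will most
likely fall back to the C locale without complaint, which would make
the slow case look fast.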

 Just run these three commands:
time find -name '*.[ch]' | xargs grep -i 'volatile.*s3tc'
time find -name '*.[ch]' | xargs grep 'volatile.*s3tc'
time find -name '*.[ch]' | LANG=C xargs grep -i 'volatile.*s3tc'

 If the LANG=C version isn't much faster than the grep -i with your
default locale (and/or LANG=en_US.utf8 if your default for some reason
isn't slow), then the problem is fixed and grep has fast case-
insensitive utf8 matching.
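
 For anyone who wants numbers that are easier to compare, here is a
rough sketch that wraps the same three searches in a function and prints
elapsed milliseconds.  It is not what I ran for the figures above.  The
run_case name is made up, en_US.utf8 is assumed to be installed
(substitute your own locale), and it assumes GNU find, xargs, grep and
date (for -print0, -0/-r and %N).  It sets LC_ALL rather than LANG,
because LC_ALL takes precedence over LANG if both happen to be set in
your environment:

```shell
#!/bin/sh
# run_case LOCALE DIR [GREP_OPTS...]
# Prints the elapsed wall-clock time in milliseconds for one search.
run_case() {
    loc=$1
    dir=$2
    shift 2
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    # grep exits non-zero when nothing matches; that is not an error here.
    find "$dir" -name '*.[ch]' -print0 |
        LC_ALL=$loc xargs -0 -r grep "$@" 'volatile.*s3tc' >/dev/null || true
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

tree=${1:-.}    # directory to search; defaults to the current one
echo "en_US.utf8, -i : $(run_case en_US.utf8 "$tree" -i) ms"
echo "en_US.utf8     : $(run_case en_US.utf8 "$tree") ms"
echo "C, -i          : $(run_case C "$tree" -i) ms"
```

 Run it once to warm the page cache and look at the second run, so that
disk I/O doesn't skew the comparison.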

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/75695

Title:
  huge performance hit for -i with UTF-8 locales
