On Wed, Dec 19, 2007 at 12:34:02AM +1100, Rick Welykochy wrote:
> >No sir! But shell usually wins.
>
> On my 1 GHz / 1 GB powerbook, the python one-liner
> I just submitted runs 5 x faster than the original.

I think C usually wins; the version below is about 25 times faster than
the Python version (with the file in the disk cache).

[EMAIL PROTECTED]:~$ ls -lh /tmp/randomcommas
-rw-r--r-- 1 ianw ianw 65M 2007-12-19 14:30 /tmp/randomcommas

[EMAIL PROTECTED]:~$ /usr/bin/time ./comma < /tmp/randomcommas
commas: 1287100
0.07user 0.04system 0:00.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+126minor)pagefaults 0swaps

[EMAIL PROTECTED]:~$ /usr/bin/time python -Sc "import sys; print sum(l.count(',') for l in sys.stdin)" < /tmp/randomcommas
1287100
2.68user 0.13system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+8659minor)pagefaults 0swaps

I'd guess the Python version is spending that time doing some extra
copying, because it causes a lot more page faults (8659 minor faults
against 126) and is really cache unfriendly.

Python
  Instructions retired per L1 data cache access: 11.03
  Instructions retired per L2 data cache access: 24.16

C
  Instructions retired per L1 data cache access: 6.01
  Instructions retired per L2 data cache access: 366.92

-i

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

#define CHUNK 16384

char buf[CHUNK];

int main(int argc, char *argv[])
{
	unsigned long count = 0;
	ssize_t len;
	int fd = 0;	/* default to stdin */

	/* read from the named file if one is given */
	if (argc > 1)
		fd = open(argv[1], O_RDONLY);

	if (fd == -1) {
		fprintf(stderr, "open: %s\n", strerror(errno));
		exit(EXIT_FAILURE);
	}

	/* stop at EOF (0) or on a read error (-1) */
	while ((len = read(fd, buf, CHUNK)) > 0) {
		ssize_t i;
		for (i = 0; i < len; i++)
			if (buf[i] == ',')
				count++;
	}

	if (len == -1) {
		fprintf(stderr, "read: %s\n", strerror(errno));
		exit(EXIT_FAILURE);
	}

	printf("commas: %lu\n", count);

	return 0;
}

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
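
P.S. The inner loop of the C program above tests one byte at a time. A possible variant, sketched here and not measured, hands the scan to memchr() so that libc's search routine finds each comma. The helper name count_commas is made up for illustration and is not part of the program above; whether it is actually faster would depend on the libc and on how dense the commas are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the ',' bytes in the first len bytes of buf.
 * Hypothetical helper for illustration only; it could be called on
 * each chunk returned by read() in place of the byte-by-byte loop.
 */
unsigned long count_commas(const char *buf, size_t len)
{
	unsigned long count = 0;
	const char *p = buf;
	const char *end = buf + len;

	assert(buf != NULL);

	/* memchr returns a pointer to the next ',' or NULL if none remain */
	while (p < end && (p = memchr(p, ',', (size_t)(end - p))) != NULL) {
		count++;
		p++;	/* resume the scan just past this comma */
	}

	return count;
}
```

In the read loop this would be used as "count += count_commas(buf, (size_t)len);", leaving the rest of the program unchanged.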