Hi,

Here is what I found regarding regex, using the test/* test suite.

   text    data     bss     dec     hex filename
  41385       0      32   41417    a1c9 uClibc/libc/misc/regex/regex.os
  20740       4     293   21037    522d uClibc/libc/misc/regex/regex_old.os


"New" regex results

testregex:
TEST testregex basic, 533 tests, 6 errors
TEST testregex categorize, 21 tests, 0 errors
TEST testregex forcedassoc, 47 tests, 9 errors
TEST testregex interpretation, 148 tests, 6 signals, 36 errors
TEST testregex leftassoc, 16 tests, 8 errors
TEST testregex nullsubexpr, 112 tests, 2 signals, 3 errors
TEST testregex repetition, 81 tests, 5 errors
TEST testregex rightassoc, 20 tests, 4 errors

tst-regex2:
test 0 pattern 0 '.?.?.?.?.?.?.?Log\.13' regexec without REG_NOSUB did not find the correct match
test 0 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' regexec without REG_NOSUB did not find the correct match
test 0 pattern 2 '((((((((((.?))))))))))... regexec without REG_NOSUB did not find the correct match
test 1 pattern 0 '.?.?.?.?.?.?.?Log\.13' 0.109511s
test 1 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' 0.110819s
test 1 pattern 2 '((((((((((.?))))))))))... 0.111298s
test 2 pattern 0 '.?.?.?.?.?.?.?Log\.13' re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)
test 2 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)
test 2 pattern 2 '((((((((((.?))))))))))... re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)
test 3 pattern 0 '.?.?.?.?.?.?.?Log\.13' re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)
test 3 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)
test 3 pattern 2 '((((((((((.?))))))))))... re_search did not find the correct match(found 'angeLog.13 for earlier changes.
' instead)

tst-regexloc:
match from 0 to 9 [WRONG]


"Old" regex results

testregex:
TEST testregex basic, 538 tests, 1 error
TEST testregex categorize, 21 tests, 0 errors
TEST testregex forcedassoc, 47 tests, 9 errors
TEST testregex interpretation, 167 tests, 6 signals, 17 errors
TEST testregex leftassoc, 16 tests, 8 errors
TEST testregex nullsubexpr, 83 tests, 32 errors
TEST testregex repetition, 79 tests, 8 errors
TEST testregex rightassoc, 20 tests, 4 errors

tst-regex2:
test 0 pattern 0 '.?.?.?.?.?.?.?Log\.13' 1.312488s
test 0 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' 3.084833s
test 0 pattern 2 '((((((((((.?))))))))))... 19.417234s
test 1 pattern 0 '.?.?.?.?.?.?.?Log\.13' 1.156149s
test 1 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' 3.210494s
test 1 pattern 2 '((((((((((.?))))))))))... 19.149326s
test 2 pattern 0 '.?.?.?.?.?.?.?Log\.13' 1.094042s
test 2 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' 3.047473s
test 2 pattern 2 '((((((((((.?))))))))))... 19.800388s
test 3 pattern 0 '.?.?.?.?.?.?.?Log\.13' 1.129805s
test 3 pattern 1 '(.?)(.?)(.?)(.?)(.?)(.?)(.?)Log\.13' 4.097457s
test 3 pattern 2 '((((((((((.?))))))))))... 18.404386s

tst-regexloc:
match from 0 to 6 [correct]


Conclusion: so far the new regex does not look like an improvement
overall. It has more failures than the old one, although there are
cases where it works correctly and the old one does not.
It is also twice as big.

The source of both regex implementations does not look good.
It is not very readable. There is also a lot of cruft:
"#ifdef emacs"? "#if defined _AIX"? And some glibcy-tasting
optimizations ("let's use alloca *and* malloc! that way the code
will be twice as big, and twice as hard to debug!").

REGEX_OLD is slow. tst-regex2 built against glibc takes ~0.022499s
per pattern test, while uClibc with REGEX_OLD takes ~20 seconds
(on the third pattern, see the test output).

The new regex works much faster, but it failed almost every test
in tst-regex2!

I guess we need to take a look at the new regex and try to fix it.
If that proves too difficult, I vote for disabling it in the config
system. There is no point in giving people something which is both
buggier and bigger.

I will also try to soup up the test infrastructure. So far it looks like
it wasn't used much (lots of failures, problematic behavior
of the testing infrastructure itself, inadequate docs).


On Thursday 11 December 2008 16:51, Bernhard Reutner-Fischer wrote:
> On Thu, Dec 11, 2008 at 02:17:53AM +0100, Denys Vlasenko wrote:
> >On Tuesday 09 December 2008 22:00, Rob Landley wrote:
> >> Right now, there are still two "old" linuxthreads branches in uClibc,
> >> and as far as I can tell we'll be supporting them in perpetuity. (For a
> >> definition of "support" that involves leaving them alone unless somebody
> >> complains.)
> >
> >We need to stop doing that, though.
> >
> >We have "old" and "new" vfprintf, "old" and "new" regex,
> >"old" and "new" threads, "old" and "new" fnmatch.
> >
> >Let's just decide on something, and disable things which are
> >really "old". Because currently, I need telepathic powers
> >to figure out which regex to choose:
> >
> >Sometimes "old" is actually old and better be dropped,
> >other times "old" is actually "stable and recommended",
> >and "new" is "work in progress, and maybe developer was hit
> >by the bus, nobody knows". For one, I do not know
> >what vfprintf or regex to choose, I use rand().
> >
> >But users are even less likely than we to know what to choose.
> >We, as developers, need to help them.
> >
> >It's not like disabling or even rm'ing one or the other
> >is irreversible. It all will still be in a history.
> >
> >Just IMHO. Bernhard, I guess it's up to you to decide this.
>
> Let's leave the complete libpthread/* out of this particular discussion
> for now.
>
> As for the rest, i currently don't have a strong opinion on which
> of the old or new variants to keep, which malloc impl (of the 3) to
> drop. I'm open to suggestions.
>
> Any preferences / thoughts?
--
vda
_______________________________________________
uClibc mailing list
uClibc@uclibc.org
http://lists.busybox.net/mailman/listinfo/uclibc