On Mon, 9 Sep 2002 04:12, Chris Little wrote:
> On Sun, 8 Sep 2002, Jerry Hastings wrote:
>> At 12:48 AM 9/9/2002 +0800, Leon Brooks wrote:
>>> All verses containing two or more of God, Good or Greed: (g[ore]*d){2,}
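A minimal sketch of how the pattern quoted above behaves, using Python's `re` module as a stand-in. This is an assumption: it is not the GNU regex engine used by Sword or BibleCS, so it says nothing about quirks specific to that engine. The sample strings and the helper `has_two_or_more` are made up for illustration. It suggests one possible reason the pattern might not give the desired result on any engine: `{2,}` asks for back-to-back repetitions of the group, so the words would have to be adjacent with nothing between them. Counting single-word matches is one alternative.

```python
import re

# The pattern from the thread. IGNORECASE is an assumption so that "God" matches.
THREAD_PATTERN = re.compile(r"(g[ore]*d){2,}", re.IGNORECASE)

# Hypothetical alternative: match one word at a time and count the hits.
WORD_PATTERN = re.compile(r"\bg[ore]*d\b", re.IGNORECASE)


def has_two_or_more(verse: str) -> bool:
    """Return True if the verse contains at least two words matching g[ore]*d."""
    return len(WORD_PATTERN.findall(verse)) >= 2


# Made-up sample text, not a quotation from any Sword module.
sample = "God is good"

print(bool(THREAD_PATTERN.search(sample)))     # False: the two words are not adjacent
print(bool(THREAD_PATTERN.search("godgood")))  # True: two repetitions back to back
print(has_two_or_more(sample))                 # True: two separate word matches
```

Note that `g[ore]*d` is looser than the three intended words: it also accepts strings such as "gored" or "gd", so an explicit alternation like `god|good|greed` may be closer to the stated intent.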
>> I don't believe that gives the desired result. At least not in BibleCS.

> FWIW, we need to upgrade our regexp engine.

True.

> First it is GPL--this is
> the last GPL component in the library. If it were replaced with something
> else, we could license Sword under non-GPL licenses to other entities
> (e.g. Bible societies that don't want to deal with GPL's restrictions) or
> put it out publicly under a license that we write that better meets our
> needs than the GPL.

For Bible Societies, at least, I would have thought that the GPL would be the
perfect licence. This is predicated on the expectation that the Societies'
primary goal is dissemination of the word.

> Second (and probably more immediately important) it
> doesn't handle UTF-8.
> Perl Regexp fixes both of these problems.

There is Rx - http://ftp.gnu.org/pub/gnu/rx/rx-1.5.tar.gz - which fixes the
parenthesis problem - such as it is - but doesn't mention UTF-8. I regard the
GPL as a significant feature, not a problem.

The archive contains the following interesting quote:

begin quoted text

The Regexp Library Cook-off

Rx is, among other things, an implementation of the interface specified by
POSIX for programming with regular expressions. Some other implementations
are GNU regex.c and Henry Spencer's regex library.

If you are maintaining a program or library that includes a regexp matcher,
you might want to consider which regexp implementation is best. Regexp
matchers are very complicated; they are hard to get right, hard to make fast
and efficient, and hard to evaluate. Therefore, choosing the best
implementation for your needs is no easy task; neither is maintaining an
implementation.

To my knowledge, there are no comprehensive, free-software test suites to
help you evaluate regex function implementations. This release of Rx
includes some tests to try to help fix that.
The release includes test programs which you can use to measure some aspects
of the correctness and performance of your favorite POSIX regexp library. If
you use these, please consider adding new tests to the collection and sending
them to the author of Rx.

end quoted text

Henry Spencer's regex, mentioned therein, is at http://arglist.com/regex/ and
includes a list of other libraries and resources.

> If there are other quirks in the GNU Regexp implementation like you
> mention, we can pray that Perl Regexp fixes those also.

Umm... given that they're both Open Source, no matter which library one
chooses, one is able to follow through considerably on one's own prayers.

One positive consideration of the licensing for the Perl regex is that it
doesn't preclude switching to GPL later.

Cheers; Leon