Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-17 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > Can this improvement get merged up into CVS current, or did you already > do that Tom? It's irrelevant to current. regards, tom lane

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-17 Thread Bruce Momjian
Can this improvement get merged up into CVS current, or did you already do that Tom? --- Tatsuo Ishii wrote: > > Nice work, Tatsuo! Wade, can you confirm that this patch solves your > > problem? > > > > Tatsuo, please commi

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-06 Thread Hannu Krosing
Tom Lane wrote on Wed, 05.02.2003 at 08:12: > Hannu Krosing <[EMAIL PROTECTED]> writes: > > Another idea is to make a special regex type and store the regexes > > pre-parsed (i.e. in some fast-load form)? > > Seems unlikely that going out to disk could beat just recompiling the > regexp. We have

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-05 Thread Tatsuo Ishii
> Nice work, Tatsuo! Wade, can you confirm that this patch solves your > problem? > > Tatsuo, please commit into REL7_3 branch only --- I'm nearly ready to do > a wholesale replacement of the regex code in HEAD, so you wouldn't > accomplish much except to create a merge problem for me ... Ok. I h

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-05 Thread wade
Confirmed. Looks like a 100-fold increase. Thanks, guys. EXPLAIN output can be seen here: http://arch.wavefire.com/pgregex.txt -Wade Klaver At 09:59 AM 2/5/03 -0500, Tom Lane wrote: >Tatsuo Ishii <[EMAIL PROTECTED]> writes: >> Ok. The original complaint can be easily solved at least for single >>

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-05 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > Ok. The original complaint can be easily solved at least for single > byte encoding databases. With the small patches (against 7.3.1) > included, I got the following result. Nice work, Tatsuo! Wade, can you confirm that this patch solves your problem? Tatsuo,

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > Another idea is to make a special regex type and store the regexes > pre-parsed (i.e. in some fast-load form)? Seems unlikely that going out to disk could beat just recompiling the regexp. They're not *that* slow to compile ... at least not when we avoid

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Hannu Krosing
Tom Lane wrote on Wed, 05.02.2003 at 01:35: > Neil Conway <[EMAIL PROTECTED]> writes: > > Speaking of which, is there (or should there be) some mechanism for > > increasing the size of the compiled pattern cache? Perhaps a GUC var? > > I thought about that while I was messing with the code, but I

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tatsuo Ishii
Ok. The original complaint can be easily solved at least for single byte encoding databases. With the small patches (against 7.3.1) included, I got the following result. test1: select count(*) from tenk1 where 'quotidian' ~ string4; count --- 0 (1 row) Time: 113.81 ms test2: select count(*)

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes: > Speaking of which, is there (or should there be) some mechanism for > increasing the size of the compiled pattern cache? Perhaps a GUC var? I thought about that while I was messing with the code, but I don't think there's much point in it, unless someone w

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 17:26, Tom Lane wrote: > Proof of concept: > [...] Very cool work, Tom. > In the first case there are only four distinct patterns used, so we're > running with cached precompiled regexes. In the other cases a new regex > compilation must occur at each row. Speaking of whic
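[Editorial sketch] The difference between the two cases quoted above (a handful of distinct patterns versus a new pattern on every row) can be illustrated by counting compilations behind a small pattern cache. This is only a sketch in Python, not the PostgreSQL C code: the function names, the pattern values, and the cache size of 32 are all assumptions made for illustration.

```python
import re
from functools import lru_cache

compile_calls = 0  # how many times a pattern actually had to be compiled


@lru_cache(maxsize=32)  # assumed cache size, chosen only for illustration
def cached_compile(pattern):
    """Compile a pattern, counting only the calls that miss the cache."""
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)


def count_matches(subject, patterns):
    """Match one subject string against the pattern stored in each 'row'."""
    return sum(1 for p in patterns if cached_compile(p).search(subject))


def compilations_for(patterns):
    """Run the scan from a cold cache and report the number of compilations."""
    global compile_calls
    cached_compile.cache_clear()
    compile_calls = 0
    count_matches("quotidian", patterns)
    return compile_calls


# Hypothetical data: 10000 rows with 4 distinct patterns, then 10000 unique ones.
few_distinct = ["AAAA", "HHHH", "OOOO", "VVVV"] * 2500
all_distinct = ["P%05d" % i for i in range(10000)]

print(compilations_for(few_distinct))  # 4
print(compilations_for(all_distinct))  # 10000
```

With few distinct patterns the compile cost is paid once per pattern; with a unique pattern per row it is paid on every row, so the per-compilation cost dominates the query time.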

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > Tom Lane wrote on Tue, 04.02.2003 at 21:18: >> What advantages does it have to make it worth considering? > Should be the same as pcre + support for wide chars. Well, if someone wants to do the legwork to try it, that interface should work just about co

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Proof of concept: PG 7.3 using regression database:

regression=# select count(*) from tenk1 where 'quotidian' ~ string4;
 count
-------
     0
(1 row)

Time: 676.14 ms

regression=# select count(*) from tenk1 where 'quotidian' ~ stringu1;
 count
-------
     0
(1 row)

Time: 3426.96 ms

regression=

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Hannu Krosing
Tom Lane wrote on Tue, 04.02.2003 at 21:18: > Hannu Krosing <[EMAIL PROTECTED]> writes: > > If we are going into code-lifting business, we should also consider > > Python's sre > > What advantages does it have to make it worth considering? Should be the same as pcre + support for wide chars. --

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 13:21, Tom Lane wrote: > After some further research, pcre does seem like an interesting > alternative. Both pcre and Spencer's new code have essentially > Berkeley-style licenses, so there's no problem there. Keep in mind that pcre has an advertising clause in its license (

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > If we are going into code-lifting business, we should also consider > Python's sre What advantages does it have to make it worth considering? regards, tom lane

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Hannu Krosing
On Tue, 2003-02-04 at 18:21, Tom Lane wrote: > 4. pcre looks like it's probably *not* as well suited to a multibyte > environment. In particular, I doubt that its UTF8 compile option was > even turned on for the performance comparison Neil cited --- and the man > page only promises "experimental,

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Sean Chittenden
> > It would be a delight to be able to use more advanced (IMHO) Perl- > > compatible regexes in PostgreSQL. > > After some further research, pcre does seem like an interesting > alternative. Both pcre and Spencer's new code have essentially > Berkeley-style licenses, so there's no problem there.

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Jon Jensen <[EMAIL PROTECTED]> writes: > It would be a delight to be able to use more advanced (IMHO) Perl- > compatible regexes in PostgreSQL. After some further research, pcre does seem like an interesting alternative. Both pcre and Spencer's new code have essentially Berkeley-style licenses, s

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Jon Jensen
On Tue, 4 Feb 2003, Neil Conway wrote: > Spencer's implementation is outperformed by some other RE engines, > notably PCRE (www.pcre.org). But switching to another engine might > impose backward-compatibility problems, in terms of the details of the > RE syntax. It would be a delight to be able t

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes: > Sounds like we had about the same idea at about the same time -- I > emailed Henry Spencer inquiring about the new RE engine last night. I just did that this morning ;-) ... but more as politeness than anything else. AFAICT from searching the net, packagi

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Hannu Krosing
On Tue, 2003-02-04 at 17:15, Neil Conway wrote: > On Tue, 2003-02-04 at 11:59, Tom Lane wrote: > > I'm about to go off and look at whether we can absorb the Tcl regex > > package, which is Spencer's new baby. That will not be a solution for > > 7.3.anything, but it could be an answer for 7.4. > >

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Hannu Krosing
On Tue, 2003-02-04 at 16:59, Tom Lane wrote: > Neil Conway <[EMAIL PROTECTED]> writes: > > Given that this problem isn't a regression, I don't think we need to > > delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential, > > IMHO). > > No, I've had to abandon my original thought tha

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 11:59, Tom Lane wrote: > I'm about to go off and look at whether we can absorb the Tcl regex > package, which is Spencer's new baby. That will not be a solution for > 7.3.anything, but it could be an answer for 7.4. Sounds like we had about the same idea at about the same ti

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes: > Given that this problem isn't a regression, I don't think we need to > delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential, > IMHO). No, I've had to abandon my original thought that it was a localized bug, so it's not going to be fixed i

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 11:24, wade wrote: > I redid my trials with the same data set on 7.2.3 --with-multibyte and I > get the same brutal performance hit, so it is definitely a > multibyte-specific problem. Given that this problem isn't a regression, I don't think we need to delay 7.3.2 to fix i

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread Tom Lane
wade <[EMAIL PROTECTED]> writes: > I redid my trials with the same data set on 7.2.3 --with-multibyte and I > get the same brutal performance hit, so it is definitely a > multibyte-specific problem. > > There are only about 1000 words that appear more than once (2 or 3 times) > in 27k rows. Righ

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-04 Thread wade
OK, I redid my trials with the same data set on 7.2.3 --with-multibyte and I get the same brutal performance hit, so it is definitely a multibyte-specific problem. WRT the distribution of the data in the table, I used the following: All g-words in /usr/share/dict with different processes attac

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread Tom Lane
Next question: may I guess that you weren't using MULTIBYTE in 7.2? After still more digging, I'm coming round to the opinion that the problem is that MULTIBYTE is forced on in 7.3, and this imposes a factor-of-256 overhead in a bunch of the operations in regcomp.c. In particular, compiling a case

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread Tom Lane
Wade, how many distinct patterns do you have in that table? What's the population distribution (in particular, do the top 32 patterns account for most of the table)? It's looking like the issue is not so much that the 7.3 code is completely broken, as that its LRU replacement policy for precompil
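[Editorial sketch] The question about population distribution matters because a fixed-size LRU cache behaves very differently depending on how many distinct keys are in rotation. The simulation below is a hedged illustration in Python, not the 7.3 code: the capacity of 32 is an assumption suggested by the "top 32 patterns" wording, and the access sequences are invented.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity=32):
    """Replay a sequence of pattern ids through an LRU cache; return the hit fraction."""
    cache = OrderedDict()  # ordered oldest to newest by last use
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[key] = True
    return hits / len(accesses)


# Hypothetical workloads over 10000 rows:
fits = [i % 32 for i in range(10000)]      # 32 patterns cover the whole table
overflow = [i % 33 for i in range(10000)]  # one pattern more than the cache holds

print(lru_hit_rate(fits))      # 0.9968
print(lru_hit_rate(overflow))  # 0.0
```

When the working set fits, only the first use of each pattern misses. With a single extra pattern visited in rotation, LRU always evicts the entry that is needed next, so every lookup turns into a recompilation. Real data is rarely this regular, but it shows why the answer to the distribution question decides whether the cache helps at all.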

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread wade
Well, IMHO I would rather see a delay of the roll-out by a day or two than see a release with such a serious performance glitch. Especially since I personally have been shooting my big mouth off to all my geek friends on the leaps and bounds PG has made in the last few releases. With my luck on

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread Tom Lane
Sigh. It seems that somebody broke caching of compiled regexes, so that your regex is recompiled each time it's used. I haven't dug into the logic yet, but I think it must have been a mistake in Thomas' change to make the regex cache be searched circularly: 2002-06-14 22:49 thomas * sr

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread wade
At 05:51 PM 2/3/03 -0500, Tom Lane wrote: >wade <[EMAIL PROTECTED]> writes: >> Here is the profile information. I included a log of the session that >> generated it at the top of the gprof output. If there is any other info I >> can help you with, please let me know. > >A four-second test isn't

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread Tom Lane
wade <[EMAIL PROTECTED]> writes: > Here is the profile information. I included a log of the session that > generated it at the top of the gprof output. If there is any other info I > can help you with, please let me know. A four-second test isn't long enough to gather any statistically meaningf

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread wade
At 10:52 PM 1/31/03 -0500, Tom Lane wrote: >wade <[EMAIL PROTECTED]> writes: >> We recently upgraded a project from 7.2 to 7.3.1 to make use of some of >> the cool new features in 7.3. The installed version is CVS stable from >> yesterday. However, we noticed a major performance hit in POSIX re

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-03 Thread wade
At 08:31 PM 2/1/03 +0800, Christopher Kings-Lynne wrote: >Why on earth are you using a CVS version!?!?!?! > >Chris > This problem manifests itself under 7.3.1 release as well. CVS is used so we can access patches to the SRF stuff implemented after 7.3.1 was released. Tom... any links that documen

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-01 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes: > Why on earth are you using a CVS version!?!?!?! I assume he meant tip of REL7_3 branch --- which is a perfectly reasonable thing to install, even if there are still a few fixes to go before we call it 7.3.2. regards, to

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-01 Thread Christopher Kings-Lynne
Why on earth are you using a CVS version!?!?!?! Chris On Fri, 31 Jan 2003, wade wrote: > Hello, > We recently upgraded a project from 7.2 to 7.3.1 to make use of some of > the cool new features in 7.3. The installed version is CVS stable from > yesterday. However, we noticed a major performa

Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-01-31 Thread Tom Lane
wade <[EMAIL PROTECTED]> writes: > We recently upgraded a project from 7.2 to 7.3.1 to make use of some of > the cool new features in 7.3. The installed version is CVS stable from > yesterday. However, we noticed a major performance hit in POSIX regular > expression matches against columns usin

[HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-01-31 Thread wade
Hello, We recently upgraded a project from 7.2 to 7.3.1 to make use of some of the cool new features in 7.3. The installed version is CVS stable from yesterday. However, we noticed a major performance hit in POSIX regular expression matches against columns using the ~* operator. http://arch.w