Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-15 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Andres Freund and...@anarazel.de wrote: [concerns addressed in new patch version] Looks good to me. I'm marking this Ready for Committer. Applied with minor editorialization --- improving the comments mostly.

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-10 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: I think you see no real benefit, because your strings are rather short - the documents I scanned when noticing the issue where rather long. The document I used in the test which showed the regression was 672,585 characters, containing 10,000 URLs.

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-10 Thread Kevin Grittner
I wrote: Thanks for the sample which shows the difference. Ah, once I got on the right track, there is no problem seeing dramatic improvements with the patch. It changes some nasty O(N^2) cases to O(N). In particular, the fixes affect parsing of large strings encoded with multi-byte

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-10 Thread Kevin Grittner
I wrote: I did find one significant (although easily solved) problem. In the patch, the recursive setup of usewide, pgwstr, and wstr are not conditioned by #ifdef USE_WIDE_UPPER_LOWER in the non-patched version. Unless there's a good reason for that, the #ifdef should be added. That

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-10 Thread Andres Freund
On Thursday 10 December 2009 19:10:24 Kevin Grittner wrote: I wrote: Thanks for the sample which shows the difference. Ah, once I got on the right track, there is no problem seeing dramatic improvements with the patch. It changes some nasty O(N^2) cases to O(N). In particular, the fixes

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-10 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: [concerns addressed in new patch version] Looks good to me. I'm marking this Ready for Committer. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-09 Thread Andres Freund
On Tuesday 08 December 2009 17:15:36 Kevin Grittner wrote: Andres Freund and...@anarazel.de wrote: Could you show your testcase? OK. I was going to try to check other platforms first, and package up the information better, but here goes. I created 1 lines with random IP-based URLs

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
I wrote: Frankly, I'd be amazed if there was a performance regression, OK, I'm amazed. While it apparently helps some cases dramatically (Andres had a case where run time was reduced by 93.2%), I found a pretty routine case where run time was increased by 3.1%. I tweaked the code and got

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Andres Freund
On Tuesday 08 December 2009 16:23:11 Kevin Grittner wrote: I wrote: Frankly, I'd be amazed if there was a performance regression, OK, I'm amazed. While it apparently helps some cases dramatically (Andres had a case where run time was reduced by 93.2%), I found a pretty routine case where

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: Could you show your testcase? OK. I was going to try to check other platforms first, and package up the information better, but here goes. I created 1 lines with random IP-based URLs for a test. The first few lines are: create table t1 (c1

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Andres Freund
On Tuesday 08 December 2009 17:15:36 Kevin Grittner wrote: Andres Freund and...@anarazel.de wrote: Could you show your testcase? Will hopefully look into this later. I dont see why it could get slower? I don't either. The best I can tell, following the pointer from orig to any of its

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: Did you test that with a optimized build? After running a series of tests with --enable-cassert to confirm that nothing squawked, I disabled that before doing any performance testing. Going from memory, I used --enable-debug --with-libxml and --prefix.

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Greg Smith
Kevin Grittner wrote: After running a series of tests with --enable-cassert to confirm that nothing squawked, I disabled that before doing any performance testing. Going from memory, I used --enable-debug --with-libxml and --prefix. I didn't explicitly use or disable any compiler

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote: Now that you ask, of course I just spotted a bug in there such that the documented behavior for the PGDEBUG feature doesn't actually work. If you were using that to turn off asserts, that may not have worked as expected. Don't know what you did there

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Andres Freund
Hi, On Tuesday 08 December 2009 20:09:22 Greg Smith wrote: Andres, are using any optimization flags when you're testing? I was testing with and without debug/cassert - and did not get adverse results in both... Unfortunately it looks like I wont get to test today, but only tomorrow morning

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Greg Smith
Kevin Grittner wrote: Anyway, I'm not sure whether your reply directly answers the point I was raising -- peg doesn't do anything with the compiler optimization flags under the covers, does it? Not really. It does this: PGDEBUG=--enable-cassert --enable-debug ./configure

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: Kevin Grittner wrote: Anyway, I'm not sure whether your reply directly answers the point I was raising -- peg doesn't do anything with the compiler optimization flags under the covers, does it? Not really. It does this: PGDEBUG=--enable-cassert

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Greg Smith
Tom Lane wrote: --enable-cassert might have a noticeable performance impact. We usually try to not have that on when doing performance testing. All covered in the tool's documentation, and it looks like Kevin did the right thing during his tests by checking the pg_config output to confirm

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: --enable-cassert might have a noticeable performance impact. We usually try to not have that on when doing performance testing. Right. I turned it on for some initial tests to confirm that we had no assertion failures; then turned it off for the

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-08 Thread Kevin Grittner
I wrote: Perhaps it is some quirk of using 32 bit pointers on the 64 bit AMD CPU? (I'm looking forward to testing this today on a 64 bit build on an Intel CPU.) The exact same test on 64 bit OS (SuSE Enterprise Server) on Intel gave very different results. With 10 runs each of 200

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-07 Thread Andres Freund
On Monday 07 December 2009 02:12:43 Greg Smith wrote: After getting off to a good start, it looks like this patch is now stuck waiting for a second review pass from Kevin right now, with no open items for Andres to correct. Since the only issues on the table seem to be that of code aesthetics

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-07 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote: After getting off to a good start, it looks like this patch is now stuck waiting for a second review pass from Kevin right now, with no open items for Andres to correct. Since the only issues on the table seem to be that of code aesthetics and

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-12-06 Thread Greg Smith
After getting off to a good start, it looks like this patch is now stuck waiting for a second review pass from Kevin right now, with no open items for Andres to correct. Since the only issues on the table seem to be that of code aesthetics and long-term planning for this style of

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-25 Thread Andres Freund
On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote: Andres Freund wrote: On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote: It is in context format, applies cleanly, and passes make check. Unfortunately the latter is not saying much - I had a bug there and it was not

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-25 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote: Andres Freund wrote: I had a bug there and it was not found by the regression tests. Perhaps I should take a stab and add at least some more... Sounds like a good idea. Hm. There

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-14 Thread Kevin Grittner
Andres Freund wrote: On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote: It is in context format, applies cleanly, and passes make check. Unfortunately the latter is not saying much - I had a bug there and it was not found by the regression tests. Perhaps I should take a stab and add

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-13 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote: On Sunday 01 November 2009 16:19:43 Andres Freund wrote: While playing around/evaluating tsearch I notices that to_tsvector is obscenely slow for some files. After some profiling I found that this is due using a seperate TSParser in p_ishost/p_isURLPath

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-13 Thread Alvaro Herrera
Kevin Grittner wrote: (Note: I personally would much rather see the performance penalty addressed this way, and a TODO added for the more invasive work, than to leave this alone for the next release if there's nobody willing to tackle the problem at a more fundamental level.) +1 -- Alvaro

Re: [HACKERS] tsearch parser inefficiency if text includes urls or emails - new version

2009-11-13 Thread Andres Freund
On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote: It is in context format, applies cleanly, and passes make check. Unfortunately the latter is not saying much - I had a bug there and it was not found by the regression tests. Perhaps I should take a stab and add at least some more...