Re: [HACKERS] SP-GiST micro-optimizations

2012-08-29 Thread Heikki Linnakangas
On 28.08.2012 22:50, Ants Aasma wrote: On Tue, Aug 28, 2012 at 9:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Seems like that's down to the CPU not doing rep stosq particularly quickly, which might well be chip-specific. AMD optimization manual[1] states the following: For repeat counts of

[HACKERS] SP-GiST micro-optimizations

2012-08-28 Thread Heikki Linnakangas
I did some performance testing of building an SP-GiST index, with the new range type SP-GiST opclass. There's some low-hanging fruit there; I was able to reduce the index build time on a simple test case by about 20% with a few small changes. I created a test table with: create table

Re: [HACKERS] SP-GiST micro-optimizations

2012-08-28 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Drilling into the profile, I came up with three little optimizations: 1. Within spgdoinsert, a significant portion of the CPU time is spent on line 2033 in spgdoinsert.c: memset(out, 0, sizeof(out)); That zeroes out a small

Re: [HACKERS] SP-GiST micro-optimizations

2012-08-28 Thread Heikki Linnakangas
On 28.08.2012 20:30, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Drilling into the profile, I came up with three little optimizations: 1. Within spgdoinsert, a significant portion of the CPU time is spent on line 2033 in spgdoinsert.c: memset(out, 0,

Re: [HACKERS] SP-GiST micro-optimizations

2012-08-28 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 28.08.2012 20:30, Tom Lane wrote: Fascinating. I'd been of the opinion that modern compilers would inline memset() for themselves and MemSet was probably not better than what the compiler could do these days. What platform are
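The question in this exchange is whether a plain memset() call or PostgreSQL's MemSet macro is faster for zeroing a small struct like the one cleared in spgdoinsert. The sketch below illustrates only the general idea of an inline loop of word-sized stores for small, aligned lengths, with a fallback to memset() otherwise. It is not PostgreSQL's actual MemSet code; DemoOut, demo_zero and DEMO_LOOP_LIMIT are hypothetical names, and the struct layout and size cutoff are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the small "out" struct zeroed in spgdoinsert.
 * The real struct's fields differ; all-long fields are assumed here so the
 * word-store path below only writes objects of type long. */
typedef struct DemoOut
{
    long        resultType;
    long        nodeN;
    long        levelAdd;
    long        restLen;
} DemoOut;

/* Assumed size cutoff for the inline loop (illustrative value only). */
#define DEMO_LOOP_LIMIT 1024

/*
 * Zero len bytes at start.  When the pointer and the length are both
 * multiples of sizeof(long) and len is small, use an inline loop of
 * long-sized stores instead of calling memset(); otherwise fall back to
 * memset().  A sketch of the idea behind MemSet, not its definition.
 */
static void
demo_zero(void *start, size_t len)
{
    if ((uintptr_t) start % sizeof(long) == 0 &&
        len % sizeof(long) == 0 &&
        len <= DEMO_LOOP_LIMIT)
    {
        long       *p = (long *) start;
        long       *stop = p + len / sizeof(long);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}
```

A call such as `demo_zero(&out, sizeof(out))` takes the inline path for DemoOut, while an odd-sized byte buffer falls back to memset(). Which variant wins depends on how the compiler expands memset() and how quickly the CPU executes the resulting instructions; as the thread notes, that may well be chip-specific, so any choice should be measured on the target platform.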

Re: [HACKERS] SP-GiST micro-optimizations

2012-08-28 Thread Ants Aasma
On Tue, Aug 28, 2012 at 9:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Seems like that's down to the CPU not doing rep stosq particularly quickly, which might well be chip-specific. AMD optimization manual[1] states the following: For repeat counts of less than 4k, expand REP string