(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
v4: roll into one patch, use macros
---
src/interfaces
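The fallback strategy described above can be sketched roughly as follows. This is an illustration only, not libpq's actual code; the function name `send_no_sigpipe` is invented here, and real code must also discard any SIGPIPE left pending before restoring the mask:

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Illustrative sketch (not libpq's actual code): prefer the
 * kernel-level SIGPIPE suppression mechanisms where available, and
 * fall back to masking the signal around send() as before.
 */
static ssize_t
send_no_sigpipe(int fd, const void *buf, size_t len)
{
#ifdef MSG_NOSIGNAL
	/* Linux: suppress SIGPIPE for this send() only */
	return send(fd, buf, len, MSG_NOSIGNAL);
#elif defined(SO_NOSIGPIPE)
	/* BSD/macOS: suppress SIGPIPE on the whole socket */
	int			on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)) == 0)
		return send(fd, buf, len, 0);
	return -1;
#else
	/* fallback: block SIGPIPE around the send() */
	sigset_t	block,
				osigset;
	ssize_t		n;

	sigemptyset(&block);
	sigaddset(&block, SIGPIPE);
	sigprocmask(SIG_BLOCK, &block, &osigset);
	n = send(fd, buf, len, 0);
	/* real code must also clear any pending SIGPIPE before restoring */
	sigprocmask(SIG_SETMASK, &osigset, NULL);
	return n;
#endif
}
```

Since setsockopt() failure can be detected at run time, the SO_NOSIGPIPE path degrades safely to the mask-based fallback on kernels that don't support it.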
Hi Tom,
That code is not broken as it stands, and doesn't appear to really
gain anything from the proposed change. Why should we risk any
portability questions when the code isn't going to get either simpler
or shorter?
OK, after attempting a macro version of this, we end up with:
#define
Hi Greg,
Thanks for the benchmark app, thought I'd pitch in with some ppc
results:
clz 1.530s
lookup table 1.720s
float hack 4.424s
unrolled 5.280s
normal 5.369s
POWER5+:
clz 2.046s
lookup table 2.188s
Hi Robert,
Perhaps we should use macros.
I was trying to avoid macros, as this means we lose type- and syntax-
checking at the call-site, and end up with slightly messier code.
However, I understand that this is probably personal preference for me
:)
How about just 'static' functions? (ie,
Hi Robert,
That having been said, Jeremy, you probably want to take a look at
those comments and I have a few responses to them as well.
OK, thanks for the heads-up.
following comment:
Applied and built cleanly. Regress passes. Trying to hunt down ppc
box to see if performance
Tom,
However, I think the whole patch is pretty useless. That code is not
broken as it stands, and doesn't appear to really gain anything from
the proposed change. Why should we risk any portability questions
when the code isn't going to get either simpler or shorter?
This patch clears the
Rather than testing single bits in a loop, change AllocSetFreeIndex to
use the __builtin_clz() function to calculate the chunk index.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on sysbench on PowerPC.
Signed-off-by: Jeremy Kerr j
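The mapping can be sketched as follows; a minimal illustration assuming GCC (`chunk_index` is an invented name, not the patch's final form). Note that `__builtin_clz(0)` is undefined, so the small-size case must be handled before the builtin is reached:

```c
#define ALLOC_MINBITS 3			/* smallest chunk size is 8 bytes */

/* find-last-set via the GCC builtin; caller guarantees x != 0 */
static inline int
fls(unsigned int x)
{
	return 32 - __builtin_clz(x);
}

/* replaces the shift-and-test loop in AllocSetFreeIndex */
static int
chunk_index(unsigned int size)
{
	if (size <= (1 << ALLOC_MINBITS))
		return 0;				/* smallest free list; also avoids fls(0) */
	return fls((size - 1) >> ALLOC_MINBITS);
}
```

On PowerPC, __builtin_clz compiles to a single `cntlzw` instruction, which is where the speedup comes from.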
Hi, Alvaro,
Does this work in compilers other than GCC? I think we use some
kludges to protect against them ... see pg_list.h for the canonical
example.
As I understand it, we're not using static inlines in pg_list.h to
prevent multiple objects from exporting the same symbols if the
testing though, is there a machine that allows both
SO_NOSIGPIPE and MSG_NOSIGNAL?
v3: respin as context diffs
Again, comments most welcome,
Jeremy
---
Jeremy Kerr (2):
[libpq] rework sigpipe-handling macros
[libpq] Try to avoid manually masking SIGPIPEs on every send()
--
Sent
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/interfaces/libpq/fe-connect.c | 40
change the code to be implemented as static inlines.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/interfaces/libpq/fe-secure.c | 88 ---
1 file changed, 55 insertions(+), 33 deletions(-)
*** a/src/interfaces/libpq/fe-secure.c
--- b/src/interfaces/libpq/fe
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
v3: respin
Stephen,
Is it just the size that matters, or is it when there are few spaces
at the end?
It's the number of spaces at the end. If we knew this number, then we
wouldn't have to do any comparisons at all :)
Cheers,
Jeremy
Stephen,
If the updated function is always faster when the overall string is at
least, say, 16 characters long,
But that's not the case - the cost of the function (and the speedup from
the previous version) depends on the number of spaces that there are at
the end.
For the new function to
Hi Stephen,
What would be really useful would be best case and worst case
scenarios.
I've put together some data from a microbenchmark of the bcTruelen
function, patched and unpatched.
As for best-case, if you have a long string of trailing spaces, we can
go through them at theoretically one
Tom,
I've put together some data from a microbenchmark of the bcTruelen
function, patched and unpatched.
Uh, where's the data?
If you're after the raw data for a run, I've put it up:
http://ozlabs.org/~jk/projects/db/data/bctruelen.csv
I've also packaged up the quick-and-dirty benchmark
the number of compares where possible. Word-size compares
are performed on aligned sections of the string.
Results in a small performance increase; around 1-2% on my POWER6 test
box.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
Resend: context diff this time
---
src/backend/utils/adt/varchar.c | 24
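The technique can be sketched like this (an illustration only, not the actual patch; the bpchar-specific details are omitted and `true_len` is an invented name):

```c
#include <stdint.h>
#include <string.h>

/*
 * Count the true length of a blank-padded string, comparing
 * word-sized aligned sections against a word filled with ' ' bytes
 * where possible, and falling back to bytewise scanning at the edges.
 */
static int
true_len(const char *s, int len)
{
	const int	ws = (int) sizeof(uintptr_t);
	uintptr_t	spaceword;
	int			i = len;

	/* build a word whose every byte is ' ' */
	memset(&spaceword, ' ', sizeof(spaceword));

	/* bytewise until the tail boundary s+i is word-aligned */
	while (i > 0 && (i < ws || (uintptr_t) (s + i) % ws != 0))
	{
		if (s[i - 1] != ' ')
			return i;
		i--;
	}

	/* word-at-a-time over the aligned middle */
	while (i >= ws)
	{
		uintptr_t	w;

		memcpy(&w, s + i - ws, sizeof(w));
		if (w != spaceword)
			break;
		i -= ws;
	}

	/* finish the word containing the last non-space bytewise */
	while (i > 0 && s[i - 1] == ' ')
		i--;
	return i;
}
```

This shows why the speedup depends on the number of trailing spaces: only the aligned all-space middle is consumed a word at a time, so short padding runs see little benefit.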
Robert,
I'd still like to know the workload and exact numbers.
From up-thread:
http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
- or were there other details you were looking for?
Cheers,
Jeremy
Hi all,
Speaking of which, what about some performance numbers?
OK, benchmarks done:
http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
Summary: small increase in performance (~1-2% on my machine), at about
1.5 standard deviations from the mean. Profiles show a decent drop in
hits
Hi all,
That's correct. To check the alignment you would have to look at the
actual pointer. I would suggest using the existing macros to handle
alignment. Hm, though the only one I see offhand which is relevant is
the moderately silly PointerIsAligned(). Still it would make the code
clearer
Hi Tom,
Speaking of which, what about some performance numbers? Like Heikki,
I'm quite suspicious of whether there is any real-world gain to be
had from this approach.
Will send numbers tomorrow, with the reworked patch.
Cheers,
Jeremy
Alvaro,
Maybe bcTruelen could be optimized to step on one word at a time
(perhaps by using XOR against a precomputed word filled with ' '),
instead of one byte at a time ...
I have a patch for this, will send soon.
Regards,
Jeremy
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/backend/utils/adt/varchar.c | 24 +---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 5f3c658..6889dff 100644
--- a/src/backend/utils
Robert,
This looks very non-portable to me.
Unsurprisingly, I'm new to postgres hacking and the large number of
supported platforms :)
I was considering something like:
unsigned int spaces;
const unsigned int wordsize = sizeof(unsigned int);
memset(&spaces, ' ',
testing though, is there a machine that allows both
SO_NOSIGPIPE and MSG_NOSIGNAL?
Again, comments most welcome,
Jeremy
---
Jeremy Kerr (2):
[libpq] rework sigpipe-handling macros
[libpq] Try to avoid manually masking SIGPIPEs on every send()
change the code to be implemented as static inlines.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/interfaces/libpq/fe-secure.c | 88 ---
1 file changed, 55 insertions(+), 33 deletions(-)
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/interfaces/libpq/fe-connect.c | 39
Marko,
optval seems used without initialization.
Dang, I was checking for the SO_NOSIGPIPE flag, and didn't check for
optval. New patch coming...
Cheers,
Jeremy
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
v2: initialise optval
---
src/interfaces/libpq/fe
Hi Atsushi,
If size = 8, fls((size - 1) >> ALLOC_MINBITS) is fls(0).
The result of fls(0) is undefined.
Yep, got caught out by this because my previous fls() supported zero.
I think we have to never call fls(0) from AllocSetFreeIndex().
My proposal code:
if (size <= (1 << ALLOC_MINBITS))
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
v2: prevent fls(0
Hi Atsushi,
In my benchmark program, the fls() implementation and the inline asm
implementation perform slightly differently.
However, pgbench shows almost the same improvement for both.
Here is the result of my benchmark.
Excellent, thank you for getting this extra set of numbers.
Florian,
+#if defined(__GNUC__) && \
+ (defined(__ppc__) || defined(__powerpc__) || \
+defined(__ppc64__) || defined (__powerpc64__))
If you require GCC anyway, you can use __builtin_clz instead.
(It's been available since GCC 4.1 at least.)
Because now we have to test the compiler
Hi Tom,
The other thing I didn't like about the patch was the assumption that
it's okay to have a static inline function in a header. You can
get away with that in gcc but *not* in other compilers.
Gee, you user-space guys have it tough! :D
Point taken, will rework.
Look at the existing
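The portability pattern under discussion can be sketched as follows. `USE_INLINE` is a hypothetical configure-provided symbol and `fls_sketch` an invented name; this is not necessarily the convention the tree actually uses:

```c
/*
 * Hypothetical sketch: only emit a static inline definition in the
 * header when the compiler is known to support it; otherwise declare
 * an out-of-line function that is defined once in a .c file.
 */
#define USE_INLINE 1			/* pretend configure detected support */

#ifdef USE_INLINE
static inline int
fls_sketch(unsigned int x)
{
	int			i = 0;

	/* portable shift-and-test fallback for find-last-set */
	while (x != 0)
	{
		i++;
		x >>= 1;
	}
	return i;
}
#else
extern int	fls_sketch(unsigned int x); /* defined once in a .c file */
#endif
```

The out-of-line branch costs a function call on non-GCC compilers, but keeps the header legal for compilers that reject inline definitions with external-linkage semantics.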
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
configure.in
Tom,
The consideration is that the application fails completely on server
disconnect (because it gets SIGPIPE'd). This was long ago deemed
unacceptable, and we aren't likely to change our opinion on that.
OK, understood. I'm guessing MSG_NOSIGNAL on the send() isn't portable
enough here?
Tom,
A feature that is exercised via setsockopt is probably fairly safe,
since you can check for failure of the setsockopt call and then do
it the old way. MSG_NOSIGNAL is a recv() flag, no?
It's a flag to send().
The question is whether you could expect that the recv() would fail if
it
Hi,
I made a faster version of AllocSetFreeIndex for x86 architecture.
Neat, I have a version for PowerPC too.
In order to prevent writing multiple copies of AllocSetFreeIndex, I
propose that we add a fls() function (find last set); this can be
defined in an architecture-independent manner
Add a utility header for simple bit operations - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/include/utils/bitops.h | 52 +
1 file changed, 52 insertions(+)
diff
Results in a ~2% performance increase by using the powerpc fls()
implementation.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/backend/utils/mmgr/aset.c |8 ++--
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr
Add a utility header for simple bit operations - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
v2: only use inline asm with gcc
---
src/include/utils/bitops.h | 53 +
1
If the connection isn't over SSL, there's no need to do the disable.
This effectively halves the number of syscalls performed by libpq when
SSL is not in use.
Signed-off-by: Jeremy Kerr j...@ozlabs.org
---
src/interfaces/libpq/fe-secure.c |7 +++
1 file changed, 3 insertions(+), 4
the following performance improvement
during a sysbench OLTP run:
http://ozlabs.org/~jk/projects/db/data/sigpipe-perf.png
load: sysbench --test=oltp --oltp-read-only=on, connecting locally,
machine: POWER6, 64-way, 4.2GHz
Comments most welcome,
Jeremy
---
Jeremy Kerr (1):
Only disable sigpipe