Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote: > Ah. I was not thinking of touching pg_atomic_read_u32/u64_impl, > although now that you mention it, it's not clear to me why we > couldn't simplify > - return *(>value); > + return ptr->value; Just to check, I applied that change to pg_atomic_read_u32_impl and

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund writes: > On 2017-09-06 15:25:20 -0400, Tom Lane wrote: >> I think we can just use "old = ptr->value" to set up for the cmpxchg >> loop in every generic.h function that uses such a loop. > I think we might have been talking past each other - I thought you were >

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:25:20 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-09-06 15:12:13 -0400, Tom Lane wrote: > >> It looks to me like two of the three implementations promise no such > >> thing. > > > They're volatile vars, so why not? > > Yeah, but so are the

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund writes: > On 2017-09-06 15:12:13 -0400, Tom Lane wrote: >> It looks to me like two of the three implementations promise no such >> thing. > They're volatile vars, so why not? Yeah, but so are the caller's variables. That is, in

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:12:13 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-09-06 14:31:26 -0400, Tom Lane wrote: > >> However, if that's the reasoning, why don't we make all of these > >> use simple reads? It seems unlikely that a locked read is free. > > > We don't

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund writes: > On 2017-09-06 14:31:26 -0400, Tom Lane wrote: >> However, if that's the reasoning, why don't we make all of these >> use simple reads? It seems unlikely that a locked read is free. > We don't really use locked reads? All the _atomic_ wrapper forces is

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
Hi, On 2017-09-06 14:31:26 -0400, Tom Lane wrote: > I wrote: > > Anyway, I don't have a big objection to applying this. My concern > > is more that we need to be taking a harder look at other parts of > > the atomics infrastructure, because tweaks there are likely to buy > > much more. > > I

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote: > Anyway, I don't have a big objection to applying this. My concern > is more that we need to be taking a harder look at other parts of > the atomics infrastructure, because tweaks there are likely to buy > much more. I went ahead and pushed these patches, adding __sync_fetch_and_sub

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Jesper Pedersen
Hi Jeff, On 09/05/2017 03:47 PM, Jeff Janes wrote: I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using both logged and unlogged tables. Also ran an internal benchmark which didn't show anything either. What scale factor and client count? How many cores per socket? It

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Robert Haas writes: > It's not a question of whether the return value is used, but of > whether the updated value of *old is used. Right, but if we re-read "old" in the loop, and if the primitive doesn't return "old" (or does, but call site ignores it) then in principle

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 12:01 PM, Tom Lane wrote: >> This seems like a pretty sound argument to me. I think Tom's probably >> right that the changes in generic-gcc.h are the important ones, but >> I'm not sure that's an argument against patching generics.h. Given >> that

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Robert Haas writes: > On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura > wrote: >> But I think, generic version still should be "fixed". >> If generic version is not reached on any platform, then why it is kept? >> If it is reached somewhere, then

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura wrote: > Yes, you're right. > > But I think, generic version still should be "fixed". > If generic version is not reached on any platform, then why it is kept? > If it is reached somewhere, then it should be improved. This

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura
On 2017-09-06 16:36, Tom Lane wrote: Sokolov Yura writes: On 2017-09-06 15:56, Tom Lane wrote: The point I'm trying to make is that if tweaking generic.h improves performance then it's an indicator of missed cases in the less-generic atomics code, and the latter

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Sokolov Yura writes: > On 2017-09-06 15:56, Tom Lane wrote: >> The point I'm trying to make is that if tweaking generic.h improves >> performance then it's an indicator of missed cases in the less-generic >> atomics code, and the latter is where our attention should

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura
On 2017-09-06 15:56, Tom Lane wrote: Simon Riggs writes: On 5 September 2017 at 21:23, Tom Lane wrote: Moreover, it matters which primitive you're testing, on which platform, with which compiler, because we have a couple of layers of atomic ops

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Simon Riggs writes: > On 5 September 2017 at 21:23, Tom Lane wrote: >> Moreover, it matters which primitive you're testing, on which platform, >> with which compiler, because we have a couple of layers of atomic ops >> implementations. > If there is no

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Simon Riggs
On 5 September 2017 at 21:23, Tom Lane wrote: > Jeff Janes writes: >> What scale factor and client count? How many cores per socket? It looks >> like Sokolov was just starting to see gains at 200 clients on 72 cores, >> using -N transaction. ... >

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura
On 2017-09-06 14:54, Sokolov Yura wrote: On 2017-09-06 07:23, Tom Lane wrote: Jeff Janes writes: What scale factor and client count? How many cores per socket? It looks like Sokolov was just starting to see gains at 200 clients on 72 cores, using -N transaction. This

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura
On 2017-09-06 07:23, Tom Lane wrote: Jeff Janes writes: What scale factor and client count? How many cores per socket? It looks like Sokolov was just starting to see gains at 200 clients on 72 cores, using -N transaction. This means that Sokolov's proposed changes in

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Tom Lane
Jeff Janes writes: > What scale factor and client count? How many cores per socket? It looks > like Sokolov was just starting to see gains at 200 clients on 72 cores, > using -N transaction. I spent some time poking at this with the test scaffolding I showed at

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Jesper Pedersen
Hi, On 05/25/2017 11:12 AM, Sokolov Yura wrote: I agree that lonely semicolon looks bad. Applied your suggestion for empty loop body (/* skip */). Patch in first letter had while(true), but I removed it cause I think it is uglier: - `while(true)` was necessary for grouping read with `if`, -

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Jeff Janes
On Tue, Sep 5, 2017 at 11:39 AM, Jesper Pedersen wrote: > On 09/05/2017 02:24 PM, Tom Lane wrote: > >> Jesper Pedersen writes: >> >>> I have tested this patch on a 2-socket machine, but don't see any >>> performance change in the various

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Tom Lane
Jesper Pedersen writes: > On 09/05/2017 02:24 PM, Tom Lane wrote: >> Hm, so if we can't demonstrate a performance win, it's hard to justify >> risking touching this code. What test case(s) did you use? > I ran pgbench (-M prepared) with synchronous_commit 'on' and

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Jesper Pedersen
On 09/05/2017 02:24 PM, Tom Lane wrote: Jesper Pedersen writes: I have tested this patch on a 2-socket machine, but don't see any performance change in the various runs. However, there is no regression either in all cases. Hm, so if we can't demonstrate a

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Tom Lane
Jesper Pedersen writes: > I have tested this patch on a 2-socket machine, but don't see any > performance change in the various runs. However, there is no regression > either in all cases. Hm, so if we can't demonstrate a performance win, it's hard to justify

Re: [HACKERS] Fix performance of generic atomics

2017-09-05 Thread Jesper Pedersen
Hi, On 05/25/2017 11:12 AM, Sokolov Yura wrote: I agree that lonely semicolon looks bad. Applied your suggestion for empty loop body (/* skip */). Patch in first letter had while(true), but I removed it cause I think it is uglier: - `while(true)` was necessary for grouping read with `if`, -

Re: [HACKERS] Fix performance of generic atomics

2017-06-04 Thread Sokolov Yura
Good day, every one. I'm just posting benchmark numbers for atomics patch. Hardware: 4 socket 72 core (144HT) x86_64 Centos 7.1 postgresql.conf tuning: shared_buffers = 32GB fsync = on synchronous_commit = on full_page_writes = off wal_buffers = 16MB wal_writer_flush_after = 16MB commit_delay =

Re: [HACKERS] Fix performance of generic atomics

2017-05-25 Thread Sokolov Yura
Hello, Tom. I agree that lonely semicolon looks bad. Applied your suggestion for empty loop body (/* skip */). Patch in first letter had while(true), but I removed it cause I think it is uglier: - `while(true)` was necessary for grouping read with `if`, - but now there is single statement in a

Re: [HACKERS] Fix performance of generic atomics

2017-05-25 Thread Aleksander Alekseev
Hi Yura, > Attached patch contains patch for all generic atomic > functions, and also __sync_fetch_and_(or|and) for gcc, cause > I believe GCC optimize code around intrinsic better than > around inline assembler. > (final performance is around 86000tps, but difference between > 83000tps and

Re: [HACKERS] Fix performance of generic atomics

2017-05-25 Thread Tom Lane
Sokolov Yura writes: @@ -382,12 +358,8 @@ static inline uint64 pg_atomic_fetch_and_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 and_) { uint64 old; - while (true) - { - old = pg_atomic_read_u64_impl(ptr); - if

[HACKERS] Fix performance of generic atomics

2017-05-25 Thread Sokolov Yura
Good day, everyone. I've been played with pgbench on huge machine. (72 cores, 56 for postgresql, enough memory to fit base both into shared_buffers and file cache) (pgbench scale 500, unlogged tables, fsync=off, synchronous commit=off, wal_writer_flush_after=0). With 200 clients performance is

Re: [HACKERS] Fix performance of generic atomics

2017-05-25 Thread Sokolov Yura
A bit cleaner version of a patch. Sokolov Yura писал 2017-05-25 15:22: Good day, everyone. I've been played with pgbench on huge machine. (72 cores, 56 for postgresql, enough memory to fit base both into shared_buffers and file cache) (pgbench scale 500, unlogged tables, fsync=off, synchronous