I wrote:
> Ah. I was not thinking of touching pg_atomic_read_u32/u64_impl,
> although now that you mention it, it's not clear to me why we
> couldn't simplify
> -	return *(&ptr->value);
> +	return ptr->value;
Just to check, I applied that change to pg_atomic_read_u32_impl and
Andres Freund writes:
> On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
>> I think we can just use "old = ptr->value" to set up for the cmpxchg
>> loop in every generic.h function that uses such a loop.
> I think we might have been talking past each other - I thought you were
>
On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> >> It looks to me like two of the three implementations promise no such
> >> thing.
>
> > They're volatile vars, so why not?
>
> Yeah, but so are the
Andres Freund writes:
> On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
>> It looks to me like two of the three implementations promise no such
>> thing.
> They're volatile vars, so why not?
Yeah, but so are the caller's variables. That is, in
On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> >> However, if that's the reasoning, why don't we make all of these
> >> use simple reads? It seems unlikely that a locked read is free.
>
> > We don't
Andres Freund writes:
> On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
>> However, if that's the reasoning, why don't we make all of these
>> use simple reads? It seems unlikely that a locked read is free.
> We don't really use locked reads? All the _atomic_ wrapper forces is
Hi,
On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> I wrote:
> > Anyway, I don't have a big objection to applying this. My concern
> > is more that we need to be taking a harder look at other parts of
> > the atomics infrastructure, because tweaks there are likely to buy
> > much more.
>
> I
I wrote:
> Anyway, I don't have a big objection to applying this. My concern
> is more that we need to be taking a harder look at other parts of
> the atomics infrastructure, because tweaks there are likely to buy
> much more.
I went ahead and pushed these patches, adding __sync_fetch_and_sub
Hi Jeff,
On 09/05/2017 03:47 PM, Jeff Janes wrote:
>> I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using
>> both logged and unlogged tables. Also ran an internal benchmark which
>> didn't show anything either.
> What scale factor and client count? How many cores per socket? It
Robert Haas writes:
> It's not a question of whether the return value is used, but of
> whether the updated value of *old is used.
Right, but if we re-read "old" in the loop, and if the primitive
doesn't return "old" (or does, but call site ignores it) then in
principle
On Wed, Sep 6, 2017 at 12:01 PM, Tom Lane wrote:
>> This seems like a pretty sound argument to me. I think Tom's probably
>> right that the changes in generic-gcc.h are the important ones, but
>> I'm not sure that's an argument against patching generics.h. Given
>> that
Robert Haas writes:
> On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura
> wrote:
>> But I think, generic version still should be "fixed".
>> If generic version is not reached on any platform, then why it is kept?
>> If it is reached somewhere, then
On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura
wrote:
> Yes, you're right.
>
> But I think, generic version still should be "fixed".
> If generic version is not reached on any platform, then why it is kept?
> If it is reached somewhere, then it should be improved.
This
On 2017-09-06 16:36, Tom Lane wrote:
> Sokolov Yura writes:
>> On 2017-09-06 15:56, Tom Lane wrote:
>>> The point I'm trying to make is that if tweaking generic.h improves
>>> performance then it's an indicator of missed cases in the less-generic
>>> atomics code, and the latter
Sokolov Yura writes:
> On 2017-09-06 15:56, Tom Lane wrote:
>> The point I'm trying to make is that if tweaking generic.h improves
>> performance then it's an indicator of missed cases in the less-generic
>> atomics code, and the latter is where our attention should
On 2017-09-06 15:56, Tom Lane wrote:
> Simon Riggs writes:
>> On 5 September 2017 at 21:23, Tom Lane wrote:
>>> Moreover, it matters which primitive you're testing, on which platform,
>>> with which compiler, because we have a couple of layers of atomic ops
Simon Riggs writes:
> On 5 September 2017 at 21:23, Tom Lane wrote:
>> Moreover, it matters which primitive you're testing, on which platform,
>> with which compiler, because we have a couple of layers of atomic ops
>> implementations.
> If there is no
On 5 September 2017 at 21:23, Tom Lane wrote:
> Jeff Janes writes:
>> What scale factor and client count? How many cores per socket? It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.
...
>
On 2017-09-06 14:54, Sokolov Yura wrote:
> On 2017-09-06 07:23, Tom Lane wrote:
>> Jeff Janes writes:
>>> What scale factor and client count? How many cores per socket? It looks
>>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>>> using -N transaction.
>> This
On 2017-09-06 07:23, Tom Lane wrote:
> Jeff Janes writes:
>> What scale factor and client count? How many cores per socket? It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.
> This means that Sokolov's proposed changes in
This means that Sokolov's proposed changes in
Jeff Janes writes:
> What scale factor and client count? How many cores per socket? It looks
> like Sokolov was just starting to see gains at 200 clients on 72 cores,
> using -N transaction.
I spent some time poking at this with the test scaffolding I showed at
Hi,
On 05/25/2017 11:12 AM, Sokolov Yura wrote:
> I agree that lonely semicolon looks bad.
> Applied your suggestion for empty loop body (/* skip */).
> Patch in first letter had while(true), but I removed it cause
> I think it is uglier:
> - `while(true)` was necessary for grouping read with `if`,
> -
On Tue, Sep 5, 2017 at 11:39 AM, Jesper Pedersen wrote:
> On 09/05/2017 02:24 PM, Tom Lane wrote:
>
>> Jesper Pedersen writes:
>>
>>> I have tested this patch on a 2-socket machine, but don't see any
>>> performance change in the various
Jesper Pedersen writes:
> On 09/05/2017 02:24 PM, Tom Lane wrote:
>> Hm, so if we can't demonstrate a performance win, it's hard to justify
>> risking touching this code. What test case(s) did you use?
> I ran pgbench (-M prepared) with synchronous_commit 'on' and
On 09/05/2017 02:24 PM, Tom Lane wrote:
> Jesper Pedersen writes:
>> I have tested this patch on a 2-socket machine, but don't see any
>> performance change in the various runs. However, there is no regression
>> either in all cases.
> Hm, so if we can't demonstrate a
Jesper Pedersen writes:
> I have tested this patch on a 2-socket machine, but don't see any
> performance change in the various runs. However, there is no regression
> either in all cases.
Hm, so if we can't demonstrate a performance win, it's hard to justify
Hi,
On 05/25/2017 11:12 AM, Sokolov Yura wrote:
> I agree that lonely semicolon looks bad.
> Applied your suggestion for empty loop body (/* skip */).
> Patch in first letter had while(true), but I removed it cause
> I think it is uglier:
> - `while(true)` was necessary for grouping read with `if`,
> -
Good day, every one.
I'm just posting benchmark numbers for atomics patch.
Hardware: 4 socket 72 core (144HT) x86_64 Centos 7.1
postgresql.conf tuning:
shared_buffers = 32GB
fsync = on
synchronous_commit = on
full_page_writes = off
wal_buffers = 16MB
wal_writer_flush_after = 16MB
commit_delay =
Hello, Tom.
I agree that lonely semicolon looks bad.
Applied your suggestion for empty loop body (/* skip */).
Patch in first letter had while(true), but I removed it cause
I think it is uglier:
- `while(true)` was necessary for grouping read with `if`,
- but now there is single statement in a
Hi Yura,
> Attached patch contains patch for all generic atomic
> functions, and also __sync_fetch_and_(or|and) for gcc, cause
> I believe GCC optimize code around intrinsic better than
> around inline assembler.
> (final performance is around 86000tps, but difference between
> 83000tps and
Sokolov Yura writes:
> @@ -382,12 +358,8 @@ static inline uint64
>  pg_atomic_fetch_and_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 and_)
>  {
>  	uint64		old;
> -	while (true)
> -	{
> -		old = pg_atomic_read_u64_impl(ptr);
> -		if
Good day, everyone.
I've been playing with pgbench on a huge machine.
(72 cores, 56 for postgresql, enough memory to fit base
both into shared_buffers and file cache)
(pgbench scale 500, unlogged tables, fsync=off,
synchronous_commit=off, wal_writer_flush_after=0).
With 200 clients performance is
A bit cleaner version of a patch.
Sokolov Yura wrote on 2017-05-25 15:22:
> Good day, everyone.
> I've been playing with pgbench on a huge machine.
> (72 cores, 56 for postgresql, enough memory to fit base
> both into shared_buffers and file cache)
> (pgbench scale 500, unlogged tables, fsync=off,
> synchronous