Good morning!
FYI: I know many people are/were tracking this email thread rather than
the newer one, "scalability bottlenecks with (many) partitions (and
more)", but please see [1][2], where Tomas committed enhanced fast-path
locking to master (18).
Thanks Tomas for persistence!
On 8/8/23 3:04 PM, Andres Freund wrote:
> On 2023-08-08 16:44:37 -0400, Robert Haas wrote:
>> On Mon, Aug 7, 2023 at 6:05 PM Andres Freund wrote:
>>> I think the biggest flaw of the locking scheme is that the LockHash locks
>>> protect two, somewhat independent, things:
>>> 1) the set of currently lockable objects, i.e. the entries in the hash
>>> table [partition]
>>> 2) the state of all the locks
Hi,
On 2023-08-08 16:44:37 -0400, Robert Haas wrote:
> On Mon, Aug 7, 2023 at 6:05 PM Andres Freund wrote:
> > I think the biggest flaw of the locking scheme is that the LockHash locks
> > protect two, somewhat independent, things:
> > 1) the set of currently lockable objects, i.e. the entries in the hash
> > table [partition]
> > 2) the state of all the locks
On Mon, Aug 7, 2023 at 6:05 PM Andres Freund wrote:
> I think the biggest flaw of the locking scheme is that the LockHash locks
> protect two, somewhat independent, things:
> 1) the set of currently lockable objects, i.e. the entries in the hash table
> [partition]
> 2) the state of all the locks
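For context on the two things being coupled here: PostgreSQL splits the
shared lock table into 16 partitions, and a lock tag's hash code selects
the partition whose lock_manager LWLock guards both the hash-table entry
and the lock state. A minimal sketch, with the constants and macro
mirroring lock.h (the main() and hash values are purely illustrative):

#include <stdio.h>
#include <stdint.h>

/*
 * Mirrors PostgreSQL's lock.h: the shared lock table is split into
 * 1 << 4 = 16 partitions, each guarded by one lock_manager LWLock.
 * The same partition lock therefore covers both things listed above:
 * the hash-table entries and the state of the locks in them.
 */
#define LOG2_NUM_LOCK_PARTITIONS    4
#define NUM_LOCK_PARTITIONS         (1 << LOG2_NUM_LOCK_PARTITIONS)
#define LockHashPartition(hashcode) ((hashcode) % NUM_LOCK_PARTITIONS)

int
main(void)
{
    uint32_t h1 = 0xdeadbeefU;  /* hypothetical lock tag hash codes */
    uint32_t h2 = 0x12345678U;

    printf("tag1 -> lock partition %u\n", (unsigned) LockHashPartition(h1));
    printf("tag2 -> lock partition %u\n", (unsigned) LockHashPartition(h2));
    return 0;
}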
Hi,
On 2023-08-07 14:36:48 -0700, Andres Freund wrote:
> What if fast path locks entered PROCLOCK into the shared hashtable, just like
> with normal locks, the first time a lock is acquired by a backend. Except that
> we'd set a flag indicating the lock is a fastpath lock. When the lock is
> released
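A hypothetical sketch of that scheme; every name below is made up for
illustration and nothing here is PostgreSQL's actual API:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the proposal quoted above; none of these names
 * exist in PostgreSQL.  Idea: the first fast-path acquisition of a lock
 * also creates a shared hash-table entry flagged as fast-path, so a
 * backend that later needs a conflicting lock can find it there instead
 * of scanning every other backend's fast-path array.
 */
#define SKETCH_TABLE_SIZE 64

typedef struct SharedLockSketch
{
    uint32_t    lockhash;       /* 0 = empty slot */
    bool        fastpath;       /* the flag described above */
} SharedLockSketch;

static SharedLockSketch table[SKETCH_TABLE_SIZE];

/* toy open-addressing lookup, standing in for the shared lock hash table */
static SharedLockSketch *
lookup_or_insert(uint32_t lockhash)
{
    for (int i = 0; i < SKETCH_TABLE_SIZE; i++)
    {
        SharedLockSketch *e = &table[(lockhash + i) % SKETCH_TABLE_SIZE];

        if (e->lockhash == lockhash)
            return e;           /* entry already present */
        if (e->lockhash == 0)
        {
            e->lockhash = lockhash;
            return e;           /* claimed a free slot */
        }
    }
    return NULL;                /* table full; a real table would grow */
}

static void
acquire_fastpath(uint32_t lockhash)
{
    SharedLockSketch *e = lookup_or_insert(lockhash);

    if (e != NULL)
        e->fastpath = true;     /* lock state itself stays backend-local */
}

int
main(void)
{
    acquire_fastpath(0xbeefU);  /* first fast-path acquisition of a lock */
    return 0;
}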
Hi,
On 2023-08-07 13:05:32 -0400, Robert Haas wrote:
> I would also argue that the results are actually not that great,
> because once you get past 64 partitions you're right back where you
> started, or maybe worse off. To me, there's nothing magical about
> cases between 16 and 64 relations that
>
> Why would the access frequency be uniform? In particular, there's a huge
> variability in how long the locks need to exist
>
As a supporting data point, our example production workload shows a 3x
difference between the most versus least frequently contended lock_manager
lock:
https://gitlab.co
Hi,
On 2023-08-07 13:59:26 -0700, Matt Smiley wrote:
> I have not yet written a reproducer since we see this daily in production.
> I have a sketch of a few ways that I think will reproduce the behavior
> we're observing, but haven't had time to implement it.
>
> I'm not sure if we're seeing this
Hi Andres, thanks for helping! Great questions, replies are inline below.
On Sun, Aug 6, 2023 at 1:00 PM Andres Freund wrote:
> Hm, I'm curious whether you have a way to trigger the issue outside of your
> prod environment. Mainly because I'm wondering if you're potentially
> hitting the issue
On Mon, Aug 7, 2023 at 3:48 PM Tomas Vondra wrote:
> Why would the access frequency be uniform? In particular, there's a huge
> variability in how long the locks need to exist - IIRC we may be keeping
> locks for tables for a long time, but not for indexes. From this POV it
> might be better to do
On 8/7/23 21:21, Robert Haas wrote:
> On Mon, Aug 7, 2023 at 3:02 PM Tomas Vondra wrote:
>>> I would also argue that the results are actually not that great,
>>> because once you get past 64 partitions you're right back where you
>>> started, or maybe worse off. To me, there's nothing magical
Thank you Tomas! I really appreciate your willingness to dig in here and
help us out! The rest of my replies are inline below.
On Thu, Aug 3, 2023 at 1:39 PM Tomas Vondra wrote:
> The analysis in the linked gitlab issue is pretty amazing. I wasn't
> planning to argue against the findings anyway
On Mon, Aug 7, 2023 at 3:02 PM Tomas Vondra wrote:
> > I would also argue that the results are actually not that great,
> > because once you get past 64 partitions you're right back where you
> > started, or maybe worse off. To me, there's nothing magical about
> > cases between 16 and 64 relations
On 8/7/23 18:56, Nathan Bossart wrote:
> On Mon, Aug 07, 2023 at 12:51:24PM +0200, Tomas Vondra wrote:
>> The bad news is this seems to have negative impact on cases with few
>> partitions, that'd fit into 16 slots. Which is not surprising, as the
>> code has to walk longer arrays, it probably affects caching etc.
On 8/7/23 19:05, Robert Haas wrote:
> On Mon, Aug 7, 2023 at 6:51 AM Tomas Vondra wrote:
>> The regression appears to be consistently ~3%, and v2 aimed to improve
>> that - at least for the case with just 100 rows. It even gains ~5% in a
>> couple cases. It's however a bit strange v2 doesn't really help the two
>> larger cases.
On Mon, Aug 7, 2023 at 6:51 AM Tomas Vondra wrote:
> The regression appears to be consistently ~3%, and v2 aimed to improve
> that - at least for the case with just 100 rows. It even gains ~5% in a
> couple cases. It's however a bit strange v2 doesn't really help the two
> larger cases.
To me, th
On Mon, Aug 07, 2023 at 12:51:24PM +0200, Tomas Vondra wrote:
> The bad news is this seems to have negative impact on cases with few
> partitions, that'd fit into 16 slots. Which is not surprising, as the
> code has to walk longer arrays, it probably affects caching etc. So this
> would hurt the systems
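The arrays in question: each backend's fast-path locks live in a small
fixed array searched linearly, so widening it past the historical 16
slots lengthens every lookup, even for workloads that fit in 16. A rough
model, with FP_LOCK_SLOTS_PER_BACKEND and fpRelId patterned on PGPROC
(the function name and main() are made up for illustration):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/*
 * Patterned on PGPROC's per-backend fast-path fields: 16 slots, searched
 * linearly.  Widening the array, as the patch under discussion does,
 * makes this loop longer for every lookup -- the suspected source of the
 * ~3% regression on workloads that already fit in 16 slots.
 */
#define FP_LOCK_SLOTS_PER_BACKEND 16

typedef struct FastPathSketch
{
    Oid fpRelId[FP_LOCK_SLOTS_PER_BACKEND];     /* 0 = free slot */
} FastPathSketch;

static bool
fastpath_slot_for(const FastPathSketch *fp, Oid relid)
{
    for (int f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
    {
        if (fp->fpRelId[f] == relid)
            return true;        /* relation already holds a fast-path slot */
    }
    return false;
}

int
main(void)
{
    FastPathSketch fp = {{0}};

    fp.fpRelId[3] = 16384;      /* pretend relation 16384 took slot 3 */
    return fastpath_slot_for(&fp, 16384) ? 0 : 1;
}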
Hi,
On 2023-08-02 16:51:29 -0700, Matt Smiley wrote:
> I thought it might be helpful to share some more details from one of the
> case studies behind Nik's suggestion.
>
> Bursty contention on lock_manager lwlocks recently became a recurring cause
> of query throughput drops for GitLab.com, and we got to study the
> behavior via USDT and uprobe instrumentation
On 8/3/23 22:39, Tomas Vondra wrote:
> On 8/3/23 01:51, Matt Smiley wrote:
>> I thought it might be helpful to share some more details from one of the
>> case studies behind Nik's suggestion.
>>
>> Bursty contention on lock_manager lwlocks recently became a recurring
>> cause of query throughput drops for GitLab.com
On 8/3/23 01:51, Matt Smiley wrote:
> I thought it might be helpful to share some more details from one of the
> case studies behind Nik's suggestion.
>
> Bursty contention on lock_manager lwlocks recently became a recurring
> cause of query throughput drops for GitLab.com, and we got to study the
> behavior via USDT and uprobe instrumentation
I thought it might be helpful to share some more details from one of the
case studies behind Nik's suggestion.
Bursty contention on lock_manager lwlocks recently became a recurring cause
of query throughput drops for GitLab.com, and we got to study the behavior
via USDT and uprobe instrumentation
On 7/13/23 07:02, Nikolay Samokhvalov wrote:
> We're observing a few cases with lockmanager spikes in a few quite
> loaded systems.
>
> These cases are different; queries are different, Postgres versions are
> 12, 13, and 14.
>
> But in all cases, servers are quite beefy (96-128 vCPUs, ~600-800 GiB)
We're observing a few cases with lockmanager spikes in a few quite loaded
systems.
These cases are different; queries are different, Postgres versions are 12,
13, and 14.
But in all cases, servers are quite beefy (96-128 vCPUs, ~600-800 GiB),
receiving a lot of TPS (a few dozen thousand). Most