[GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Mathieu Fenniak
Heyo, I'm attempting to use logical decoding with the streaming replication protocol to perform change-data-capture on PostgreSQL 9.5.4. I'm seeing the replication stream "stall" for long periods of time where the walsender process will be pinned at 100% CPU utilization, but no data is being sent

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Tom Lane
Mathieu Fenniak writes: > I'm attempting to use logical decoding with the streaming replication > protocol to perform change-data-capture on PostgreSQL 9.5.4. I'm seeing > the replication stream "stall" for long periods of time where the walsender > process will be pinned at 100% CPU utilization,

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
Hi, On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote: > The stalls occur unpredictably on my production system, but generally seem > to be correlated with schema operations. My source database has about > 100,000 tables; it's a one-schema-per-tenant multi-tenant SaaS system. I'm unfortunately

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
Hi, On 2017-05-05 20:59:09 -0400, Tom Lane wrote: > Hmm ... as for RelfilenodeMapInvalidateCallback, the lack of calls to > hash_search() from it in your trace says that it usually isn't doing > anything useful. All the time is being spent in hash_seq_search, > uselessly iterating over the hashta

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Tom Lane
Andres Freund writes: > On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote: >> It appears that most of the time is spent in the >> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache >> invalidation callbacks, both of which appear to be invalidating caches >> based upon the cache

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-05 Thread Andres Freund
On 2017-05-05 21:32:27 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote: > >> It appears that most of the time is spent in the > >> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache > >> invalidation callbacks, both of which a

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-10 Thread Mathieu Fenniak
Hi Andres, Tom, Andres, it seems like the problem is independent of having large data manipulations mixed with schema changes. The below test case demonstrates it with just schema changes. Tom, I've tested your patch, and it seems to have a positive impact for sure. I've documented a test case

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-10 Thread Tom Lane
Mathieu Fenniak writes: > Andres, it seems like the problem is independent of having large data > manipulations mixed with schema changes. The below test case demonstrates > it with just schema changes. > Tom, I've tested your patch, and it seems to have a positive impact for > sure. I've docum

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-10 Thread Andres Freund
Hi Tom, Mathieu, On 2017-05-10 17:02:11 -0400, Tom Lane wrote: > Mathieu Fenniak writes: > > Andres, it seems like the problem is independent of having large data > > manipulations mixed with schema changes. The below test case demonstrates > > it with just schema changes. > > > Tom, I've teste

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-12 Thread Mathieu Fenniak
Hi Tom, Andres, I've taken your patches, Tom, and applied them to a test deployment of my actual application. The most accessible way I have to reproduce this issue is to run a maintenance task that we typically run during a software deployment, which will remove some tenant schemas from our data

Re: [GENERAL] Logical decoding CPU-bound w/ large number of tables

2017-05-12 Thread Tom Lane
Mathieu Fenniak writes: > I've taken your patches, Tom, and applied them to a test deployment of my > actual application. > The most accessible way I have to reproduce this issue is to run a > maintenance task that we typically run during a software deployment, which > will remove some tenant sch