pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Over at [1] we have a complaint that dump-and-restore fails for hash-partitioned tables if a partitioning column is an enum, because the enum values are unlikely to receive the same OIDs in the destination database as they had in the source, and the hash codes are dependent on those OIDs. So resto

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 11:18 AM Tom Lane wrote: > Over at [1] we have a complaint that dump-and-restore fails for > hash-partitioned tables if a partitioning column is an enum, > because the enum values are unlikely to receive the same OIDs > in the destination database as they had in the source,

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 11:18 AM Tom Lane wrote: >> Over at [1] we have a complaint that dump-and-restore fails for >> hash-partitioned tables if a partitioning column is an enum, >> because the enum values are unlikely to receive the same OIDs >> in the destination database

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > ... I like the > fact that we have --load-via-partition-root, but it is a bit of a > hack. You don't get a single copy into the partition root, you get one > per child table -- and those COPY statements are listed as data for > the partitions where the data lives now, not for

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 1:23 PM Tom Lane wrote: > Well, that was what I thought too to start with, but I now think that > it is far too narrow-minded a view of the problem. The real issue > is something I said that you trimmed: > > >> In general, we've never thought that hash values are > >> requi

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 1:23 PM Tom Lane wrote: >> In the meantime, I think we need to recognize that hash values are >> not very portable. I do not think we do our users a service by >> letting them discover the corner cases the hard way. > I think you're not really engagi

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 12:34 PM Tom Lane wrote: > > Also, and I think pretty > > significantly, using --load-via-partition-root forces you to pay the > > overhead of rerouting every tuple to the target partition whether you > > need it or not, which is potentially a large unnecessary expense. > >

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 3:34 PM Tom Lane wrote: > I spent a bit more time thinking about that, and while I agree that > it's an oddity, I don't see that it matters in the case of hash > partitioning. You would notice an issue if you tried to do a selective > restore of just one partition --- but u

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 1:14 PM Robert Haas wrote: > It seems to me that the job of pg_dump is to produce a dump that, when > reloaded on another system, recreates the same database state. That > means that we end up with all of the same objects, each defined in the > same way, and that all of the

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 12:39 PM Robert Haas wrote: > I don't think the fact that our *traditional* standard for how stable > a hash function needs to be has been XYZ carries any water. Needs > change over time, and we adapt the code to meet the new needs. Since > we have no system for type propert

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 4:12 PM Tom Lane wrote: > > I don't think the fact that our *traditional* standard for how stable > > a hash function needs to be has been XYZ carries any water. > > Well, it wouldn't need to if we had a practical way of changing the > behavior of an existing hash function,

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > It seems to me that the job of pg_dump is to produce a dump that, when > reloaded on another system, recreates the same database state. That > means that we end up with all of the same objects, each defined in the > same way, and that all of the tables end up with all the sam

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 4:44 PM Peter Geoghegan wrote: > This is a misrepresentation of Tom's words. It isn't actually > self-evident what "we end up with all of the same objects, each > defined in the same way, and that all of the tables end up with all > the same contents that they had before" ac

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > Tom, as I understand it, is arguing that the > --load-via-partition-root behavior has negligible downsides and is > almost categorically better than the current default behavior, and > thus making that the new default in some or all situations in a minor > release is totally

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 2:12 PM Robert Haas wrote: > On Wed, Feb 1, 2023 at 4:44 PM Peter Geoghegan wrote: > > This is a misrepresentation of Tom's words. It isn't actually > > self-evident what "we end up with all of the same objects, each > > defined in the same way, and that all of the tables e

Re: pg_dump versus hash partitioning

2023-02-01 Thread Robert Haas
On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: > > Here, you'd like to argue that it's perfectly > > fine if we instead insert some of the rows into different tables than > > where they were on the original system. > > I can agree with that argument for range or list partitioning, where > the part

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Peter Geoghegan writes: > You mentioned "minor releases" here. Who said anything about that? I did: I'd like to back-patch the fix if possible. I think changing the default --load-via-partition-root choice could be back-patchable. If Robert is resistant to that but would accept it in master, I'

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: >> I can agree with that argument for range or list partitioning, where >> the partitions have some semantic meaning to the user. I don't buy it >> for hash partitioning. It was an implementation artifact to begin >> with that

Re: pg_dump versus hash partitioning

2023-02-01 Thread David G. Johnston
On Wed, Feb 1, 2023 at 3:38 PM Tom Lane wrote: > Peter Geoghegan writes: > > You mentioned "minor releases" here. Who said anything about that? > > I did: I'd like to back-patch the fix if possible. I think changing > the default --load-via-partition-root choice could be back-patchable. > > If

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
Robert Haas writes: > On Wed, Feb 1, 2023 at 4:12 PM Tom Lane wrote: >> That being the case, I don't think moving the goalposts for hash >> function stability is going to lead to a workable solution. > I don't see that there is any easy, clean way to solve this in > released branches. The idea t

Re: pg_dump versus hash partitioning

2023-02-01 Thread Peter Geoghegan
On Wed, Feb 1, 2023 at 2:49 PM Tom Lane wrote: > It's precisely because you want to analyze it in the same terms > as range/list partitioning that we have these issues. Or we could > have built it on some other infrastructure than hash index opclasses > ... but we didn't do that, and now we have

Re: pg_dump versus hash partitioning

2023-02-01 Thread David Rowley
On Thu, 2 Feb 2023 at 11:38, Tom Lane wrote: > > Peter Geoghegan writes: > > You mentioned "minor releases" here. Who said anything about that? > > I did: I'd like to back-patch the fix if possible. I think changing > the default --load-via-partition-root choice could be back-patchable. > > If R

Re: pg_dump versus hash partitioning

2023-02-01 Thread Tom Lane
David Rowley writes: > Digging into the history a bit, I found [2] and particularly [3] that > seem to indicate this option was thought about due to concerns about > hash functions not returning consistent results on different > architectures. I suspect it might have been defaulted to load into th

Re: pg_dump versus hash partitioning

2023-02-02 Thread Laurenz Albe
On Wed, 2023-02-01 at 17:49 -0500, Tom Lane wrote: > Robert Haas writes: > > On Wed, Feb 1, 2023 at 5:08 PM Tom Lane wrote: > > > I can agree with that argument for range or list partitioning, where > > > the partitions have some semantic meaning to the user.  I don't buy it > > > for hash partit

Re: pg_dump versus hash partitioning

2023-02-02 Thread Alvaro Herrera
On 2023-Feb-01, Robert Haas wrote: > I think you can construct plausible cases where it's not just > academic. For instance, suppose I intend to use some kind of logical > replication system, not necessarily the one built into PostgreSQL, to > replicate data between two systems. Before engaging th

Re: pg_dump versus hash partitioning

2023-02-02 Thread Andrew Dunstan
On 2023-02-01 We 20:03, Tom Lane wrote: > > Anyway, after re-reading the old thread I wonder if my first instinct > (force --load-via-partition-root for enum hash cases only) was the > best compromise after all. I'm not sure how painful it is to get > pg_dump to detect such cases, but it's proba

Re: pg_dump versus hash partitioning

2023-02-02 Thread Robert Haas
On Wed, Feb 1, 2023 at 6:14 PM Tom Lane wrote: > You waved your arms about inventing some new hashing infrastructure, > but it was phrased in such a way that it wasn't clear to me if that > was actually a serious proposal or not. But if it is: how will you > get around the fact that any change to

Re: pg_dump versus hash partitioning

2023-02-02 Thread Tom Lane
Andrew Dunstan writes: > On 2023-02-01 We 20:03, Tom Lane wrote: >> Anyway, after re-reading the old thread I wonder if my first instinct >> (force --load-via-partition-root for enum hash cases only) was the >> best compromise after all. I'm not sure how painful it is to get >> pg_dump to detect

Re: pg_dump versus hash partitioning

2023-02-02 Thread Tom Lane
Alvaro Herrera writes: > ... so for --load-via-partition-root=auto (or > whatever), we need to ensure that we detect hash partitioning all the > way down from the topmost to the leaves. Yeah, that had already occurred to me, which is why I was not feeling confident about it being an easy hack in

Re: pg_dump versus hash partitioning

2023-02-14 Thread Tom Lane
Here's a set of draft patches around this issue. 0001 does what I last suggested, ie force load-via-partition-root for leaf tables underneath a partitioned table with a partitioned-by-hash enum column. It wasn't quite as messy as I first feared, although we do need a new query (and pg_dump now kn

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Tue, Feb 14, 2023 at 2:21 PM Tom Lane wrote: > This made me wonder if this could be a usable solution at all, but > after thinking for awhile, I don't see how the claim about foreign key > constraints is anything but FUD. pg_dump/pg_restore have sufficient > dependency logic to prevent that fr

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > On Tue, Feb 14, 2023 at 2:21 PM Tom Lane wrote: >> This made me wonder if this could be a usable solution at all, but >> after thinking for awhile, I don't see how the claim about foreign key >> constraints is anything but FUD. pg_dump/pg_restore have sufficient >> dependen

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: > Well, that's a user error not pg_dump's fault. Particularly so for hash > partitioning, where there is no defensible reason to make the partitions > semantically different. I am still of the opinion that you're going down a dangerous path of red

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: >> Well, that's a user error not pg_dump's fault. Particularly so for hash >> partitioning, where there is no defensible reason to make the partitions >> semantically different. > I am still of the opinion that you're going

Re: pg_dump versus hash partitioning

2023-02-27 Thread Robert Haas
On Mon, Feb 27, 2023 at 12:50 PM Tom Lane wrote: > Robert Haas writes: > > On Mon, Feb 27, 2023 at 11:20 AM Tom Lane wrote: > >> Well, that's a user error not pg_dump's fault. Particularly so for hash > >> partitioning, where there is no defensible reason to make the partitions > >> semanticall

Re: pg_dump versus hash partitioning

2023-02-27 Thread Tom Lane
Robert Haas writes: > Sure, but I was responding to your assertion that there's no case in > which --load-via-partition-root could cause a restore failure. I'm not > sure that's accurate. Perhaps it's not, but it's certainly far less likely to cause a restore failure than the behavior I want to r

Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Tue, Feb 14, 2023 at 02:21:33PM -0500, Tom Lane wrote: > Here's a set of draft patches around this issue. > > 0001 does what I last suggested, ie force load-via-partition-root for > leaf tables underneath a partitioned table with a partitioned-by-hash > enum column. It wasn't quite as messy as

Re: pg_dump versus hash partitioning

2023-03-10 Thread Tom Lane
Julien Rouhaud writes: > Working on some side project that can cause dump of hash partitions to be > routed to a different partition, I realized that --load-via-partition-root can > indeed cause deadlock in such case without FK dependency or anything else. > The problem is that each worker will p

Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 10:10:14PM -0500, Tom Lane wrote: > Julien Rouhaud writes: > > Working on some side project that can cause dump of hash partitions to be > > routed to a different partition, I realized that --load-via-partition-root > > can > > indeed cause deadlock in such case without FK

Re: pg_dump versus hash partitioning

2023-03-12 Thread Tom Lane
Julien Rouhaud writes: > The BEGIN + TRUNCATE is only there to avoid generating WAL records just in > case > the wal_level is minimal. I don't remember if that optimization still exists, > but if yes we could avoid doing that if the server's wal_level is replica or > higher? That's not perfect

Re: pg_dump versus hash partitioning

2023-03-12 Thread Justin Pryzby
On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > What I propose we do about that is further tweak things so that > load-via-partition-root forces dumping via COPY. AFAIK the only > compelling use-case for dump-as-INSERTs is in transferring data > to a non-Postgres database, which is a c

Re: pg_dump versus hash partitioning

2023-03-12 Thread Tom Lane
Justin Pryzby writes: > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: >> What I propose we do about that is further tweak things so that >> load-via-partition-root forces dumping via COPY. AFAIK the only >> compelling use-case for dump-as-INSERTs is in transferring data >> to a non-Po

Re: pg_dump versus hash partitioning

2023-03-13 Thread Julien Rouhaud
On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > The BEGIN + TRUNCATE is only there to avoid generating WAL records just in > > case > > the wal_level is minimal. I don't remember if that optimization still > > exists, > > but if yes we could avoid doing th

Re: pg_dump versus hash partitioning

2023-03-13 Thread Tom Lane
Julien Rouhaud writes: > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: >> The trick is to detect in pg_restore whether pg_dump chose to do >> load-via-partition-root. > Given that this approach wouldn't help with existing dump files (at least if > using COPY, in any case the one using

Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Sun, Mar 12, 2023 at 03:46:52PM -0400, Tom Lane wrote: > >> The trick is to detect in pg_restore whether pg_dump chose to do > >> load-via-partition-root. > > > Given that this approach wouldn't help with exi

Re: pg_dump versus hash partitioning

2023-03-16 Thread Tom Lane
Julien Rouhaud writes: > On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: >> Yeah, we need to do both. Attached find an updated patch series: > I didn't find a CF entry, is it intended? Yeah, it's there: https://commitfest.postgresql.org/42/4226/ > I'm not sure if you intend to keep

Re: pg_dump versus hash partitioning

2023-03-16 Thread Julien Rouhaud
On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Mon, Mar 13, 2023 at 07:39:12PM -0400, Tom Lane wrote: > >> Yeah, we need to do both. Attached find an updated patch series: > > > I didn't find a CF entry, is it intended? > > Yeah, it's there: > > https://

Re: pg_dump versus hash partitioning

2023-03-17 Thread Tom Lane
Julien Rouhaud writes: > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: >> I think the odds of that yielding a usable dump are nil, so I don't >> see why we should bother. > No objection from me. OK, pushed with the discussed changes. regards, tom lane

Re: pg_dump versus hash partitioning

2023-03-17 Thread Julien Rouhaud
On Fri, Mar 17, 2023 at 01:44:12PM -0400, Tom Lane wrote: > Julien Rouhaud writes: > > On Thu, Mar 16, 2023 at 08:43:56AM -0400, Tom Lane wrote: > >> I think the odds of that yielding a usable dump are nil, so I don't > >> see why we should bother. > > > No objection from me. > > OK, pushed with t