Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Jacky Leng
Heikki Linnakangas [EMAIL PROTECTED] writes: I tend to agree that truncating the file, and extending the fsync request mechanism to actually delete it after the next checkpoint, is the most reasonable route to a fix. How about just allowing to use wal even WAL archiving is disabled? It

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Heikki Linnakangas
Jacky Leng wrote: I tend to agree that truncating the file, and extending the fsync request mechanism to actually delete it after the next checkpoint, is the most reasonable route to a fix. How about just allowing to use wal even WAL archiving is disabled? It seems that recovery of

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Heikki Linnakangas
Heikki Linnakangas wrote: Tom Lane wrote: I tend to agree that truncating the file, and extending the fsync request mechanism to actually delete it after the next checkpoint, is the most reasonable route to a fix. Ok, I'll write a patch to do that. There's a small problem with that: DROP
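
The "truncate now, delete after the next checkpoint" idea quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the class `PendingUnlinks`, its methods, and the file name `16384` are hypothetical names chosen for illustration. The point it tries to show is that the emptied file keeps occupying its name until a checkpoint has passed, so the name cannot be handed out again in between.

```python
import os
import tempfile


class PendingUnlinks:
    """Toy model (hypothetical): files dropped since the last checkpoint."""

    def __init__(self):
        self._pending = []

    def drop_relation_file(self, path):
        # Truncate immediately so the disk space is released,
        # but leave the empty file in place so its name stays occupied.
        with open(path, "r+b") as f:
            f.truncate(0)
        self._pending.append(path)

    def checkpoint(self):
        # Only once a checkpoint has completed are the files actually removed.
        for path in self._pending:
            if os.path.exists(path):
                os.unlink(path)
        self._pending.clear()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "16384")  # illustrative file name
        with open(path, "wb") as f:
            f.write(b"x" * 8192)

        queue = PendingUnlinks()
        queue.drop_relation_file(path)
        after_drop = (os.path.exists(path), os.path.getsize(path))

        queue.checkpoint()
        after_checkpoint = os.path.exists(path)
        return after_drop, after_checkpoint
```

In this model, `demo()` reports that the file still exists with size zero after the drop and is gone after the checkpoint.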

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: The best I can think of is to rename the obsolete file to relfilenode.stale, when it's scheduled for deletion at next checkpoint, and check for .stale-suffixed files in GetNewRelFileNode, and delete them immediately in DropTableSpace. This is
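
The `.stale` renaming scheme proposed in the quoted text can be sketched roughly as follows. This is an illustrative model only, assuming a flat directory of files named after their relfilenode; the helpers `mark_stale`, `relfilenode_in_use`, and `remove_stale_files` are hypothetical stand-ins for the roles the message assigns to the drop path, GetNewRelFileNode, and DropTableSpace, and do not reflect the actual PostgreSQL implementation.

```python
import os
import tempfile

STALE_SUFFIX = ".stale"


def mark_stale(directory, relfilenode):
    """Rename an obsolete file so it is visibly scheduled for deletion."""
    old = os.path.join(directory, str(relfilenode))
    os.rename(old, old + STALE_SUFFIX)


def relfilenode_in_use(directory, relfilenode):
    """A candidate number is taken if a live OR a stale file carries it."""
    base = os.path.join(directory, str(relfilenode))
    return os.path.exists(base) or os.path.exists(base + STALE_SUFFIX)


def remove_stale_files(directory):
    """Delete every stale file at once, e.g. before removing the directory."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(STALE_SUFFIX):
            os.unlink(os.path.join(directory, name))
            removed.append(name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "16384"), "wb").close()  # illustrative file name
        mark_stale(d, 16384)
        still_reserved = relfilenode_in_use(d, 16384)
        removed = remove_stale_files(d)
        free_again = not relfilenode_in_use(d, 16384)
        return still_reserved, removed, free_again
```

Here the renamed file continues to block reuse of its number until the stale files are cleaned up, which is the property the proposal relies on.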

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: The best I can think of is to rename the obsolete file to relfilenode.stale, when it's scheduled for deletion at next checkpoint, and check for .stale-suffixed files in GetNewRelFileNode, and delete them immediately in

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Florian G. Pflug
Heikki Linnakangas wrote: Tom Lane wrote: I tend to agree that truncating the file, and extending the fsync request mechanism to actually delete it after the next checkpoint, is the most reasonable route to a fix. Ok, I'll write a patch to do that. What is the argument against making

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Heikki Linnakangas
Florian G. Pflug wrote: Heikki Linnakangas wrote: Tom Lane wrote: I tend to agree that truncating the file, and extending the fsync request mechanism to actually delete it after the next checkpoint, is the most reasonable route to a fix. Ok, I'll write a patch to do that. What is the

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes: What is the argument against making relfilenodes globally unique by adding the xid and epoch of the creating transaction to the filename? 1. Zero chance of ever backpatching. (I know I said I wasn't excited about that, but it's still a strike

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-18 Thread Florian G. Pflug
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: What is the argument against making relfilenodes globally unique by adding the xid and epoch of the creating transaction to the filename? 1. Zero chance of ever backpatching. (I know I said I wasn't excited about that, but it's still

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-17 Thread Heikki Linnakangas
Simon Riggs wrote: On Wed, 2007-10-17 at 15:02 +0100, Heikki Linnakangas wrote: Simon Riggs wrote: If you've got a better problem statement it would be good to get that right first before we discuss solutions. Reusing a relfilenode of a deleted relation, before next checkpoint following the

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-17 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: I don't think you still quite understand what's happening. GetNewOid() is not interesting here, look at GetNewRelFileNode() instead. And neither are snapshots or MVCC visibility rules. Simon has a legitimate objection; not that there's no bug, but

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-17 Thread Heikki Linnakangas
Tom Lane wrote: Simon has a legitimate objection; not that there's no bug, but that the probability of getting bitten is exceedingly small. Oh, if that's what he meant, he's right. The test script you showed cheats six-ways-from-Sunday to cause an OID collision that would never happen in

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-17 Thread Simon Riggs
On Wed, 2007-10-17 at 17:36 +0100, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2007-10-17 at 15:02 +0100, Heikki Linnakangas wrote: Simon Riggs wrote: If you've got a better problem statement it would be good to get that right first before we discuss solutions. Reusing a

Re: [HACKERS] Why copy_relation_data only use wal when WAL archiving is enabled

2007-10-17 Thread Simon Riggs
On Wed, 2007-10-17 at 18:13 +0100, Heikki Linnakangas wrote: The test script you showed cheats six-ways-from-Sunday to cause an OID collision that would never happen in practice. The only case where it would really happen is if a table that has existed for a long time (~ 2^32 OID