Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 7:04 PM, Robert Haas wrote: > On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus wrote: >> Robert, >> >>> Uh, no it doesn't.  It only requires you to be more aggressive about >>> vacuuming the transactions that are in the aborted-XIDs array.  It >>> doesn't affect transaction wrap

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 6:42 PM, Tom Lane wrote: > Robert Haas writes: >> >> 4. There would presumably be some finite limit on the size of the >> shared memory structure for aborted transactions.  I don't think >> there'd be any reason to make it particularly small, but if you sat >> there and ab

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Gurjeet Singh
On Wed, Nov 10, 2010 at 1:15 AM, Tom Lane wrote: > Once you know that there is, or isn't, > a filesystem-level error involved, what are you going to do next? > You're going to go try to debug the component you know is at fault, > that's what. And that problem is still AI-complete. > > If we know

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus wrote: > Robert, > >> Uh, no it doesn't.  It only requires you to be more aggressive about >> vacuuming the transactions that are in the aborted-XIDs array.  It >> doesn't affect transaction wraparound vacuuming at all, either >> positively or negatively

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Tom Lane
Robert Haas writes: > > 4. There would presumably be some finite limit on the size of the > shared memory structure for aborted transactions. I don't think > there'd be any reason to make it particularly small, but if you sat > there and aborted transactions at top speed you might eventually run

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Tom Lane
Josh Berkus writes: >> Though incidentally all of the other items you mentioned are generic >> problems caused by MVCC, not hint bits. > Yes, but the hint bits prevent us from implementing workarounds. If we got rid of hint bits, we'd need workarounds for the ensuing massive performance los

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Josh Berkus
Robert, > Uh, no it doesn't. It only requires you to be more aggressive about > vacuuming the transactions that are in the aborted-XIDs array. It > doesn't affect transaction wraparound vacuuming at all, either > positively or negatively. You still have to freeze xmins before they > flip from b
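The wraparound concern in the quoted text comes from transaction IDs being 32-bit counters that are compared modulo 2^32. The following is a minimal sketch of that kind of circular comparison, not code from this thread; the names `sketch_xid_t` and `sketch_xid_precedes` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t sketch_xid_t;

/*
 * Circular "a is older than b" test: take the difference modulo 2^32
 * and look at its sign. The unsigned-to-signed conversion is
 * implementation-defined in C but behaves as two's complement on
 * common platforms.
 */
bool sketch_xid_precedes(sketch_xid_t a, sketch_xid_t b)
{
    return (int32_t) (a - b) < 0;
}
```

Under this comparison, an ID that falls more than 2^31 behind the current one stops comparing as older, which is why old xmin values have to be frozen before that distance is reached.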

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 3:05 PM, Greg Stark wrote: > On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus wrote: >> Well, most of the other MVCC-in-table DBMSes simply don't deal with >> large, on-disk databases.  In fact, I can't think of one which does, >> currently; while MVCC has been popular for the N

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 5:15 PM, Kevin Grittner wrote: > Josh Berkus wrote: > >> 6. This would require us to be more aggressive about VACUUMing >> old-cold relations/page, e.g. VACUUM FREEZE.  This would make >> one of our worst issues for data warehousing even worse. > > I continue to feel tha

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 5:03 PM, Josh Berkus wrote: > On 11/9/10 1:50 PM, Robert Haas wrote: >> 5. It would be pretty much impossible to run with autovacuum turned >> off, and in fact you would likely need to make it a good deal more >> aggressive in the specific case of aborted transactions, to mi

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Kevin Grittner
Josh Berkus wrote: > 6. This would require us to be more aggressive about VACUUMing > old-cold relations/page, e.g. VACUUM FREEZE. This would make > one of our worst issues for data warehousing even worse. I continue to feel that it is insane that when a table is populated within the same

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Josh Berkus
On 11/9/10 1:50 PM, Robert Haas wrote: > 5. It would be pretty much impossible to run with autovacuum turned > off, and in fact you would likely need to make it a good deal more > aggressive in the specific case of aborted transactions, to mitigate > problems #1, #3, and #4. 6. This would require

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 2:05 PM, Robert Haas wrote: > On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote: >> On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote: >>> So, for getting checksums, we have to offer up a few things: >>> 1) zero-copy writes, we need to buffer the write to get a consisten

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Josh Berkus
> Though incidentally all of the other items you mentioned are generic > problems caused by MVCC, not hint bits. Yes, but the hint bits prevent us from implementing workarounds. -- -- Josh Berkus PostgreSQL Experts Inc

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote: > Then we might have to get rid of hint bits. But they're hint bits for > a metadata file that already exists, creating another metadata file > doesn't solve anything. Is there any way to instrument the writes of dirty buffers from the share memo

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 8:12 PM, Josh Berkus wrote: >> The whole point of the hint bits is that it's in the same place as the data. > > Yes, but the hint bits are currently causing us trouble on several > features or potential features: Then we might have to get rid of hint bits. But they're hint

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Josh Berkus
> The whole point of the hint bits is that it's in the same place as the data. Yes, but the hint bits are currently causing us trouble on several features or potential features: * page-level CRC checks * eliminating vacuum freeze for cold data * index-only access * replication * this patch * etc

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus wrote: > Well, most of the other MVCC-in-table DBMSes simply don't deal with > large, on-disk databases.  In fact, I can't think of one which does, > currently; while MVCC has been popular for the New Databases, they're > all focused on "in-memory" datab

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Josh Berkus
> PostgreSQL > isn't the only database product that uses MVCC - not by a long shot - > and the problem of detecting whether an XID is visible to the current > snapshot can't be ours alone. So what do other people do about this? > They either don't cache the information about whether the XID is >

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Tue Nov 09 16:05:57 -0300 2010: > And it still allows silent data corruption, because bogusly clearing a > hint bit is, at the moment, harmless, but bogusly setting one is not. > I really have to wonder how other products handle this. PostgreSQL > isn't the

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Kenneth Marshall
On Tue, Nov 09, 2010 at 02:05:57PM -0500, Robert Haas wrote: > On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote: > > On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote: > >> So, for getting checksums, we have to offer up a few things: > >> 1) zero-copy writes, we need to buffer the write to get

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Robert Haas
On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote: > On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote: >> So, for getting checksums, we have to offer up a few things: >> 1) zero-copy writes, we need to buffer the write to get a consistent >> checksum (or lock the buffer tight) >> 2) saving hin

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote: > So, for getting checksums, we have to offer up a few things: > 1) zero-copy writes, we need to buffer the write to get a consistent > checksum (or lock the buffer tight) > 2) saving hint-bits on an otherwise unchanged page.  We either need to

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Tom Lane
Gurjeet Singh writes: > On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane wrote: >> IMO there are a lot of methods that can separate filesystem misfeasance >> from Postgres errors, probably with greater reliability than this hack. > Doing this postmortem on a regular deployment and fixing the problem wo

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby wrote: >> Huh, this implies that if we did go through all the work of >> segregating the hint bits and could arrange that they all appear on >> the same 512-byte sector and if we buffered them so that we were >> writing the same bits we checksummed then

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 4:26 PM, Jim Nasby wrote: >> On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote: >>> Oh, I'm mistaken. The problem was that buffering the writes was >>> insufficient to deal with torn pages. Even if you buffer the writes if >>> the machine crashes while only having written ha

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Gurjeet Singh
On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane wrote: > There are also crosschecks that you can apply: if it's a heap page, are > there any index pages with pointers to it? If it's an index page, are > there downlink or sibling links to it from elsewhere in the index? > A page that Postgres left as z

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Jim Nasby
On Nov 9, 2010, at 9:27 AM, Greg Stark wrote: > On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote: >> Oh, I'm mistaken. The problem was that buffering the writes was >> insufficient to deal with torn pages. Even if you buffer the writes if >> the machine crashes while only having written half the b

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote: > Oh, I'm mistaken. The problem was that buffering the writes was > insufficient to deal with torn pages. Even if you buffer the writes if > the machine crashes while only having written half the buffer out then > the checksum won't match. If the o
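The torn-page point above can be sketched as follows. This is an illustration under stated assumptions, not PostgreSQL code: the 8192-byte page size, the plain additive checksum, its placement in the first four bytes, and every `sketch_` name are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192             /* assumed page size */
#define SKETCH_CKSUM_LEN sizeof(uint32_t) /* checksum lives in bytes 0..3 */

/* Deliberately simple additive checksum over everything after the header. */
uint32_t sketch_checksum(const unsigned char *page)
{
    uint32_t sum = 0;

    for (size_t i = SKETCH_CKSUM_LEN; i < SKETCH_PAGE_SIZE; i++)
        sum += page[i];
    return sum;
}

/* Store the checksum in the page, as a writer would before flushing. */
void sketch_seal_page(unsigned char *page)
{
    uint32_t sum = sketch_checksum(page);

    memcpy(page, &sum, sizeof sum);
}

/* Recompute the checksum and compare it with the stored one. */
bool sketch_page_verifies(const unsigned char *page)
{
    uint32_t stored;

    memcpy(&stored, page, sizeof stored);
    return stored == sketch_checksum(page);
}

/* Simulate a crash after only the first half of the new page hit disk. */
void sketch_torn_write(unsigned char *on_disk, const unsigned char *new_page)
{
    memcpy(on_disk, new_page, SKETCH_PAGE_SIZE / 2);
}
```

Both the old and the new page verify on their own. After the simulated torn write, the on-disk page carries the new checksum but half of the old contents, so verification fails even though the writer checksummed a stable, buffered copy.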

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Tue, Nov 9, 2010 at 2:28 PM, Aidan Van Dyk wrote: > On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark wrote: > >> But buffering the page only means you've got some consistent view of >> the page. It doesn't mean the checksum will actually match the data in >> the page that gets written out. So when y

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Aidan Van Dyk
On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark wrote: > But buffering the page only means you've got some consistent view of > the page. It doesn't mean the checksum will actually match the data in > the page that gets written out. So when you read it back in the > checksum may be invalid. I was ass

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-09 Thread Greg Stark
On Mon, Nov 8, 2010 at 5:59 PM, Aidan Van Dyk wrote: > The problem that putting checksums in a different place solves is the > page layout (binary upgrade) problem.  You're still going to need to > "buffer" the page as you calculate the checksum and write it out. > buffering that page is absolutel

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Aidan Van Dyk
On Mon, Nov 8, 2010 at 12:53 PM, Greg Stark wrote: > On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane wrote: >> So maybe Aidan's got a good idea here.  It would sure be a lot easier >> to shoehorn checksum checking in as an optional feature if the checksums >> were kept someplace else. > > Would it? I th

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Greg Stark
On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane wrote: > So maybe Aidan's got a good idea here.  It would sure be a lot easier > to shoehorn checksum checking in as an optional feature if the checksums > were kept someplace else. Would it? I thought the only problem was the hint bits being set behind th

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Tom Lane
I wrote: > Aidan Van Dyk writes: >> Getting back to the checksum debate (and this seems like a >> semi-version of the checksum debate), now that we have forks, could we >> easily add block checksumming to a fork? > More generally, this re-opens the question of whether data in secondary > forks is

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Tom Lane
Gurjeet Singh writes: > On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane wrote: >> Um ... and exactly how does that differ from the existing behavior? > Right now a zero-filled page is considered valid, and is treated as a new page; > PageHeaderIsValid()->/* Check all-zeroes case */, and PageIsNew(). This
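The all-zeroes case mentioned above amounts to scanning the page buffer for any non-zero byte. A minimal standalone sketch follows; it is not the PostgreSQL source, and the function name and the 8192-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 8192 /* assumed page size for this sketch */

/* Returns true when every byte of the page buffer is zero. */
bool sketch_page_is_all_zeros(const unsigned char *page, size_t len)
{
    assert(page != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```

A check of this shape cannot tell a page that was intentionally left as zeros apart from one that was zeroed by a fault below the database, which is the ambiguity the thread is about.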

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Tom Lane
Aidan Van Dyk writes: > Getting back to the checksum debate (and this seems like a > semi-version of the checksum debate), now that we have forks, could we > easily add block checksumming to a fork? It would mean writing to 2 > files but that shouldn't be a problem, because until the checkpoint i

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-08 Thread Aidan Van Dyk
On Sun, Nov 7, 2010 at 1:04 AM, Greg Stark wrote: > It does seem like this is kind of part and parcel of adding checksums > to blocks. It's arguably kind of silly to add checksums to blocks but > have a commonly produced bit pattern in corruption cases go > undetected. Getting back to the checksu

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-06 Thread Greg Stark
On Sun, Nov 7, 2010 at 4:23 AM, Gurjeet Singh wrote: > I understand that it is a pretty low-level change, but IMHO the change is > minimal and is being applied in well understood places. All the assumptions > listed have been effective for quite a while, and I don't see these > assumptions being a

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-06 Thread Gurjeet Singh
On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane wrote: > Gurjeet Singh writes: > > .) The basic idea is to have a magic number in every PageHeader before it is > > written to disk, and check for this magic number when performing page validity > > checks. > > Um ... and exactly how does that diff

Re: [HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-06 Thread Tom Lane
Gurjeet Singh writes: > .) The basic idea is to have a magic number in every PageHeader before it is > written to disk, and check for this magic number when performing page validity > checks. Um ... and exactly how does that differ from the existing behavior? > .) To avoid adding a new field t
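The magic-number idea quoted above can be sketched roughly as below. The header struct, the magic value, and all `sketch_` names are hypothetical and do not reflect the proposal's actual layout, whose details are cut off in this listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  8192    /* assumed page size */
#define SKETCH_PAGE_MAGIC 0x5A50u /* arbitrary non-zero value, invented here */

/* Hypothetical header at the start of each page. */
typedef struct SketchPageHeader
{
    uint16_t magic; /* must equal SKETCH_PAGE_MAGIC on any written page */
    uint16_t flags; /* placeholder for other header contents */
} SketchPageHeader;

/* Stamp the magic number just before the page is written out. */
void sketch_stamp_magic(unsigned char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof hdr);
    hdr.magic = SKETCH_PAGE_MAGIC;
    memcpy(page, &hdr, sizeof hdr);
}

/* Validity check on read: an all-zero page fails because magic is 0. */
bool sketch_magic_is_valid(const unsigned char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof hdr);
    return hdr.magic == SKETCH_PAGE_MAGIC;
}
```

With a stamp like this, a page that reads back as all zeros can no longer pass as a page the database itself wrote, because every written page carries a non-zero marker.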

[HACKERS] Protecting against unexpected zero-pages: proposal

2010-11-06 Thread Gurjeet Singh
A customer of ours is quite bothered about finding zero pages in an index after a system crash. The task now is to improve the diagnosability of such an issue and be able to definitively point to the source of zero pages. The proposed solution below has been vetted in-house at EnterpriseDB and am