On Tue, Nov 9, 2010 at 7:04 PM, Robert Haas wrote:
> On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus wrote:
>> Robert,
>>
>>> Uh, no it doesn't. It only requires you to be more aggressive about
>>> vacuuming the transactions that are in the aborted-XIDs array. It
>>> doesn't affect transaction wrap
On Tue, Nov 9, 2010 at 6:42 PM, Tom Lane wrote:
> Robert Haas writes:
>>
>> 4. There would presumably be some finite limit on the size of the
>> shared memory structure for aborted transactions. I don't think
>> there'd be any reason to make it particularly small, but if you sat
>> there and ab
On Wed, Nov 10, 2010 at 1:15 AM, Tom Lane wrote:
> Once you know that there is, or isn't,
> a filesystem-level error involved, what are you going to do next?
> You're going to go try to debug the component you know is at fault,
> that's what. And that problem is still AI-complete.
>
>
If we know
On Tue, Nov 9, 2010 at 5:45 PM, Josh Berkus wrote:
> Robert,
>
>> Uh, no it doesn't. It only requires you to be more aggressive about
>> vacuuming the transactions that are in the aborted-XIDs array. It
>> doesn't affect transaction wraparound vacuuming at all, either
>> positively or negatively
Robert Haas writes:
>
> 4. There would presumably be some finite limit on the size of the
> shared memory structure for aborted transactions. I don't think
> there'd be any reason to make it particularly small, but if you sat
> there and aborted transactions at top speed you might eventually run
Josh Berkus writes:
>> Though incidentally all of the other items you mentioned are generic
>> problems caused by MVCC, not hint bits.
> Yes, but the hint bits prevent us from implementing workarounds.
If we got rid of hint bits, we'd need workarounds for the ensuing
massive performance los
Robert,
> Uh, no it doesn't. It only requires you to be more aggressive about
> vacuuming the transactions that are in the aborted-XIDs array. It
> doesn't affect transaction wraparound vacuuming at all, either
> positively or negatively. You still have to freeze xmins before they
> flip from b
On Tue, Nov 9, 2010 at 3:05 PM, Greg Stark wrote:
> On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus wrote:
>> Well, most of the other MVCC-in-table DBMSes simply don't deal with
>> large, on-disk databases. In fact, I can't think of one which does,
>> currently; while MVCC has been popular for the N
On Tue, Nov 9, 2010 at 5:15 PM, Kevin Grittner wrote:
> Josh Berkus wrote:
>
>> 6. This would require us to be more aggressive about VACUUMing
>> old-cold relations/pages, e.g. VACUUM FREEZE. This would make
>> one of our worst issues for data warehousing even worse.
>
> I continue to feel tha
On Tue, Nov 9, 2010 at 5:03 PM, Josh Berkus wrote:
> On 11/9/10 1:50 PM, Robert Haas wrote:
>> 5. It would be pretty much impossible to run with autovacuum turned
>> off, and in fact you would likely need to make it a good deal more
>> aggressive in the specific case of aborted transactions, to mi
Josh Berkus wrote:
> 6. This would require us to be more aggressive about VACUUMing
> old-cold relations/pages, e.g. VACUUM FREEZE. This would make
> one of our worst issues for data warehousing even worse.
I continue to feel that it is insane that when a table is populated
within the same
On 11/9/10 1:50 PM, Robert Haas wrote:
> 5. It would be pretty much impossible to run with autovacuum turned
> off, and in fact you would likely need to make it a good deal more
> aggressive in the specific case of aborted transactions, to mitigate
> problems #1, #3, and #4.
6. This would require
On Tue, Nov 9, 2010 at 2:05 PM, Robert Haas wrote:
> On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote:
>> On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote:
>>> So, for getting checksums, we have to offer up a few things:
>>> 1) zero-copy writes, we need to buffer the write to get a consisten
> Though incidentally all of the other items you mentioned are generic
> problems caused by MVCC, not hint bits.
Yes, but the hint bits prevent us from implementing workarounds.
--
-- Josh Berkus
PostgreSQL Experts Inc
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote:
> Then we might have to get rid of hint bits. But they're hint bits for
> a metadata file that already exists, creating another metadata file
> doesn't solve anything.
Is there any way to instrument the writes of dirty buffers from the
share memo
On Tue, Nov 9, 2010 at 8:12 PM, Josh Berkus wrote:
>> The whole point of the hint bits is that they're in the same place as the data.
>
> Yes, but the hint bits are currently causing us trouble on several
> features or potential features:
Then we might have to get rid of hint bits. But they're hint
> The whole point of the hint bits is that they're in the same place as the data.
Yes, but the hint bits are currently causing us trouble on several
features or potential features:
* page-level CRC checks
* eliminating vacuum freeze for cold data
* index-only access
* replication
* this patch
* etc
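A minimal sketch of why hint bits and page-level CRC checks interact badly, assuming a toy 8 kB page, a CRC-32 from Python's zlib, and a made-up flag position (none of this is PostgreSQL's actual page layout or checksum): flipping one hint bit on a page whose data is otherwise unchanged still changes the checksum, so the page would need to be rewritten consistently with a new CRC.

```python
import zlib

PAGE_SIZE = 8192   # assumed page size for the sketch
HINT_BIT = 0x01    # hypothetical flag bit, not a real PostgreSQL constant
FLAGS_OFFSET = 99  # hypothetical position of a tuple's flag byte


def page_crc(page: bytes) -> int:
    """Return a CRC-32 over the whole page image."""
    return zlib.crc32(page) & 0xFFFFFFFF


# A toy page holding one "tuple" right after its flag byte.
page = bytearray(PAGE_SIZE)
page[100:105] = b"tuple"

crc_before = page_crc(bytes(page))

# Setting a hint bit changes no user data, only one flag bit.
page[FLAGS_OFFSET] |= HINT_BIT
crc_after = page_crc(bytes(page))

hint_changed_crc = crc_before != crc_after
```

Because CRC-32 detects every single-bit change, `hint_changed_crc` is true here even though the stored tuple bytes are identical.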
On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus wrote:
> Well, most of the other MVCC-in-table DBMSes simply don't deal with
> large, on-disk databases. In fact, I can't think of one which does,
> currently; while MVCC has been popular for the New Databases, they're
> all focused on "in-memory" datab
> PostgreSQL
> isn't the only database product that uses MVCC - not by a long shot -
> and the problem of detecting whether an XID is visible to the current
> snapshot can't be ours alone. So what do other people do about this?
> They either don't cache the information about whether the XID is
>
Excerpts from Robert Haas's message of mar nov 09 16:05:57 -0300 2010:
> And it still allows silent data corruption, because bogusly clearing a
> hint bit is, at the moment, harmless, but bogusly setting one is not.
> I really have to wonder how other products handle this. PostgreSQL
> isn't the
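The asymmetry described above can be sketched as follows, assuming a dictionary as a stand-in for the commit log and a hypothetical `hint_committed` flag on each tuple (illustrative names only, not PostgreSQL structures). A cleared hint merely forces a lookup; a wrongly set hint makes an aborted transaction's tuple look committed.

```python
COMMITTED = "committed"
ABORTED = "aborted"

# Hypothetical stand-in for the commit log: transaction id -> final status.
commit_log = {100: COMMITTED, 101: ABORTED}


def xmin_committed(tup: dict, log: dict) -> bool:
    """Trust the hint if it is set; otherwise fall back to the commit log."""
    if tup["hint_committed"]:
        return True
    return log[tup["xmin"]] == COMMITTED


# Hint bogusly cleared on a committed tuple: slower path, same answer.
cleared = {"xmin": 100, "hint_committed": False}
cleared_result = xmin_committed(cleared, commit_log)

# Hint bogusly set on an aborted tuple: the lookup is skipped, wrong answer.
bogus = {"xmin": 101, "hint_committed": True}
bogus_result = xmin_committed(bogus, commit_log)
```

In this sketch `cleared_result` agrees with the commit log, while `bogus_result` reports the aborted transaction's tuple as committed.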
On Tue, Nov 09, 2010 at 02:05:57PM -0500, Robert Haas wrote:
> On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote:
> > On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote:
> >> So, for getting checksums, we have to offer up a few things:
> >> 1) zero-copy writes, we need to buffer the write to get
On Tue, Nov 9, 2010 at 12:31 PM, Greg Stark wrote:
> On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote:
>> So, for getting checksums, we have to offer up a few things:
>> 1) zero-copy writes, we need to buffer the write to get a consistent
>> checksum (or lock the buffer tight)
>> 2) saving hin
On Tue, Nov 9, 2010 at 5:06 PM, Aidan Van Dyk wrote:
> So, for getting checksums, we have to offer up a few things:
> 1) zero-copy writes, we need to buffer the write to get a consistent
> checksum (or lock the buffer tight)
> 2) saving hint-bits on an otherwise unchanged page. We either need to
Gurjeet Singh writes:
> On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane wrote:
>> IMO there are a lot of methods that can separate filesystem misfeasance
>> from Postgres errors, probably with greater reliability than this hack.
> Doing this postmortem on a regular deployment and fixing the problem wo
On Tue, Nov 9, 2010 at 11:26 AM, Jim Nasby wrote:
>> Huh, this implies that if we did go through all the work of
>> segregating the hint bits and could arrange that they all appear on
>> the same 512-byte sector and if we buffered them so that we were
>> writing the same bits we checksummed then
On Tue, Nov 9, 2010 at 4:26 PM, Jim Nasby wrote:
>> On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote:
>>> Oh, I'm mistaken. The problem was that buffering the writes was
>>> insufficient to deal with torn pages. Even if you buffer the writes, if
>>> the machine crashes while only having written ha
On Tue, Nov 9, 2010 at 12:32 AM, Tom Lane wrote:
> There are also crosschecks that you can apply: if it's a heap page, are
> there any index pages with pointers to it? If it's an index page, are
> there downlink or sibling links to it from elsewhere in the index?
> A page that Postgres left as z
On Nov 9, 2010, at 9:27 AM, Greg Stark wrote:
> On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote:
>> Oh, I'm mistaken. The problem was that buffering the writes was
>> insufficient to deal with torn pages. Even if you buffer the writes, if
>> the machine crashes while only having written half the b
On Tue, Nov 9, 2010 at 3:25 PM, Greg Stark wrote:
> Oh, I'm mistaken. The problem was that buffering the writes was
> insufficient to deal with torn pages. Even if you buffer the writes, if
> the machine crashes while only having written half the buffer out then
> the checksum won't match. If the o
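The torn-page case can be sketched like this, assuming an 8 kB page written as 512-byte sectors and a CRC-32 from Python's zlib standing in for whatever checksum would actually be used (all sizes and names here are assumptions for illustration). The checksum is computed over a stable, buffered copy of the new page, yet a crash halfway through the write leaves a mix of new and old sectors that no longer matches it.

```python
import zlib

PAGE_SIZE = 8192    # assumed page size
SECTOR_SIZE = 512   # assumed unit the disk writes atomically


def crc(data: bytes) -> int:
    """Return a CRC-32 over the given bytes."""
    return zlib.crc32(data) & 0xFFFFFFFF


old_page = bytes([0xAA]) * PAGE_SIZE   # image already on disk
new_page = bytes([0xBB]) * PAGE_SIZE   # buffered copy being written out
new_crc = crc(new_page)                # checksum of the consistent copy

# Simulated crash: only the first half of the sectors reached the disk.
sectors_written = (PAGE_SIZE // SECTOR_SIZE) // 2
cut = sectors_written * SECTOR_SIZE
torn_page = new_page[:cut] + old_page[cut:]

torn_detected = crc(torn_page) != new_crc
```

Buffering guarantees that `new_crc` describes `new_page`; it says nothing about `torn_page`, which is what would be read back after the crash.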
On Tue, Nov 9, 2010 at 2:28 PM, Aidan Van Dyk wrote:
> On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark wrote:
>
>> But buffering the page only means you've got some consistent view of
>> the page. It doesn't mean the checksum will actually match the data in
>> the page that gets written out. So when y
On Tue, Nov 9, 2010 at 8:45 AM, Greg Stark wrote:
> But buffering the page only means you've got some consistent view of
> the page. It doesn't mean the checksum will actually match the data in
> the page that gets written out. So when you read it back in the
> checksum may be invalid.
I was ass
On Mon, Nov 8, 2010 at 5:59 PM, Aidan Van Dyk wrote:
> The problem that putting checksums in a different place solves is the
> page layout (binary upgrade) problem. You're still going to need to
> "buffer" the page as you calculate the checksum and write it out.
> Buffering that page is absolutel
On Mon, Nov 8, 2010 at 12:53 PM, Greg Stark wrote:
> On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane wrote:
>> So maybe Aidan's got a good idea here. It would sure be a lot easier
>> to shoehorn checksum checking in as an optional feature if the checksums
>> were kept someplace else.
>
> Would it? I th
On Mon, Nov 8, 2010 at 5:00 PM, Tom Lane wrote:
> So maybe Aidan's got a good idea here. It would sure be a lot easier
> to shoehorn checksum checking in as an optional feature if the checksums
> were kept someplace else.
Would it? I thought the only problem was the hint bits being set
behind th
I wrote:
> Aidan Van Dyk writes:
>> Getting back to the checksum debate (and this seems like a
>> semi-version of the checksum debate), now that we have forks, could we
>> easily add block checksumming to a fork?
> More generally, this re-opens the question of whether data in secondary
> forks is
Gurjeet Singh writes:
> On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane wrote:
>> Um ... and exactly how does that differ from the existing behavior?
> Right now a zero-filled page is considered valid, and is treated as a new page;
> PageHeaderIsValid()->/* Check all-zeroes case */, and PageIsNew(). This
Aidan Van Dyk writes:
> Getting back to the checksum debate (and this seems like a
> semi-version of the checksum debate), now that we have forks, could we
> easily add block checksumming to a fork? It would mean writing to 2
> files but that shouldn't be a problem, because until the checkpoint i
On Sun, Nov 7, 2010 at 1:04 AM, Greg Stark wrote:
> It does seem like this is kind of part and parcel of adding checksums
> to blocks. It's arguably kind of silly to add checksums to blocks but
> have a commonly produced bit pattern in corruption cases go
> undetected.
Getting back to the checksu
On Sun, Nov 7, 2010 at 4:23 AM, Gurjeet Singh wrote:
> I understand that it is a pretty low-level change, but IMHO the change is
> minimal and is being applied in well understood places. All the assumptions
> listed have been effective for quite a while, and I don't see these
> assumptions being a
On Sat, Nov 6, 2010 at 11:48 PM, Tom Lane wrote:
> Gurjeet Singh writes:
> > .) The basic idea is to have a magic number in every PageHeader before it is
> > written to disk, and check for this magic number when performing page
> > validity checks.
>
> Um ... and exactly how does that diff
Gurjeet Singh writes:
> .) The basic idea is to have a magic number in every PageHeader before it is
> written to disk, and check for this magic number when performing page
> validity checks.
Um ... and exactly how does that differ from the existing behavior?
> .) To avoid adding a new field t
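A rough sketch of the magic-number idea quoted above, assuming a toy page with a 2-byte magic value at offset 0 (the constant, the offset, and the function names are hypothetical and do not describe the real PageHeader or the proposed patch). A page stamped before being written can later be told apart from an all-zero page found on disk.

```python
import struct

PAGE_SIZE = 8192      # assumed page size
PAGE_MAGIC = 0x5047   # hypothetical magic value
MAGIC_OFFSET = 0      # hypothetical location inside the page header


def stamp_magic(page: bytearray) -> None:
    """Write the magic number into the page just before it goes to disk."""
    struct.pack_into("<H", page, MAGIC_OFFSET, PAGE_MAGIC)


def classify(page: bytes) -> str:
    """Classify a page read back from disk."""
    if not any(page):
        return "all-zero"   # never stamped, or zeroed after being written
    (magic,) = struct.unpack_from("<H", page, MAGIC_OFFSET)
    return "stamped" if magic == PAGE_MAGIC else "unrecognized"


fresh = bytearray(PAGE_SIZE)          # a new, empty page in memory
stamp_magic(fresh)
written_kind = classify(bytes(fresh))

zeroed_kind = classify(bytes(PAGE_SIZE))  # what a zero page looks like on disk
```

Under these assumptions even an otherwise empty page carries the magic once written, so an all-zero page on disk did not come from this write path unchanged.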
A customer of ours is quite bothered about finding zero pages in an index
after a system crash. The task now is to improve the diagnosability of such
an issue and be able to definitively point to the source of zero pages.
The proposed solution below has been vetted in-house at EnterpriseDB and am