Re: [HACKERS] Page Checksums

2012-01-24 Thread Jim Nasby
On Jan 24, 2012, at 9:15 AM, Simon Riggs wrote: > On Tue, Jan 24, 2012 at 2:49 PM, Robert Treat wrote: >>> And yes, I would for sure turn such functionality on if it were present. >>> >> >> That's nice to say, but most people aren't willing to take a 50% >> performance hit. Not saying what we en

Re: [HACKERS] Page Checksums

2012-01-24 Thread Simon Riggs
On Tue, Jan 24, 2012 at 2:49 PM, Robert Treat wrote: >> And yes, I would for sure turn such functionality on if it were present. >> > > That's nice to say, but most people aren't willing to take a 50% > performance hit. Not saying what we end up with will be that bad, but > I've seen people get up

Re: [HACKERS] Page Checksums

2012-01-24 Thread Robert Treat
On Tue, Jan 24, 2012 at 3:02 AM,   wrote: >> * Robert Treat: >> >>> Would it be unfair to assert that people who want checksums but aren't >>> willing to pay the cost of running a filesystem that provides >>> checksums aren't going to be willing to make the cost/benefit trade >>> off that will be a

Re: [HACKERS] Page Checksums

2012-01-24 Thread Florian Weimer
> I would chip in and say that I would prefer sticking to well-known proved > filesystems like xfs/ext4 and let the application do the checksumming. Yes, that's a different way of putting my concern. If you want a proven file system with checksumming (and an fsck), options are really quite limite

Re: [HACKERS] Page Checksums

2012-01-24 Thread jesper
> * Robert Treat: > >> Would it be unfair to assert that people who want checksums but aren't >> willing to pay the cost of running a filesystem that provides >> checksums aren't going to be willing to make the cost/benefit trade >> off that will be asked for? Yes, it is unfair of course, but it's

Re: [HACKERS] Page Checksums

2012-01-23 Thread Florian Weimer
* Robert Treat: > Would it be unfair to assert that people who want checksums but aren't > willing to pay the cost of running a filesystem that provides > checksums aren't going to be willing to make the cost/benefit trade > off that will be asked for? Yes, it is unfair of course, but it's > inter

Re: [HACKERS] Page Checksums

2012-01-23 Thread Robert Treat
On Sat, Jan 21, 2012 at 6:12 PM, Jim Nasby wrote: > On Jan 10, 2012, at 3:07 AM, Simon Riggs wrote: >> I think we could add an option to check the checksum immediately after >> we pin a block for the first time but it would be very expensive and >> sounds like we're re-inventing hardware or OS fea

Re: [HACKERS] Page Checksums

2012-01-22 Thread Jim Nasby
On Jan 10, 2012, at 3:07 AM, Simon Riggs wrote: > I think we could add an option to check the checksum immediately after > we pin a block for the first time but it would be very expensive and > sounds like we're re-inventing hardware or OS features again. Work on > 50% performance drain, as an esti

Re: [HACKERS] Page Checksums

2012-01-10 Thread Benedikt Grundmann
On 10/01/12 09:07, Simon Riggs wrote: > > You can repeat that argument ad infinitum. Even if the CRC covers all the > > pages in the OS buffer cache, it still doesn't cover the pages in the > > shared_buffers, CPU caches, in-transit from one memory bank to another etc. > > You have to draw the line

Re: [HACKERS] Page Checksums

2012-01-10 Thread Simon Riggs
On Tue, Jan 10, 2012 at 8:04 AM, Heikki Linnakangas wrote: > On 10.01.2012 02:12, Jim Nasby wrote: >> >> Filesystem CRCs very likely will not happen to data that's in the cache. >> For some users, that's a huge amount of data to leave un-protected. > > > You can repeat that argument ad infinitum.

Re: [HACKERS] Page Checksums

2012-01-10 Thread Heikki Linnakangas
On 10.01.2012 02:12, Jim Nasby wrote: > Filesystem CRCs very likely will not happen to data that's in the cache. > For some users, that's a huge amount of data to leave un-protected. You can repeat that argument ad infinitum. Even if the CRC covers all the pages in the OS buffer cache, it still d

Re: [HACKERS] Page Checksums

2012-01-09 Thread Jim Nasby
On Jan 8, 2012, at 5:25 PM, Simon Riggs wrote: > On Mon, Dec 19, 2011 at 8:18 PM, Heikki Linnakangas > wrote: > >> Double-writes would be a useful option also to reduce the size of WAL that >> needs to be shipped in replication. >> >> Or you could just use a filesystem that does CRCs... > > Dou

Re: [HACKERS] Page Checksums

2012-01-08 Thread Simon Riggs
On Mon, Dec 19, 2011 at 8:18 PM, Heikki Linnakangas wrote: > Double-writes would be a useful option also to reduce the size of WAL that > needs to be shipped in replication. > > Or you could just use a filesystem that does CRCs... Double writes would reduce the size of WAL and we discussed many

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Kevin Grittner
Benedikt Grundmann wrote: > For what it's worth here are the numbers on one of our biggest > databases (same system as I posted about separately wrt > seq_scan_cost vs random_page_cost). That would be an 88.4% hit rate on the summarized data. -Kevin

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Benedikt Grundmann
For what it's worth here are the numbers on one of our biggest databases (same system as I posted about separately wrt seq_scan_cost vs random_page_cost). 0053 1001 00BA 1009 0055 1001 00B9 1020 0054 983 00BB 1010 0056 1001 00BC 1019 0069 0 00BD 1009 006A 224 00BE 1018 006B 1009 00BF 1008 006C 1008

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 6:15 AM, Florian Pflug wrote: > On 64-bit machines at least, we could simply mmap() the stable parts of the > CLOG into the backend address space, and access it without any locking at all. True. I think this could be done, but it would take some fairly careful thought and

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 5:15 AM, Florian Pflug wrote: > On Jan4, 2012, at 21:27 , Robert Haas wrote: >> I think the first thing we need to look at is increasing the number of >> CLOG buffers. > > What became of the idea to treat the stable (i.e. earlier than the oldest > active xid) and the unstabl

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Florian Pflug
On Jan4, 2012, at 21:27 , Robert Haas wrote: > I think the first thing we need to look at is increasing the number of > CLOG buffers. What became of the idea to treat the stable (i.e. earlier than the oldest active xid) and the unstable (i.e. the rest) parts of the CLOG differently. On 64-bit mac

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:02 PM, Kevin Grittner wrote: > Robert Haas wrote: > >> 2. The CLOG code isn't designed to manage a large number of >> buffers, so adding more might cause a performance regression on >> small systems. >> >> On Nate Boley's 32-core system, running pgbench at scale factor >>

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Jan 4, 2012, at 2:02 PM, Kevin Grittner wrote: > Jim Nasby wrote: >> Here's output from our largest OLTP system... not sure exactly how >> to interpret it, so I'm just providing the raw data. This spans >> almost exactly 1 month. > > Those numbers wind up meaning that 18% of the 256-byte blocks

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Robert Haas wrote: > 2. The CLOG code isn't designed to manage a large number of > buffers, so adding more might cause a performance regression on > small systems. > > On Nate Boley's 32-core system, running pgbench at scale factor > 100, the optimal number of buffers seems to be around 32. I'

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:02 PM, Kevin Grittner wrote: > Jim Nasby wrote: >> Here's output from our largest OLTP system... not sure exactly how >> to interpret it, so I'm just providing the raw data. This spans >> almost exactly 1 month. > > Those numbers wind up meaning that 18% of the 256-byte bl

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Jim Nasby wrote: > Here's output from our largest OLTP system... not sure exactly how > to interpret it, so I'm just providing the raw data. This spans > almost exactly 1 month. Those numbers wind up meaning that 18% of the 256-byte blocks (1024 transactions each) were all commits. Yikes. Tha
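For readers following the arithmetic: the "256-byte blocks (1024 transactions each)" figure works out if CLOG keeps two status bits per transaction, so four transactions fit in a byte. Below is a minimal sketch of how one might count all-commit blocks in raw CLOG bytes. It is not the Ruby script mentioned elsewhere in the thread. The 2-bit encoding with 0b01 meaning "committed" is an assumption stated here, and `all_commit_fraction` is a hypothetical helper name.

```python
# Sketch only. Assumptions (not taken from the thread): each transaction
# has a 2-bit status in CLOG and the bit pattern 0b01 means "committed".
BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // BITS_PER_XACT               # 4 transactions per byte
BLOCK_BYTES = 256                                 # block size used in the thread
XACTS_PER_BLOCK = BLOCK_BYTES * XACTS_PER_BYTE    # 1024 transactions per block
ALL_COMMITTED_BYTE = 0b01010101                   # four committed transactions


def all_commit_fraction(clog: bytes) -> float:
    """Return the fraction of whole 256-byte blocks that contain only commits."""
    usable = len(clog) - len(clog) % BLOCK_BYTES  # ignore a trailing partial block
    if usable == 0:
        return 0.0
    full = bytes([ALL_COMMITTED_BYTE]) * BLOCK_BYTES
    blocks = [clog[i:i + BLOCK_BYTES] for i in range(0, usable, BLOCK_BYTES)]
    hits = sum(1 for block in blocks if block == full)
    return hits / len(blocks)
```

Feeding it the contents of a CLOG segment file would give a number comparable to the 18% quoted above, provided the encoding assumption holds for the server version in question.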

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Dec 23, 2011, at 2:23 PM, Kevin Grittner wrote: > Jeff Janes wrote: > >> Could we get some major OLTP users to post their CLOG for >> analysis? I wouldn't think there would be much >> security/proprietary issues with CLOG data. > > FWIW, I got the raw numbers to do my quick check using this

Re: [HACKERS] Page Checksums

2012-01-03 Thread Jim Nasby
On Dec 28, 2011, at 3:31 AM, Simon Riggs wrote: > On Wed, Dec 28, 2011 at 9:00 AM, Robert Haas wrote: > >> What I'm not too clear >> about is whether a 16-bit checksum meets the needs of people who want >> checksums. > > We need this now, hence the gymnastics to get it into this release. > > 1

Re: [HACKERS] Page Checksums

2011-12-28 Thread Heikki Linnakangas
On 28.12.2011 11:00, Robert Haas wrote: > Admittedly, most of the fat is probably in the tuple header rather than the page header, but at any rate I don't consider burning up 1% of our available storage space to be a negligible overhead. 8 / 8192 = 0.1%. -- Heikki Linnakangas EnterpriseDB
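The overhead figures traded in this thread (roughly 0.1% for 8 bytes, just under 0.05% for 4 bytes) are easy to check. A small sketch, assuming the default 8192-byte page; `header_overhead_percent` is just an illustrative helper name:

```python
PAGE_SIZE = 8192  # bytes; the default PostgreSQL block size assumed in the thread


def header_overhead_percent(extra_bytes: int, page_size: int = PAGE_SIZE) -> float:
    """Share of a page consumed by extra_bytes of additional header, in percent."""
    return 100.0 * extra_bytes / page_size


for extra in (4, 8):
    print(f"{extra} bytes per {PAGE_SIZE}-byte page: "
          f"{header_overhead_percent(extra):.3f}%")
# prints:
# 4 bytes per 8192-byte page: 0.049%
# 8 bytes per 8192-byte page: 0.098%
```

So the space cost is about a tenth of the 1% mentioned in the quoted text, whichever of the two header sizes is meant.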

Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Merlin Moncure
On Wed, Dec 28, 2011 at 8:45 AM, Greg Stark wrote: > On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure wrote: >>  I bet if you kept a judicious number of >> clog pages in each local process with some smart invalidation you >> could cover enough cases that scribbling the bits down would become >> u

Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Greg Stark
On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure wrote: > I bet if you kept a judicious number of > clog pages in each local process with some smart invalidation you > could cover enough cases that scribbling the bits down would become > unnecessary. I don't understand how any cache can complete

Re: [HACKERS] Page Checksums

2011-12-28 Thread Simon Riggs
On Wed, Dec 28, 2011 at 9:00 AM, Robert Haas wrote: > What I'm not too clear > about is whether a 16-bit checksum meets the needs of people who want > checksums. We need this now, hence the gymnastics to get it into this release. 16-bits of checksum is way better than zero bits of checksum, pro

Re: [HACKERS] Page Checksums

2011-12-28 Thread Robert Haas
On Tue, Dec 27, 2011 at 1:39 PM, Jeff Davis wrote: > On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote: >> I >> think it would be regrettable if everyone had to give up 4 bytes per >> page because some people want checksums. > > I can understand that some people might not want the CPU expense o

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Tue, 2011-12-27 at 16:43 -0600, Merlin Moncure wrote: > On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis wrote: > > 3. Attack hint bits problem. > > A large number of problems would go away if the current hint bit > system could be replaced with something that did not require writing > to the tuple

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis wrote: > 3. Attack hint bits problem. A large number of problems would go away if the current hint bit system could be replaced with something that did not require writing to the tuple itself. FWIW, moving the bits around seems like a non-starter -- yo

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Thu, 2011-12-22 at 03:50 -0600, Kevin Grittner wrote: > Now, on to the separate-but-related topic of double-write. That > absolutely requires some form of checksum or CRC to detect torn > pages, in order for the technique to work at all. Adding a CRC > without double-write would work fine if y
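To make the quoted point concrete (double-write only helps if a torn page can be recognised), here is a rough sketch of the idea. It is not the proposed patch and not PostgreSQL's page layout: the 4-byte CRC-32 stored at the start of the page, and the names `seal`, `is_intact` and `recover`, are assumptions made purely for illustration.

```python
import struct
import zlib

PAGE_SIZE = 8192
CRC_BYTES = 4  # hypothetical: CRC-32 kept in the first 4 bytes of the page


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 to form a full page image."""
    if len(payload) != PAGE_SIZE - CRC_BYTES:
        raise ValueError("payload must fill the page exactly")
    return struct.pack("<I", zlib.crc32(payload)) + payload


def is_intact(page: bytes) -> bool:
    """True if the stored CRC matches the rest of the page."""
    (stored,) = struct.unpack_from("<I", page)
    return stored == zlib.crc32(page[CRC_BYTES:])


def recover(data_page: bytes, double_write_copy: bytes) -> bytes:
    """Prefer the data file; fall back to the double-write copy if it is torn."""
    if is_intact(data_page):
        return data_page
    if is_intact(double_write_copy):
        return double_write_copy
    raise ValueError("both copies fail their checksum")


# Simulate a crash halfway through overwriting an old page with a new one.
old_page = seal(b"A" * (PAGE_SIZE - CRC_BYTES))
new_page = seal(b"B" * (PAGE_SIZE - CRC_BYTES))
torn_page = new_page[:4096] + old_page[4096:]

assert not is_intact(torn_page)                  # the tear is detected
assert recover(torn_page, new_page) == new_page  # the double-write copy wins
```

Without the checksum there would be no way to tell `torn_page` from a valid page, which is why the two features keep getting discussed together.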

Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Sun, 2011-12-25 at 22:18 +, Greg Stark wrote: > 2) The i/o system was in the process of writing out blocks and the > system lost power or crashed as they were being written out. In this > case there will probably only be 0 or 1 torn pages -- perhaps as many > as the scsi queue depth if there

Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 22:18 +0200, Heikki Linnakangas wrote: > Or you could just use a filesystem that does CRCs... That just moves the problem. Correct me if I'm wrong, but I don't think there's anything special that the filesystem can do that we can't. The filesystems that support CRCs are more

Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 01:55 +, Greg Stark wrote: > On Sun, Dec 18, 2011 at 7:51 PM, Jesper Krogh wrote: > > I don't know if it would be seen as a "half baked feature".. or similar, > > and I don't know if the hint bit problem is solvable at all, but I could > > easily imagine checksumming just "

Re: [HACKERS] Page Checksums

2011-12-27 Thread Jeff Davis
On Mon, 2011-12-19 at 07:50 -0500, Robert Haas wrote: > I > think it would be regrettable if everyone had to give up 4 bytes per > page because some people want checksums. I can understand that some people might not want the CPU expense of calculating CRCs; or the upgrade expense to convert to new

Re: [HACKERS] Page Checksums

2011-12-25 Thread Greg Stark
On Mon, Dec 19, 2011 at 7:16 PM, Kevin Grittner wrote: > It seems to me that on a typical production system you would > probably have zero or one such page per OS crash Incidentally I don't think this is right. There are really two kinds of torn pages: 1) The kernel vm has many dirty 4k pages an

Re: [HACKERS] Page Checksums + Double Writes

2011-12-24 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:58 PM, Simon Riggs wrote: > On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner > wrote: > >> Simon, does it sound like I understand your proposal? > > Yes, thanks for restating. I've implemented that proposal, posting patch on a separate thread. --  Simon Riggs  

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Tom Lane
Jeff Janes writes: > I had a perhaps crazier idea. Aren't CLOG pages older than global xmin > effectively read only? Could backends that need these bypass locking > and shared memory altogether? Hmm ... once they've been written out from the SLRU arena, yes. In fact you don't need to go back as

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
Jeff Janes wrote: > Could we get some major OLTP users to post their CLOG for > analysis? I wouldn't think there would be much > security/proprietary issues with CLOG data. FWIW, I got the raw numbers to do my quick check using this Ruby script (put together for me by Peter Brant). If it is o

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
Tom Lane wrote: > Robert Haas writes: >> An obvious problem is that, if the abort rate is significantly >> different from zero, and especially if the aborts are randomly >> mixed in with commits rather than clustered together in small >> portions of the XID space, the CLOG rollup data would becom

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Jeff Janes
On 12/23/11, Robert Haas wrote: > On Fri, Dec 23, 2011 at 11:14 AM, Kevin Grittner > wrote: >> Thoughts? > > Those are good thoughts. > > Here's another random idea, which might be completely nuts. Maybe we > could consider some kind of summarization of CLOG data, based on the > idea that most t

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Robert Haas
On Fri, Dec 23, 2011 at 12:42 PM, Tom Lane wrote: > Robert Haas writes: >> An obvious problem is that, if the abort rate is significantly >> different from zero, and especially if the aborts are randomly mixed >> in with commits rather than clustered together in small portions of >> the XID space

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Tom Lane
Robert Haas writes: > An obvious problem is that, if the abort rate is significantly > different from zero, and especially if the aborts are randomly mixed > in with commits rather than clustered together in small portions of > the XID space, the CLOG rollup data would become useless. Yeah, I'm a

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Robert Haas
On Fri, Dec 23, 2011 at 11:14 AM, Kevin Grittner wrote: > Thoughts? Those are good thoughts. Here's another random idea, which might be completely nuts. Maybe we could consider some kind of summarization of CLOG data, based on the idea that most transactions commit. We introduce the idea of a

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
"Kevin Grittner" wrote: >> I would suggest you examine how to have an array of N bgwriters, >> then just slot the code for hinting into the bgwriter. That way a >> bgwriter can set hints, calc CRC and write pages in sequence on a >> particular block. The hinting needs to be synchronised with the

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs wrote: > It could work that way, but I seriously doubt that a technique > only mentioned in dispatches one month before the last CF is > likely to become trustable code within one month. We've been > discussing CRCs for years, so assembling the puzzle seems much > easier, when all th

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner wrote: > Simon, does it sound like I understand your proposal? Yes, thanks for restating. > Now, on to the separate-but-related topic of double-write.  That > absolutely requires some form of checksum or CRC to detect torn > pages, in order for th

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 3:04 PM, Robert Haas wrote: > On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah wrote: >> In the double write implementation, every checkpoint write is double >> written, > > Unless I'm quite thoroughly confused, which is possible, the double > write will need to happen the fir

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah wrote: > In the double write implementation, every checkpoint write is double > written, Unless I'm quite thoroughly confused, which is possible, the double write will need to happen the first time a buffer is written following each checkpoint. Which

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 11:16 AM, Kevin Grittner wrote: > Jignesh Shah wrote: > >> When we use Doublewrite with checksums, we can safely disable >> full_page_write causing a HUGE reduction to the WAL traffic >> without loss of reliability due to a write fault since there are >> two writes always

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Jignesh Shah wrote: > When we use Doublewrite with checksums, we can safely disable > full_page_write causing a HUGE reduction to the WAL traffic > without loss of reliability due to a write fault since there are > two writes always. (Implementation detail discussable). The "always" there sur

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh wrote: > On 2011-12-22 09:42, Florian Weimer wrote: >> >> * David Fetter: >> >>> The issue is that double writes needs a checksum to work by itself, >>> and page checksums more broadly work better when there are double >>> writes, obviating the need to

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs wrote: > So overall, I do now think it's still possible to add an optional > checksum in the 9.2 release and am willing to pursue it unless > there are technical objections. Just to restate Simon's proposal, to make sure I'm understanding it, we would support a new page header forma

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jesper Krogh
On 2011-12-22 09:42, Florian Weimer wrote: * David Fetter: The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on. How desirable is it to disable full_page_writ

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 8:42 AM, Florian Weimer wrote: > * David Fetter: > >> The issue is that double writes needs a checksum to work by itself, >> and page checksums more broadly work better when there are double >> writes, obviating the need to have full_page_writes on. > > How desirable is it

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 7:44 AM, Heikki Linnakangas wrote: > On 22.12.2011 01:43, Tom Lane wrote: >> >> A "utility to bump the page version" is equally a whole lot easier said >> than done, given that the new version has more overhead space and thus >> less payload space than the old.  What does i

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Florian Weimer
* David Fetter: > The issue is that double writes needs a checksum to work by itself, > and page checksums more broadly work better when there are double > writes, obviating the need to have full_page_writes on. How desirable is it to disable full_page_writes? Doesn't it cut down recovery time s

Re: [HACKERS] Page Checksums

2011-12-22 Thread Leonardo Francalanci
Agreed. I do agree with Heikki that it really ought to be the OS problem, but then we thought that about dtrace and we're still waiting for that or similar to be usable on all platforms (+/- 4 years). My point is that it looks like this is going to take 1-2 years in postgresql, so it looks li

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Heikki Linnakangas
On 22.12.2011 01:43, Tom Lane wrote: A "utility to bump the page version" is equally a whole lot easier said than done, given that the new version has more overhead space and thus less payload space than the old. What does it do when the old page is too full to be converted? "Move some data som

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Thu, Dec 22, 2011 at 12:06 AM, Simon Riggs wrote: >> Having two different page formats running around in the system at the >> same time is far from free; in the worst case it means that every single >> piece of code that touches pages has to know about and be prepared to >> cope with both vers

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread David Fetter
On Wed, Dec 21, 2011 at 04:18:33PM -0800, Rob Wultsch wrote: > On Wed, Dec 21, 2011 at 1:59 PM, David Fetter wrote: > > One of the things VMware is working on is double writes, per > > previous discussions of how, for example, InnoDB does things. > > The world is moving to flash, and the lifetime

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Robert Haas
On Wed, Dec 21, 2011 at 7:06 PM, Simon Riggs wrote: > My feeling is it probably depends upon how different the formats are, > so given we are discussing a 4 byte addition to the header, it might > be doable. I agree. When thinking back on Zoltan's patches, it's worth remembering that he had a nu

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Rob Wultsch
On Wed, Dec 21, 2011 at 1:59 PM, David Fetter wrote: > One of the things VMware is working on is double writes, per previous > discussions of how, for example, InnoDB does things. The world is moving to flash, and the lifetime of flash is measured in writes. Potentially doubling the number of writes

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Wed, Dec 21, 2011 at 11:43 PM, Tom Lane wrote: > It seems like you've forgotten all of the previous discussion of how > we'd manage a page format version change. Maybe I've had too much caffeine. It's certainly late here. > Having two different page formats running around in the system at th

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Tom Lane
Simon Riggs writes: > We don't need to use any flag bits at all. We add > PG_PAGE_LAYOUT_VERSION to the control file, so that CRC checking > becomes an initdb option. All new pages can be created with > PG_PAGE_LAYOUT_VERSION from the control file. All existing pages must > be either the layout ve

Re: [HACKERS] Page Checksums

2011-12-21 Thread Simon Riggs
On Wed, Dec 21, 2011 at 7:35 PM, Greg Smith wrote: > And there's even more radical changes in btrfs, since it wasn't starting > with a fairly robust filesystem as a base.  And putting my tin foil hat on, > I don't feel real happy about assuming *the* solution for this issue in > PostgreSQL is the

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Tom Lane
David Fetter writes: > There's a separate issue we'd like to get clear on, which is whether > it would be OK to make a new PG_PAGE_LAYOUT_VERSION. If you're not going to provide pg_upgrade support, I think there is no chance of getting a new page layout accepted. The people who might want CRC su

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Wed, Dec 21, 2011 at 10:19 PM, Kevin Grittner wrote: > Alvaro Herrera wrote: > >> If you get away with a new page format, let's make sure and >> coordinate so that we can add more info into the header.  One >> thing I wanted was to have an ID struct on each file, so that you >> know what DB/re

Re: [HACKERS] Page Checksums

2011-12-21 Thread Martijn van Oosterhout
On Wed, Dec 21, 2011 at 09:32:28AM +0100, Leonardo Francalanci wrote: > I can't help in this discussion, but I have a question: > how different would this feature be from filesystem-level CRC, such > as the one available in ZFS and btrfs? Hmm, filesystems are not magical. If they implement this th

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Kevin Grittner
Alvaro Herrera wrote: > If you get away with a new page format, let's make sure and > coordinate so that we can add more info into the header. One > thing I wanted was to have an ID struct on each file, so that you > know what DB/relation/segment the file corresponds to. So the > first page's

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Alvaro Herrera
Excerpts from David Fetter's message of mié dic 21 18:59:13 -0300 2011: > If not, we'll have to do some extra work on the patch as described > below. Thanks to Kevin Grittner for coming up with this :) > > - Use a header bit to say whether we've got a checksum on the page. > We're using 3/16

[HACKERS] Page Checksums + Double Writes

2011-12-21 Thread David Fetter
Folks, One of the things VMware is working on is double writes, per previous discussions of how, for example, InnoDB does things. I'd initially thought that introducing just one of the features in $Subject at a time would help, but I'm starting to see a mutual dependency. The issue is that doub

Re: [HACKERS] Page Checksums

2011-12-21 Thread Greg Smith
On 12/21/2011 10:49 AM, Stephen Frost wrote: * Leonardo Francalanci (m_li...@yahoo.it) wrote: I think what I meant was: isn't this going to be useless in a couple of years (if, say, btrfs will be available)? Or it actually gives something that FS will never be able to give? Yes, it wi

Re: [HACKERS] Page Checksums

2011-12-21 Thread Tom Lane
Heikki Linnakangas writes: > 4 bytes out of a 8k block is just under 0.05%. I don't think anyone is > going to notice the extra disk space consumed by this. There's all those > other issues like the hint bits that make this a non-starter, but disk > space overhead is not one of them. The bigge

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I think what I meant was: isn't this going to be useless in a couple of years (if, say, btrfs will be available)? Or it actually gives something that FS will never be able to give? Yes, it will help you find/address bugs in the filesystem. These things are not unheard of... It sounds to me li

Re: [HACKERS] Page Checksums

2011-12-21 Thread Stephen Frost
* Leonardo Francalanci (m_li...@yahoo.it) wrote: > >Depends on how much you trust the filesystem. :) > > Ehm I hope that was a joke... It certainly wasn't.. > I think what I meant was: isn't this going to be useless in a couple > of years (if, say, btrfs will be available)? Or it actually gives

Re: [HACKERS] Page Checksums

2011-12-21 Thread Robert Haas
On Tue, Dec 20, 2011 at 12:12 PM, Christopher Browne wrote: > This seems to be a frequent problem with this whole "doing CRCs on pages" > thing. > > It's not evident which problems will be "real" ones. That depends on the implementation. If we have a flaky, broken implementation such as the one

Re: [HACKERS] Page Checksums

2011-12-21 Thread Heikki Linnakangas
On 21.12.2011 17:21, Kevin Grittner wrote: > Also, I'm not sure that our shop would want to dedicate any space per page for this, since we're comparing between databases to ensure that values actually match, row by row, during idle time. 4 bytes out of a 8k block is just under 0.05%. I don't thin

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
On 21/12/2011 16.19, Stephen Frost wrote: * Leonardo Francalanci (m_li...@yahoo.it) wrote: I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs? Depends on how much you trust the filesy

Re: [HACKERS] Page Checksums

2011-12-21 Thread Andres Freund
On Wednesday, December 21, 2011 04:21:53 PM Kevin Grittner wrote: > Greg Smith wrote: > >> Some people think I border on the paranoid on this issue. > > > > Those people are also out to get you, just like the hardware. > > Hah! I *knew* it! > > >> Are you arguing that autovacuum should be dis

Re: [HACKERS] Page Checksums

2011-12-21 Thread Kevin Grittner
Greg Smith wrote: >> Some people think I border on the paranoid on this issue. > > Those people are also out to get you, just like the hardware. Hah! I *knew* it! >> Are you arguing that autovacuum should be disabled after crash >> recovery? I guess if you are arguing that a database VACUU

Re: [HACKERS] Page Checksums

2011-12-21 Thread Stephen Frost
* Leonardo Francalanci (m_li...@yahoo.it) wrote: > I can't help in this discussion, but I have a question: > how different would this feature be from filesystem-level CRC, such > as the one available in ZFS and btrfs? Depends on how much you trust the filesystem. :) Stephen

Re: [HACKERS] Page Checksums

2011-12-21 Thread Leonardo Francalanci
I can't help in this discussion, but I have a question: how different would this feature be from filesystem-level CRC, such as the one available in ZFS and btrfs? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.o

Re: [HACKERS] Page Checksums

2011-12-20 Thread Greg Smith
On 12/19/2011 06:14 PM, Kevin Grittner wrote: But if you need all that infrastructure just to get the feature launched, that's a bit hard to stomach. Triggering a vacuum or some hypothetical "scrubbing" feature? What you were suggesting doesn't require triggering just a vacuum tho

Re: [HACKERS] Page Checksums

2011-12-20 Thread Jesper Krogh
On 2011-12-19 02:55, Greg Stark wrote: On Sun, Dec 18, 2011 at 7:51 PM, Jesper Krogh wrote: I don't know if it would be seen as a "half baked feature".. or similar, and I don't know if the hint bit problem is solvable at all, but I could easily imagine checksumming just "skipping" the hint bit ent

Re: [HACKERS] Page Checksums

2011-12-20 Thread Jesper Krogh
On 2011-12-20 18:44, Simon Riggs wrote: On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs wrote: The only sensible way to handle this is to change the page format as discussed. IMHO the only sensible way that can happen is if we also support an online upgrade feature. I will take on the online upg

Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 06:44:48 PM Simon Riggs wrote: > Currently, setting hints can be done while holding a share lock on the > buffer. Preventing that would require us to change the way buffer > manager works to make it take an exclusive lock while writing out, > since a hint would change

Re: [HACKERS] Page Checksums

2011-12-20 Thread Simon Riggs
On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs wrote: > The only sensible way to handle this is to change the page format as > discussed. IMHO the only sensible way that can happen is if we also > support an online upgrade feature. I will take on the online upgrade > feature if others work on the

Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 07:08:56 PM Tom Lane wrote: > Andres Freund writes: > > On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote: > >> What would you want the server to do when a page with a mismatching > >> checksum is read? > > > > Follow the behaviour of zero_damaged_pages.

Re: [HACKERS] Page Checksums

2011-12-20 Thread Tom Lane
Andres Freund writes: > On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote: >> What would you want the server to do when a page with a mismatching >> checksum is read? > Follow the behaviour of zero_damaged_pages. Surely not. Nobody runs with zero_damaged_pages turned on in producti

Re: [HACKERS] Page Checksums

2011-12-20 Thread Aidan Van Dyk
On Tue, Dec 20, 2011 at 12:38 PM, Kevin Grittner wrote: >> I don't think the problem is having one page of corruption.  The >> problem is *not knowing* that random pages are corrupted, and >> living in the fear that they might be. > > What would you want the server to do when a page with a mismat

Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote: > Alvaro Herrera wrote: > > Excerpts from Christopher Browne's message of mar dic 20 14:12:56 > > > > -0300 2011: > >> It's not evident which problems will be "real" ones. And in such > >> cases, is the answer to turf the database a

Re: [HACKERS] Page Checksums

2011-12-20 Thread Kevin Grittner
Alvaro Herrera wrote: > Excerpts from Christopher Browne's message of mar dic 20 14:12:56 > -0300 2011: > >> It's not evident which problems will be "real" ones. And in such >> cases, is the answer to turf the database and recover from >> backup, because of a single busted page? For a big datab

Re: [HACKERS] Page Checksums

2011-12-20 Thread Kevin Grittner
Robert Haas wrote: > On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner > wrote: >> I was thinking that we would warn when such was found, set hint >> bits as needed, and rewrite with the new CRC. In the unlikely >> event that it was a torn hint-bit-only page update, it would be a >> warning about

Re: [HACKERS] Page Checksums

2011-12-20 Thread Alvaro Herrera
Excerpts from Christopher Browne's message of mar dic 20 14:12:56 -0300 2011: > It's not evident which problems will be "real" ones. And in such > cases, is the answer to turf the database and recover from backup, > because of a single busted page? For a big database, I'm not sure > that's less

Re: [HACKERS] Page Checksums

2011-12-20 Thread Christopher Browne
On Tue, Dec 20, 2011 at 8:36 AM, Robert Haas wrote: > On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner > wrote: >> I was thinking that we would warn when such was found, set hint bits >> as needed, and rewrite with the new CRC.  In the unlikely event that >> it was a torn hint-bit-only page update

Re: [HACKERS] Page Checksums

2011-12-20 Thread Robert Haas
On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner wrote: > I was thinking that we would warn when such was found, set hint bits > as needed, and rewrite with the new CRC.  In the unlikely event that > it was a torn hint-bit-only page update, it would be a warning about > something which is a benign

Re: [HACKERS] Page Checksums

2011-12-19 Thread Kevin Grittner
Greg Smith wrote: > But if you need all that infrastructure just to get the feature > launched, that's a bit hard to stomach. Triggering a vacuum or some hypothetical "scrubbing" feature? > Also, as someone who follows Murphy's Law as my chosen religion, If you don't think I pay attention
