Re: [HACKERS] fsync reliability

2011-05-09 Thread Bruce Momjian
FYI, does wal.c need updated comments to explain the file system semantics we expect, and how our code triggers it? --- Greg Smith wrote: On 04/23/2011 09:58 AM, Matthew Woodcraft wrote: As far as I can make out, the

Re: [HACKERS] fsync reliability

2011-04-25 Thread Greg Smith
On 04/24/2011 10:06 PM, Daniel Farina wrote: On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith g...@2ndquadrant.com wrote: There's still the fsync'd a data block but not the directory entry yet issue as fall-out from this too. Why doesn't PostgreSQL run into this problem? Because the exact code

Re: [HACKERS] fsync reliability

2011-04-25 Thread Greg Smith
On 04/23/2011 09:58 AM, Matthew Woodcraft wrote: As far as I can make out, the current situation is that this fix (the auto_da_alloc mount option) doesn't work as advertised, and the ext4 maintainers are not treating this as a bug. See https://bugzilla.kernel.org/show_bug.cgi?id=15910 I

Re: [HACKERS] fsync reliability

2011-04-25 Thread Greg Stark
On Mon, Apr 25, 2011 at 5:00 PM, Greg Smith g...@2ndquadrant.com wrote: Stop right there; the slow path was the only one that had any hope of being correct.  It can actually slow things by a factor of 100X or more, worst-case.  So, we currently have the choice between filesystem corruption or

Re: [HACKERS] fsync reliability

2011-04-25 Thread Daniel Farina
On Mon, Apr 25, 2011 at 8:26 AM, Greg Smith g...@2ndquadrant.com wrote: On 04/24/2011 10:06 PM, Daniel Farina wrote: On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith g...@2ndquadrant.com wrote: There's still the fsync'd a data block but not the directory entry yet issue as fall-out from this

Re: [HACKERS] fsync reliability

2011-04-24 Thread Daniel Farina
On Thu, Apr 21, 2011 at 1:26 AM, Simon Riggs si...@2ndquadrant.com wrote: Daniel Farina points out to me that the Linux man page for fsync() says Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit

Re: [HACKERS] fsync reliability

2011-04-24 Thread Daniel Farina
On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith g...@2ndquadrant.com wrote: There's still the fsync'd a data block but not the directory entry yet issue as fall-out from this too.  Why doesn't PostgreSQL run into this problem?  Because the exact code sequence used is this one: open write fsync

Re: [HACKERS] fsync reliability

2011-04-23 Thread Matthew Woodcraft
On 2011-04-22 21:55, Greg Smith wrote: On 04/22/2011 09:32 AM, Simon Riggs wrote: OK, that's good, but ISTM we still have a hole during RemoveOldXlogFiles() where we don't fsync or open/close the file, just rename it. This is also something that many applications rely upon working as hoped

Re: [HACKERS] fsync reliability

2011-04-22 Thread Simon Riggs
On Fri, Apr 22, 2011 at 4:51 AM, Greg Smith g...@2ndquadrant.com wrote: On 04/21/2011 04:26 AM, Simon Riggs wrote: However, that begs the question of what happens with WAL. At present, we do nothing to ensure that the entry in the directory containing the file has also reached disk. Well,

Re: [HACKERS] fsync reliability

2011-04-22 Thread Greg Smith
Simon Riggs wrote: We do issue fsync and then close, but only when we switch log files. We don't do that as part of the normal commit path. Since all these files are zero-filled before use, the space is allocated for them, and the remaining important filesystem layout metadata gets

Re: [HACKERS] fsync reliability

2011-04-22 Thread Simon Riggs
On Fri, Apr 22, 2011 at 1:35 PM, Greg Smith g...@2ndquadrant.com wrote: Simon Riggs wrote: We do issue fsync and then close, but only when we switch log files. We don't do that as part of the normal commit path. Since all these files are zero-filled before use, the space is allocated for

Re: [HACKERS] fsync reliability

2011-04-22 Thread Greg Stark
On Thu, Apr 21, 2011 at 4:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: The traditional standard is that the filesystem is supposed to take care of its own metadata, and even Linux filesystems have pretty much figured that out.  I don't really see a need for us to be nursemaiding the filesystem.  

Re: [HACKERS] fsync reliability

2011-04-22 Thread Greg Smith
On 04/22/2011 09:32 AM, Simon Riggs wrote: OK, that's good, but ISTM we still have a hole during RemoveOldXlogFiles() where we don't fsync or open/close the file, just rename it. This is also something that many applications rely upon working as hoped for here, even though it's not

[HACKERS] fsync reliability

2011-04-21 Thread Simon Riggs
Daniel Farina points out to me that the Linux man page for fsync() says: "Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed."
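
For readers skimming the thread, the man page's point can be sketched in a few lines. This is only an illustration, assuming a POSIX system where a directory can be opened read-only and passed to fsync(). The helper names `fsync_dir` and `durable_write` are hypothetical, and this is not the code PostgreSQL uses.

```python
import os


def fsync_dir(dirpath: str) -> None:
    """Flush a directory's entries by calling fsync() on a descriptor for it.

    Assumption: POSIX semantics; opening a directory this way does not
    work on Windows.
    """
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, fsync the file, then fsync its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flushes the file's data; per the man page, this alone does not
        # guarantee that the directory entry has reached disk.
        os.fsync(fd)
    finally:
        os.close(fd)
    # The explicit second step the man page calls for.
    fsync_dir(os.path.dirname(os.path.abspath(path)))
```

Calling `durable_write("some/dir/segment", b"...")` performs both flushes. Whether the second one is needed in practice depends on the filesystem, which is what the rest of the thread debates.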

Re: [HACKERS] fsync reliability

2011-04-21 Thread Alvaro Herrera
Excerpts from Simon Riggs's message of jue abr 21 05:26:06 -0300 2011: ISTM that we can easily do this, since we preallocate WAL files during RemoveOldXlogFiles() and rarely extend the number of files. So it seems easily possible to fsync the pg_xlog directory at the end of

Re: [HACKERS] fsync reliability

2011-04-21 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: Daniel Farina points out to me that the Linux man page for fsync() says Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file

Re: [HACKERS] fsync reliability

2011-04-21 Thread Robert Haas
On Thu, Apr 21, 2011 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: The traditional standard is that the filesystem is supposed to take care of its own metadata, and even Linux filesystems have pretty much figured that out.  I don't really see a need for us to be nursemaiding the filesystem.  

Re: [HACKERS] fsync reliability

2011-04-21 Thread Simon Riggs
On Thu, Apr 21, 2011 at 4:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: The traditional standard is that the filesystem is supposed to take care of its own metadata, and even Linux filesystems have pretty much figured that out.  I don't really see a need for us to be nursemaiding the filesystem.  

Re: [HACKERS] fsync reliability

2011-04-21 Thread Simon Riggs
On Thu, Apr 21, 2011 at 5:45 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Apr 21, 2011 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: The traditional standard is that the filesystem is supposed to take care of its own metadata, and even Linux filesystems have pretty much figured that

Re: [HACKERS] fsync reliability

2011-04-21 Thread Robert Haas
On Thu, Apr 21, 2011 at 12:53 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Apr 21, 2011 at 5:45 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Apr 21, 2011 at 11:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: The traditional standard is that the filesystem is supposed to take care of

Re: [HACKERS] fsync reliability

2011-04-21 Thread Greg Smith
On 04/21/2011 04:26 AM, Simon Riggs wrote: However, that begs the question of what happens with WAL. At present, we do nothing to ensure that the entry in the directory containing the file has also reached disk. Well, we do, but it's not obvious why that is unless you've stared at this