Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jeff Garzik
Jamie Lokier wrote: Jeff Garzik wrote: Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barr

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 17:29:13 +, Jamie Lokier wrote: > > You're right. Though, doesn't normal page writeback enqueue the COW > metadata changes? If not, how do they get written in a timely > fashion? It does. But this is not sufficient to guarantee that the pages in question have been

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: > > > > > One interesting aspect of this comes with COW filesystems like btrfs or > > > logfs. Writing out data pages is not sufficient, because those will get > > > lost unless their referencing metadata is written

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: > > > One interesting aspect of this comes with COW filesystems like btrfs or > > logfs. Writing out data pages is not sufficient, because those will get > > lost unless their referencing metadata is written as well. So either we > > h

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jeff Garzik wrote: > Nick Piggin wrote: > >Anyway, the idea of making fsync/fdatasync etc. safe by default is > >a good idea IMO, and is a bad bug that we don't do that :( > > Agreed... it's also disappointing that [unless I'm mistaken] you have > to hack each filesystem to support barriers. >

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jeff Garzik
Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It seems far easier to make syn

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Andrew Morton
On Tue, 26 Feb 2008 15:07:45 + Jamie Lokier <[EMAIL PROTECTED]> wrote: > SYNC_FILE_RANGE_WRITE scans all pages in the range, looking for dirty > pages which aren't already queued for write-out. It marks those with > a "write-out" flag, and starts write I/Os at some unspecified time in > the n

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Ric Wheeler wrote: > >>I was surprised that fsync() doesn't do this already. There was a lot > >>of effort put into block I/O write barriers during 2.5, so that > >>journalling filesystems can force correct write ordering, using disk > >>flush cache commands. > >> > >>After all that effort, I was

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > Yeah, sync_file_range has slightly unusual semantics and introduce > > the new concept, "writeout", to userspace (does "writeout" include > > "in drive cache"? the kernel doesn't think so, but the only way to > > ma

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that journalling

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > > > Yeah, sync_file_range has slightly unusual semantics and introduce > > the new concept, "writeout", to userspace (does "writeout" include > > "in drive cache"? the kernel doesn't think so, but the only way to >

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > Yeah, sync_file_range has slightly unusual semantics and introduce > the new concept, "writeout", to userspace (does "writeout" include > "in drive cache"? the kernel doesn't think so, but the only way to > make sync_file_range "safe"

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jeff Garzik wrote: > [snip huge long proposal] > > Rather than invent new APIs, we should fix the existing ones to _really_ > flush data to physical media. Btw, one reason for the length is the current block request API isn't sufficient even to make fsync() durable with _no_ new APIs. It offers

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Nick Piggin
On Tuesday 26 February 2008 18:59, Jamie Lokier wrote: > Andrew Morton wrote: > > On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier <[EMAIL PROTECTED]> wrote: > > > (It would be nicer if sync_file_range() > > > took a vector of ranges for better elevator scheduling, but let's > > > ignore that :-) >

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Andrew Morton wrote: > On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier <[EMAIL PROTECTED]> wrote: > > > (It would be nicer if sync_file_range() > > took a vector of ranges for better elevator scheduling, but let's > > ignore that :-) > > Two passes: > > Pass 1: shove each of the segments into th

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-25 Thread Jamie Lokier
Jeff Garzik wrote: > Jamie Lokier wrote: > >By durable, I mean that fsync() should actually commit writes to > >physical stable storage, > > Yes, it should. Glad we agree :-) > >I was surprised that fsync() doesn't do this already. There was a lot > >of effort put into block I/O write barriers

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-25 Thread Jeff Garzik
Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that journalling filesystems can for

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-25 Thread Andrew Morton
On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier <[EMAIL PROTECTED]> wrote: > (It would be nicer if sync_file_range() > took a vector of ranges for better elevator scheduling, but let's > ignore that :-) Two passes: Pass 1: shove each of the segments into the queue with SYNC_FILE_RANGE_WA

Proposal for "proper" durable fsync() and fdatasync()

2008-02-25 Thread Jamie Lokier
Dear kernel, This is a proposal to add "proper" durable fsync() and fdatasync() to Linux. First the problem, then a proposed solution "with benefits", so to speak. I need feedback on the details, before implementing anything. Or (hopefully) someone else thinks it's very important and does it th