Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Thread Daniel Phillips
On Tuesday 26 February 2008 06:33, David Howells wrote: > > Suppose one were to take a mundane approach to the persistent cache > > problem instead of layering filesystems. What you would do then is > > change NFS's ->write_page and variants to fiddle the persistent > > cache > > It is a requirem

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jeff Garzik
Jamie Lokier wrote: Jeff Garzik wrote: Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barr

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 17:29:13 +, Jamie Lokier wrote: > > You're right. Though, doesn't normal page writeback enqueue the COW > metadata changes? If not, how do they get written in a timely > fashion? It does. But this is not sufficient to guarantee that the pages in question have been

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: > > > > > One interesting aspect of this comes with COW filesystems like btrfs or > > > logfs. Writing out data pages is not sufficient, because those will get > > > lost unless their referencing metadata is written

Re: [RFC] ext3 freeze feature ver 0.2

2008-02-26 Thread Andreas Dilger
On Feb 26, 2008 08:39 -0800, Eric Sandeen wrote: > Takashi Sato wrote: > > > o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS > > As Andreas Dilger and Christoph Hellwig advised me, I have elevated > > them to include/linux/fs.h as below. > > #define FIFREEZE

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: > > > One interesting aspect of this comes with COW filesystems like btrfs or > > logfs. Writing out data pages is not sufficient, because those will get > > lost unless their referencing metadata is written as well. So either we > > h

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jeff Garzik wrote: > Nick Piggin wrote: > >Anyway, the idea of making fsync/fdatasync etc. safe by default is > >a good idea IMO, and is a bad bug that we don't do that :( > > Agreed... it's also disappointing that [unless I'm mistaken] you have > to hack each filesystem to support barriers. >

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jeff Garzik
Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It seems far easier to make syn

Re: [RFC] ext3 freeze feature ver 0.2

2008-02-26 Thread Eric Sandeen
Takashi Sato wrote: > o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS > As Andreas Dilger and Christoph Hellwig advised me, I have elevated > them to include/linux/fs.h as below. > #define FIFREEZE _IOWR('X', 119, int) > #define FITHAW _IOWR('X',

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Andrew Morton
On Tue, 26 Feb 2008 15:07:45 + Jamie Lokier wrote: > SYNC_FILE_RANGE_WRITE scans all pages in the range, looking for dirty > pages which aren't already queued for write-out. It marks those with > a "write-out" flag, and starts write I/Os at some unspecified time in > the n
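
For readers who have not used the call being discussed here, a minimal sketch of how userspace drives sync_file_range(2) on Linux with glibc. The two helper names are illustrative and do not come from the thread. As the surrounding messages point out, neither helper flushes the drive's write cache or guarantees that file metadata reaches the disk, so this is not a durable fsync() replacement.

```c
#define _GNU_SOURCE            /* exposes sync_file_range() in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start write-out of dirty pages in
 * [offset, offset + nbytes) and return without waiting for the I/O.
 * Returns 0 on success, -1 with errno set on failure. */
static int start_range_writeout(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical helper: wait for any write-out already in flight on the range,
 * start write-out of the remaining dirty pages, then wait for that as well.
 * This only covers page-cache data; it does not issue a disk cache flush. */
static int write_and_wait_range(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}
```

A caller that needs the data to survive a crash would still have to follow up with fsync() or fdatasync(), which is the gap this thread is about.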

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Ric Wheeler wrote: > >>I was surprised that fsync() doesn't do this already. There was a lot > >>of effort put into block I/O write barriers during 2.5, so that > >>journalling filesystems can force correct write ordering, using disk > >>flush cache commands. > >> > >>After all that effort, I was

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > Yeah, sync_file_range has slightly unusual semantics and introduce > > the new concept, "writeout", to userspace (does "writeout" include > > "in drive cache"? the kernel doesn't think so, but the only way to > > ma

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Ric Wheeler
Jeff Garzik wrote: Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that journalling

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jörn Engel wrote: > On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > > > Yeah, sync_file_range has slightly unusual semantics and introduce > > the new concept, "writeout", to userspace (does "writeout" include > > "in drive cache"? the kernel doesn't think so, but the only way to >

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Thread David Howells
Daniel Phillips wrote: > I need to respond to this in pieces... first the bit that is bugging > me: > > > > * two new page flags > > > > I need to keep track of two bits of per-cached-page information: > > > > (1) This page is known by the cache, and that the cache must b

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jörn Engel
On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: > > Yeah, sync_file_range has slightly unusual semantics and introduce > the new concept, "writeout", to userspace (does "writeout" include > "in drive cache"? the kernel doesn't think so, but the only way to > make sync_file_range "safe"

Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Thread Peter Zijlstra
On Tue, 2008-02-26 at 13:45 +0100, Miklos Szeredi wrote: > Starting review in the middle, because this is the part I'm most > familiar with. > > > New address_space_operations methods are added: > > int swapfile(struct address_space *, int); > > Separate ->swapon() and ->swapoff() methods would

Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Thread Miklos Szeredi
Starting review in the middle, because this is the part I'm most familiar with. > New address_space_operations methods are added: > int swapfile(struct address_space *, int); Separate ->swapon() and ->swapoff() methods would be so much cleaner IMO. Also is there a reason why 'struct file *' can

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Thread Daniel Phillips
I need to respond to this in pieces... first the bit that is bugging me: > > * two new page flags > > I need to keep track of two bits of per-cached-page information: > > (1) This page is known by the cache, and that the cache must be informed if > the page is going to go away. I still

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Jeff Garzik wrote: > [snip huge long proposal] > > Rather than invent new APIs, we should fix the existing ones to _really_ > flush data to physical media. Btw, one reason for the length is the current block request API isn't sufficient even to make fsync() durable with _no_ new APIs. It offers

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Nick Piggin
On Tuesday 26 February 2008 18:59, Jamie Lokier wrote: > Andrew Morton wrote: > > On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier wrote: > > > (It would be nicer if sync_file_range() > > > took a vector of ranges for better elevator scheduling, but let's > > > ignore that :-) >

[RFC] ext3 freeze feature ver 0.2

2008-02-26 Thread Takashi Sato
Hi, Takashi Sato wrote: >>> Instead, I'd like the sec to timeout on freeze API in order to thaw >>> the filesystem automatically. It can prevent a filesystem from staying >>> frozen forever. >>> (Because a freezer may cause a deadlock by accessing the frozen filesystem.) >> >>I'm still not very c

Re: Proposal for "proper" durable fsync() and fdatasync()

2008-02-26 Thread Jamie Lokier
Andrew Morton wrote: > On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier wrote: > > > (It would be nicer if sync_file_range() > > took a vector of ranges for better elevator scheduling, but let's > > ignore that :-) > > Two passes: > > Pass 1: shove each of the segments into th
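
The two-pass idea for a vector of ranges can be sketched in userspace roughly as follows. The preview above is cut off, so the flag chosen for each pass is an assumption based on the documented sync_file_range(2) flags, and `struct file_range` and `sync_file_ranges()` are hypothetical names introduced only for illustration.

```c
#define _GNU_SOURCE            /* exposes sync_file_range() in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one byte range of a file. */
struct file_range {
    off_t offset;
    off_t nbytes;
};

/* Hypothetical two-pass helper.
 * Pass 1 queues write-out for every range, so the I/O scheduler sees all of
 * them before anything blocks on completion.
 * Pass 2 waits for each range and accumulates the result.
 * Returns 0 if every call succeeded, -1 otherwise. Like sync_file_range()
 * itself, this covers page-cache data only, not metadata or the drive cache. */
static int sync_file_ranges(int fd, const struct file_range *ranges, size_t n)
{
    int result = 0;

    assert(ranges != NULL || n == 0);

    /* Pass 1: start write-out (assumed flags: WAIT_BEFORE | WRITE). */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].nbytes,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE) != 0)
            result = -1;
    }

    /* Pass 2: wait for completion (assumed flag: WAIT_AFTER). */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].nbytes,
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            result = -1;
    }

    return result;
}
```

Splitting submission from waiting is what gives the elevator a chance to reorder the whole batch; a single loop that waits per range would serialize the I/O.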