On Tue, Apr 24, 2007 at 04:07:35PM +1000, Neil Brown wrote:
> On Tuesday April 24, [EMAIL PROTECTED] wrote:
> >
> > If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails,
> > then
> > we may have failed the write operation despite prepare_write having
> > instantiated blocks pa
On Tuesday April 24, [EMAIL PROTECTED] wrote:
>
> If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then
> we may have failed the write operation despite prepare_write having
> instantiated blocks past i_size. Fix this, and consolidate the trimming into
> one place.
>
..
>
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/block_dev.c | 26 +++---
1 file changed, 19 insertions(+), 7 deletions(-)
Index: linux-2.6/fs/block_dev.c
===
--- linux-2.6.orig/fs/block
New buffers against uptodate pages are simply marked uptodate, while the
buffer_new bit remains set. This causes error-case code to zero out parts
of those buffers because it thinks they contain stale data: wrong, they
are actually uptodate so this is a data loss situation.
Fix this by actuall
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:
1. generic_buffered_write
2. lock_page
3. prepare_write
4. unlock_page+vmtruncate
5. copy_from_
Implement new aops for some of the simpler filesystems.
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/configfs/inode.c |4 ++--
fs/hugetlbfs/inode.c | 16 ++--
fs/ramfs/file-mmu.c |4 ++--
fs/ramfs/file-nommu.c |4 ++--
fs/sysfs/inode.c
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Nic
prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and GFS2
were converted to the new aops, so we can make some simplifications for that.
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
Documentation/filesystems/vfs.txt |6 -
fs/ecryptfs/mmap.c
From: Steven Whitehouse <[EMAIL PROTECTED]>
(needs a SOB)
Cc: Linux Filesystems
fs/gfs2/ops_address.c | 209 +-
1 file changed, 125 insertions(+), 84 deletions(-)
Index: linux-2.6/fs/gfs2/ops_address.c
==
[mszeredi]
- don't send zero length write requests
- it is not legal for the filesystem to return with zero written bytes
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
fs/fuse/file.c | 48 +---
1 f
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/smbfs/file.c | 34 +-
1 file changed, 25 insertions(+), 9 deletions(-)
Index: linux-2.6/fs/smbfs/file.c
===
--- linux-2.6.ori
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/hpfs/file.c | 20 ++--
1 file changed, 14 insertions(+), 6 deletions(-)
Index: linux-2.6/fs/hpfs/file.c
===
--- linux-2
Rework the generic block "cont" routines to handle the new aops.
Supporting cont_prepare_write would take quite a lot of code,
so remove it instead (and we later convert all filesystems to use it).
write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
generic_cont_expand, so fil
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/ufs/dir.c | 50 +++---
fs/ufs/inode.c | 23 +++
2 files changed, 50 insertions(+), 23 deletions(-)
Index: linux-2.6/fs/ufs/inode.c
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/jffs2/file.c | 105 +++-
1 file changed, 66 insertions(+), 39 deletions(-)
Index: linux-2.6/fs/jffs2/file.c
==
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/sysv/dir.c | 45 +
fs/sysv/itree.c | 23 +++
2 files changed, 44 insertions(+), 24 deletions(-)
Index: linux-2.6/fs/sysv/itree.c
==
Convert udf to new aops. Also seem to have fixed pagecache corruption in
udf_adinicb_commit_write -- page was marked uptodate when it is not. Also,
fixed the silly setup where prepare_write was doing a kmap to be used in
commit_write: just do kmap_atomic in write_end. Use libfs helpers to make
this
Cc: Linux Filesystems
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/ecryptfs/crypto.c | 32 +++---
fs/ecryptfs/ecryptfs_kernel.h |4
fs/ecryptfs/mmap.c| 213 +++-
From: Mark Fasheh <[EMAIL PROTECTED]>
Fix up ocfs2 to use ->write_begin and ->write_end. This lets us dump a large
amount of code which was implementing our own write path while preserving
the nice locking rules that were gained by moving away from ->prepare_write.
It makes use of the context bac
Cc: Andries Brouwer <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/minix/dir.c | 43 +--
fs/minix/inode.c | 23 +++
2 files changed, 44 insertions(+), 22 deletions(-)
Index: linux-2.6/
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/affs/file.c | 106 +++--
1 file changed, 58 insertions(+), 48 deletions(-)
Index: linux-2.6/fs/affs/file.c
=
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/hfs/extent.c | 19 ---
fs/hfs/inode.c | 20
2 files changed, 20 insertions(+), 19 deletions(-)
Index: linux-2.6/fs/hfs/inode.c
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/nfs/file.c | 49 -
1 file changed, 36 insertions(+), 13 deletions(-)
Index: linux-2.6/fs/nfs/file.c
===
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/adfs/inode.c | 14 +-
1 file changed, 9 insertions(+), 5 deletions(-)
Index: linux-2.6/fs/adfs/inode.c
===
--- linux-2.6.or
This also gets rid of a lot of useless read_file stuff. And also
optimises the full page write case by marking a !uptodate page uptodate.
Cc: Jeff Dike <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/hostfs/hostfs_kern.c | 70 +++
Convert to new aops, and fix security hole where page is set uptodate
before contents are uptodate.
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/cifs/file.c | 89 -
1
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/hfsplus/extents.c | 21 +
fs/hfsplus/inode.c | 20
2 files changed, 21 insertions(+), 20 deletions(-)
Index: linux-2.6/fs/hfsplus/inode.c
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/qnx4/inode.c | 21 +
1 file changed, 13 insertions(+), 8 deletions(-)
Index: linux-2.6/fs/qnx4/inode.c
===
--- linu
Restore the KERNEL_DS optimisation, especially helpful to the 2copy write
path.
This may be a pretty questionable gain in most cases, especially after the
legacy 2copy write path is removed, but it doesn't cost much.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-of
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/fat/inode.c | 27 ---
1 file changed, 16 insertions(+), 11 deletions(-)
Index: linux-2.6/fs/fat/inode.c
===
---
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/jfs/inode.c | 19 +++
1 file changed, 11 insertions(+), 8 deletions(-)
Index: linux-2.6/fs/jfs/inode.c
==
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/bfs/file.c | 12
1 file changed, 8 insertions(+), 4 deletions(-)
Index: linux-2.6/fs/bfs/file.c
===
--- linux-2.6.orig/fs/
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/ext2/dir.c | 47 +--
fs/ext2/ext2.h |3 +++
fs/ext2/inode.c | 24 +---
3 files changed, 45 insertions(+), 29 deletions(-)
Inde
Also clean up various little things.
I've got rid of the comment from akpm, because now that make_page_uptodate
is only called from 2 places, it is pretty easy to see that the buffers
are in an uptodate state at the time of the call. Actually, it was OK before
my patch as well, because the memset
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
fs/xfs/linux-2.6/xfs_aops.c | 19 ---
fs/xfs/linux-2.6/xfs_lrw.c | 35 ---
2 files changed, 24 insertions(+), 30 deletions(-)
Index: linux-2.6/fs/xfs/l
Add an iterator data structure to operate over an iovec. Add usercopy
operators needed by generic_file_buffered_write, and convert that function
over.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
include/linux/fs.h | 33
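The iterator idea in the patch description above can be illustrated with a small userspace sketch. This is not the kernel code from the patch: the type `iov_cursor` and its helper functions are hypothetical names chosen for illustration, and only `struct iovec` comes from a real header. The sketch assumes the useful property is that copying and consuming are separate steps, so a caller can advance by the number of bytes actually copied.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical userspace model of an iovec iterator (names are illustrative). */
struct iov_cursor {
	const struct iovec *iov;  /* current segment */
	size_t nr_segs;           /* segments left, including the current one */
	size_t seg_off;           /* bytes already consumed in the current segment */
	size_t count;             /* total bytes left across all segments */
};

static void iov_cursor_init(struct iov_cursor *c, const struct iovec *iov,
			    size_t nr_segs)
{
	size_t i;

	c->iov = iov;
	c->nr_segs = nr_segs;
	c->seg_off = 0;
	c->count = 0;
	for (i = 0; i < nr_segs; i++)
		c->count += iov[i].iov_len;
}

/* Copy up to len bytes into dst WITHOUT consuming them; returns bytes copied. */
static size_t iov_cursor_copy(const struct iov_cursor *c, void *dst, size_t len)
{
	const struct iovec *iov = c->iov;
	size_t segs = c->nr_segs;
	size_t off = c->seg_off;
	size_t done = 0;

	if (len > c->count)
		len = c->count;
	while (done < len && segs > 0) {
		size_t avail = iov->iov_len - off;
		size_t n = (len - done < avail) ? len - done : avail;

		memcpy((char *)dst + done, (const char *)iov->iov_base + off, n);
		done += n;
		iov++;
		segs--;
		off = 0;
	}
	return done;
}

/* Consume bytes, stepping across segment boundaries as needed. */
static void iov_cursor_advance(struct iov_cursor *c, size_t bytes)
{
	if (bytes > c->count)
		bytes = c->count;
	c->count -= bytes;
	while (bytes > 0) {
		size_t avail = c->iov->iov_len - c->seg_off;

		if (bytes < avail) {
			c->seg_off += bytes;
			break;
		}
		bytes -= avail;
		c->iov++;
		c->nr_segs--;
		c->seg_off = 0;
	}
}
```

A caller would typically call `iov_cursor_copy()` for one page worth of data and then `iov_cursor_advance()` by whatever amount was really committed, which keeps the segment bookkeeping out of the write loop itself.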
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
Various fixes and improvements
Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
fs/ext3/inode.c | 136
1 file changed, 88 insertions(+), 48 d
Cc: [EMAIL PROTECTED]
Cc: Linux Filesystems
Convert ext4 to use write_begin()/write_end() methods.
Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
fs/ext4/inode.c | 147 +++-
1 file changed, 93 insertions(+), 54 deletions(-)
Index: linux
Hide some of the open-coded nr_segs tests into the iovec helpers. This is
all to simplify generic_file_buffered_write, because that gets more complex
in the next patch.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
mm/filem
Quite a bit of code is used in maintaining these "cached pages" that are
probably pretty unlikely to get used. It would require a narrow race where
the page is inserted concurrently while this process is allocating a page
in order to create the spare page. Then a multi-page write into an uncached
From: Andrew Morton <[EMAIL PROTECTED]>
Rename some variables and fix some types.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
mm/filemap.c | 35 ++
If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then
we may have failed the write operation despite prepare_write having
instantiated blocks past i_size. Fix this, and consolidate the trimming into
one place.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux File
From: Andrew Morton <[EMAIL PROTECTED]>
This patch fixed the following bug:
When prefaulting in the pages in generic_file_buffered_write(), we only
faulted in the pages for the first segment of the iovec. If the second or a
successive segment described a mmapping of the page into which we're
Revert the patch from Neil Brown to optimise NFSD writev handling.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Cc: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
mm/filemap.c | 32 +---
1 file changed, 13 in
Hi, these patches are against 2.6.21-rc6-mm1. Aside from OCFS2, there
were no major clashes between -mm and mainline diffs, which is nice.
These patches aim to solve the long standing buffered write deadlocks,
and then go on to introduce a pair of new write a_op methods which
allow the deadlock to
Allow CONFIG_DEBUG_VM to switch off the prefaulting logic, to simulate the
difficult race where the page may be unmapped before calling copy_from_user.
Makes the race much easier to hit.
This is useful for demonstration and testing purposes, but is removed in a
subsequent patch.
Cc: Linux Memory
From: Andrew Morton <[EMAIL PROTECTED]>
This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
also revert.
Cc: Linux Memory Management <[EMAIL PROTECTED]>
Cc: Linux Filesystems
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
On Mon, 23 Apr 2007, Amit Gud wrote:
On Mon, 23 Apr 2007, Arjan van de Ven wrote:
> The other thing which we should consider is that chunkfs really
> requires a 64-bit inode number space, which means either we only allow
does it?
I'd think it needs a "chunk space" number and a 32 bit loc
On Mon, 23 Apr 2007, Arjan van de Ven wrote:
The other thing which we should consider is that chunkfs really
requires a 64-bit inode number space, which means either we only allow
does it?
I'd think it needs a "chunk space" number and a 32 bit local inode
number ;) (same for blocks)
For i
> The other thing which we should consider is that chunkfs really
> requires a 64-bit inode number space, which means either we only allow
does it?
I'd think it needs a "chunk space" number and a 32 bit local inode
number ;) (same for blocks)
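One reading of the suggestion above is that an inode is identified by a (chunk, local inode) pair rather than by a single flat number. The sketch below is only an illustration of that reading: the struct and function names are hypothetical and do not come from the chunkfs patch, and the 32-bit width of the chunk number is an assumption. It also shows that such a pair can be flattened into one 64-bit value where an interface wants a single number.

```c
#include <stdint.h>

/* Hypothetical identifier: a chunk number plus a 32-bit inode local to it. */
struct chunk_ino {
	uint32_t chunk;  /* which chunk ("chunk space") the inode lives in */
	uint32_t local;  /* inode number within that chunk */
};

/* Flatten the pair: high 32 bits = chunk, low 32 bits = local inode. */
static inline uint64_t chunk_ino_pack(struct chunk_ino id)
{
	return ((uint64_t)id.chunk << 32) | id.local;
}

/* Recover the pair from the flattened 64-bit form. */
static inline struct chunk_ino chunk_ino_unpack(uint64_t ino)
{
	struct chunk_ino id;

	id.chunk = (uint32_t)(ino >> 32);
	id.local = (uint32_t)(ino & 0xffffffffu);
	return id;
}
```

The same split would apply to block numbers, as the message notes.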
On Mon, Apr 23, 2007 at 02:17:55PM +0200, Miklos Szeredi wrote:
> Nick,
>
> Thanks for converting fuse, and testing. Here's a minor update to
> fs-fuse-aops.patch.
>
> Miklos
>
>
> Convert fuse to new aops.
>
> [mszeredi]
> - don't send zero length write requests
> - it is not legal for
On Mon, Apr 23, 2007 at 02:53:33PM -0600, Andreas Dilger wrote:
> > With a blocksize of 4KB, a block group would be 128 MB. In the original
> > Chunkfs paper, Valh had mentioned 1GB chunks and I believe it will be
> > possible to use 2GB, 4GB or 8GB chunks in the future. As the chunk size
> > incre
Avishay Traeger wrote:
On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
For some time I had been working on this file system test framework.
Now I have an implementation for the same and below is the explanation.
Any comments are welcome.
You may want to check out the paper "EXPLODE:
Suparna Bhattacharya wrote:
Could you send this out as a patch to ext2 codebase, so we can just look
at the changes for chunkfs ? That might also make it small enough
to inline your patch in email for review.
What kind of results are you planning to gather to evaluate/optimize this ?
Mainly
On Apr 23, 2007 15:04 +0530, Kalpak Shah wrote:
> On Mon, 2007-04-23 at 12:49 +0530, Karuna sagar K wrote:
> > The tool estimates the cross-chunk references from an ext2/3 file
> > system. It considers a block group as one chunk and calculates how many
> > block groups a file spans. So,
On Mon, Apr 23, 2007 at 09:58:49PM +0530, Suparna Bhattacharya wrote:
> On Mon, Apr 23, 2007 at 06:21:34AM -0500, Amit Gud wrote:
> >
> > This is an initial implementation of the ChunkFS technique, briefly discussed
> > at: http://lwn.net/Articles/190222 and
> > http://cis.ksu.edu/~gud/docs/chunkfs-h
On Mon, Apr 23, 2007 at 06:21:34AM -0500, Amit Gud wrote:
>
> This is an initial implementation of the ChunkFS technique, briefly discussed
> at: http://lwn.net/Articles/190222 and
> http://cis.ksu.edu/~gud/docs/chunkfs-hotdep-val-arjan-gud-zach.pdf
>
> This implementation is done within the ext2 driver
On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
> For some time I had been working on this file system test framework.
> Now I have an implementation for the same and below is the explanation.
> Any comments are welcome.
You may want to check out the paper "EXPLODE: A Lightweight, Genera
Nick,
Thanks for converting fuse, and testing. Here's a minor update to
fs-fuse-aops.patch.
Miklos
Convert fuse to new aops.
[mszeredi]
- don't send zero length write requests
- it is not legal for the filesystem to return with zero written bytes
Signed-off-by: Nick Piggin <[EMAIL PROT
This is an initial implementation of the ChunkFS technique, briefly discussed
at: http://lwn.net/Articles/190222 and
http://cis.ksu.edu/~gud/docs/chunkfs-hotdep-val-arjan-gud-zach.pdf
This implementation is done within the ext2 driver. Every chunk is an
independent ext2 file system. The knowledge abo
On 4/23/07, Kalpak Shah <[EMAIL PROTECTED]> wrote:
On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
.
The file system is looked upon as a set of blocks (more precisely
metadata blocks). We randomly choose from this set of blocks to
corrupt. Hence we would be able to overcome the de
On Mon, 2007-04-23 at 12:49 +0530, Karuna sagar K wrote:
> Hi,
>
> The tool estimates the cross-chunk references from an ext2/3 file
> system. It considers a block group as one chunk and calculates how many
> block groups a file spans. So, the block group size gives
> the estimate of ch
On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
> Hi,
>
> For some time I had been working on this file system test framework.
> Now I have an implementation for the same and below is the explanation.
> Any comments are welcome.
>
> Introduction:
> The testing tools and benchmarks availab
Hi,
The tool estimates the cross-chunk references from an ext2/3 file
system. It considers a block group as one chunk and calculates how many
block groups a file spans. So, the block group size gives
the estimate of chunk size.
The file systems were aged for about 3-4 months on a deve
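The counting step described above (treat each block group as a chunk and see how many groups one file touches) can be sketched as follows. This is not the posted tool; the function names are hypothetical. It assumes the ext2-style rule that a single bitmap block tracks a whole group, so a group holds 8 * block_size blocks, and it ignores the first-data-block offset that applies with 1 KB blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one bitmap block per group, one bit per block. */
static uint32_t blocks_per_group(uint32_t block_size)
{
	return 8 * block_size;
}

/*
 * Count the distinct block groups touched by a file, given the list of
 * block numbers it occupies. Quadratic, which is fine for a sketch.
 */
static size_t groups_spanned(const uint64_t *blocks, size_t n,
			     uint32_t per_group)
{
	size_t distinct = 0;
	size_t i, j;

	for (i = 0; i < n; i++) {
		uint64_t group = blocks[i] / per_group;

		/* Was this group already seen among earlier blocks? */
		for (j = 0; j < i; j++)
			if (blocks[j] / per_group == group)
				break;
		if (j == i)
			distinct++;
	}
	return distinct;
}
```

With 4 KB blocks this gives 32768 blocks per group, i.e. 128 MB per group, which matches the chunk size figure quoted elsewhere in the thread.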