On 9/19/07, David Chinner [EMAIL PROTECTED] wrote:
The problem is this: to alter the fundamental block size of the
filesystem we also need to alter the data block size and that is
exactly the piece that linux does not support right now. So while
we have the capability to use large block sizes
Christoph Hellwig wrote:
This is not based on my attempt to make the xfs writeout path generic.
Alex's variant is a lot simpler and thus missed various bits required
for high sustained writeout performance or xfs functionality.
I'd very much appreciate any details about high writeout performance.
David Chinner wrote:
Using a new API for new functionality is a bad thing?
if existing API can be used ...
No, it doesn't provide the same functionality.
Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
Jeff Garzik wrote:
Alex Tomas wrote:
So without the ability to attach specific I/O completions to bios
or support for unwritten extents directly in __mpage_writepage,
there is no way XFS can use this generic delayed allocation code.
I didn't say generic, see Subject: :)
Well, it shouldn't
David Chinner wrote:
Firstly, XFS attaches a different I/O completion to delalloc writes
to allow us to update the file size when the write is beyond the
current on disk EOF. This code cannot do that as all it does is
allocation and present normal looking buffers to the generic code
path.
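To make the point about per-write completions concrete, here is a minimal userspace sketch, not XFS or kernel code. All names (sim_inode, sim_io, sim_prepare_write and the two handlers) are hypothetical and exist only for illustration; the assumption is simply that a delalloc write gets a completion handler that can push the on-disk size forward, while an ordinary write does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel structures. */
struct sim_inode {
    unsigned long long disk_size;   /* size currently recorded on disk */
};

struct sim_io {
    struct sim_inode *inode;
    unsigned long long offset;
    unsigned long long len;
    void (*end_io)(struct sim_io *io, bool ok);
};

/* Completion for ordinary writes: nothing to update. */
static void sim_end_io_plain(struct sim_io *io, bool ok)
{
    (void)io;
    (void)ok;
}

/* Completion for delalloc writes: extend the on-disk size if the
 * write succeeded and ended beyond the current on-disk EOF. */
static void sim_end_io_extend(struct sim_io *io, bool ok)
{
    unsigned long long end = io->offset + io->len;

    if (ok && end > io->inode->disk_size)
        io->inode->disk_size = end;
}

/* Pick the completion handler when the I/O is set up. */
static void sim_prepare_write(struct sim_io *io, struct sim_inode *inode,
                              unsigned long long offset,
                              unsigned long long len, bool delalloc)
{
    io->inode = inode;
    io->offset = offset;
    io->len = len;
    io->end_io = delalloc ? sim_end_io_extend : sim_end_io_plain;
}
```

A generic path that only ever installs the plain handler has no place to do the size update, which seems to be the objection raised in the quoted text.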
how
(), allocate blocks and correct on-disk
size
The current implementation works with data=writeback only; you should
mount the filesystem with the delalloc,data=writeback options.
TODO:
* reservation
* data=ordered
* quota
* bmap
Signed-off-by: Alex Tomas [EMAIL PROTECTED]
Index: linux-2.6.22/include/linux
defer allocation. mpage_da_writepages() finds all
non-allocated blocks and tries to allocate them with minimal calls
to ->get_block(), then submits IO using __mpage_writepage()
Signed-off-by: Alex Tomas [EMAIL PROTECTED]
Index: linux-2.6.22-rc4/fs/buffer.c
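The idea of allocating with a minimal number of ->get_block() calls can be illustrated with a small userspace sketch. This is not the patch's code: map_run_fn, map_unallocated_runs() and count_blocks() are hypothetical names, and the sketch assumes the unallocated logical block numbers are already collected in a sorted array. It issues one mapping call per contiguous run instead of one per block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem block-mapping callback:
 * maps `count` logical blocks starting at `first`; 0 means success. */
typedef int (*map_run_fn)(unsigned long first, unsigned long count,
                          void *ctx);

/*
 * Walk a sorted array of logical block numbers that still need
 * allocation and issue one mapping call per contiguous run.
 * Returns the number of mapping calls made, or -1 if a call failed.
 */
static int map_unallocated_runs(const unsigned long *blocks, size_t n,
                                map_run_fn map_run, void *ctx)
{
    int calls = 0;
    size_t i = 0;

    assert(map_run != NULL);
    while (i < n) {
        size_t j = i + 1;

        /* Extend the run while the next block is logically adjacent. */
        while (j < n && blocks[j] == blocks[j - 1] + 1)
            j++;
        if (map_run(blocks[i], (unsigned long)(j - i), ctx) != 0)
            return -1;
        calls++;
        i = j;
    }
    return calls;
}

/* Example callback: only adds up how many blocks were mapped. */
static int count_blocks(unsigned long first, unsigned long count,
                        void *ctx)
{
    (void)first;
    *(unsigned long *)ctx += count;
    return 0;
}
```

With blocks 10, 11, 12, 20, 21 and 40 the sketch makes three mapping calls rather than six, which is the kind of saving the description is after.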
It duplicates fs/mpage.c in bio building and introduces a new generic API
(iomap, map_blocks_t, etc). In contrast, my trivial implementation reuses
existing code in fs/mpage.c, doesn't introduce a new API and, I tend to think,
provides much the same functionality. I can be wrong, of course ...
Aneesh Kumar K.V wrote:
+/* first, we need to know whether the block is allocated already
+ * XXX: when the filesystem has a lot of free blocks, we could
+ * reserve even allocated blocks to save this lookup */
+ret = ext4_get_blocks_wrap(NULL, inode, iblock, 1, bh_result, 0,
Badari Pulavarty (BP) writes:
BP In order to do the correct accounting, we need to mark a page
BP to indicate if we reserved a block or not. One way to do this is
BP to use page->private to indicate this. But then, all the generic
BP routines will fail - since they assume that page->private
Badari Pulavarty (BP) writes:
you can introduce one more bit to page->flags
BP Agreed. I was hoping to avoid it as much as I can.
well, you're gonna modify mpage api anyway ...
BP What I meant by journalling mode is that - after the pages are submitted
BP for IO, we need some way of
Badari Pulavarty (BP) writes:
2) Andrew proposed the excellent solution
BP Well, I wasn't sure how heavy that's going to be. He was recommending
BP that we flush all dirty pages from all inodes for each transaction
BP commit. Isn't it ?
this is exactly what ext3 does being mounted with
Nikita Danilov (ND) writes:
In order to do the correct accounting, we need to mark a page
to indicate if we reserved a block or not. One way to do this is
to use page->private to indicate this. But then, all the generic
I believe one can use PG_mappedtodisk bit in page->flags for this
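The accounting problem discussed here (remembering per page whether a block was already reserved) can be sketched in userspace as follows. This is only an illustration under stated assumptions: SIM_PG_RESERVED, sim_page, sim_fs and the two helpers are hypothetical names, and the flag is a private bit in a plain struct, not a real kernel page flag.

```c
#include <stdbool.h>

/* Hypothetical flag bit marking "a block is reserved for this page". */
#define SIM_PG_RESERVED (1UL << 0)

struct sim_page {
    unsigned long flags;
};

struct sim_fs {
    unsigned long free_blocks;
    unsigned long reserved_blocks;
};

/* Reserve one block for the page unless it already holds a reservation.
 * Returns false if no free block is left. */
static bool sim_reserve(struct sim_fs *fs, struct sim_page *pg)
{
    if (pg->flags & SIM_PG_RESERVED)
        return true;            /* already accounted, do not count twice */
    if (fs->free_blocks == 0)
        return false;
    fs->free_blocks--;
    fs->reserved_blocks++;
    pg->flags |= SIM_PG_RESERVED;
    return true;
}

/* Give the reservation back, but only if this page actually holds one. */
static void sim_release(struct sim_fs *fs, struct sim_page *pg)
{
    if (!(pg->flags & SIM_PG_RESERVED))
        return;
    fs->reserved_blocks--;
    fs->free_blocks++;
    pg->flags &= ~SIM_PG_RESERVED;
}
```

The flag is what keeps the counters correct when the same page is dirtied twice or released without ever having reserved anything.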
Werner Almesberger (WA) writes:
WA Do you plan to reserve space as blocks, somewhere, or as these
WA specific on-disk locations ? In ABISS, we did something of the
WA latter kind (in order to make large contiguous allocations also on
WA FAT), and it turned out to be a big mess, because ABISS
Werner Almesberger (WA) writes:
locked during writeback? PG_writeback should be used instead of PG_locked.
WA In mpage_writepages, writepage can also get called with the page just
WA PG_locked.
you can drop PG_locked right as you set PG_writeback, I think
thanks, Alex
Good day all,
could you recommend good SATA cards for benchmarking on 2.6?
thanks, Alex
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 4 Mar 2005 16:43:31 +0530
Suparna Bhattacharya [EMAIL PROTECTED] wrote:
Alex, have you had a chance to prototype your idea of rooting extents
in ea ?
I think all you need for this are:
1) allocate EA in ext3_new_inode()
2) write a replacement for ext3_init_tree_desc()
just few
On 03 Mar 2005 17:12:14 -0800
Badari Pulavarty [EMAIL PROTECTED] wrote:
One more thing we need to keep in mind: we need to make sure
that ordered mode is also improved - since all our testcode
focuses on writeback mode and the default mode is ordered :(
I've just cooked the patch to
Jan Blunck (JB) writes:
JB i_sem does NOT protect the dcache. Also not in real_lookup(). The lock
JB must be acquired for ->lookup() and because we might sleep on i_sem, we
JB have to get it early and check for repopulation of the dcache.
dentry is part of dcache, right? i_sem protects
Jan Blunck (JB) writes:
JB With luck you have s_pdirops_size (or 1024) different renames altering
JB concurrently one directory inode. Therefore you need a lock protecting
JB your filesystem data. This is basically the job done by i_sem. So in
JB my opinion you only move The Problem from the
Index: linux-2.6.10/mm/shmem.c
===
--- linux-2.6.10.orig/mm/shmem.c 2005-01-28 19:32:16.0 +0300
+++ linux-2.6.10/mm/shmem.c 2005-02-19 20:05:32.642599576 +0300
@@ -1849,7 +1849,7 @@
#endif
};
-static int
Badari Pulavarty (BP) writes:
BP Sure. I think it will improve the allocation case.
BP Non-allocation case, should be pretty much same, provided
BP I got contiguous layout on the disk. Isn't it ?
not allocation only:
[EMAIL PROTECTED] root]# time /work/tests/fwrite /test/fff 64 1
real
Badari Pulavarty (BP) writes:
BP Test: writes 10,000 blocks of 64k and does fdatasync().
The patch doesn't apply ;) after a minor correction
I've tested the patch too:
SMP, before:
[EMAIL PROTECTED] root]# time /work/tests/fwrite /test/fff 64 1
real 0m17.748s
user 0m0.032s
sys
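For readers who want to reproduce a test of the kind described above (write a number of fixed-size blocks, then call fdatasync()), here is a minimal sketch. It is not the /work/tests/fwrite program used in this thread, whose source and argument meaning are not shown; write_blocks_and_sync() and its parameters are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `nblocks` blocks of `block_kb` KiB each to `path`, then flush
 * the data with fdatasync(). Returns the number of bytes written,
 * or -1 on any error.
 */
static long long write_blocks_and_sync(const char *path, size_t block_kb,
                                       size_t nblocks)
{
    size_t block_size = block_kb * 1024;
    long long total = 0;
    char *buf;
    int fd;

    if (block_size == 0)
        return -1;
    buf = malloc(block_size);
    if (buf == NULL)
        return -1;
    memset(buf, 0xab, block_size);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    for (size_t i = 0; i < nblocks; i++) {
        size_t done = 0;

        /* write() may be short, so loop until the block is complete. */
        while (done < block_size) {
            ssize_t n = write(fd, buf + done, block_size - done);

            if (n < 0)
                goto fail;
            done += (size_t)n;
        }
        total += (long long)block_size;
    }
    if (fdatasync(fd) != 0)
        goto fail;
    close(fd);
    free(buf);
    return total;
fail:
    close(fd);
    free(buf);
    return -1;
}
```

Calling write_blocks_and_sync(path, 64, 10000) under time(1) would approximate the quoted test of 10,000 blocks of 64k followed by fdatasync().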
Badari Pulavarty (BP) writes:
BP Thank you for testing and confirming the results.
BP Is this on a simple single disk configuration ?
nope, the box connects to a 2-disk raid0 via FC1.
but the disks are damn old ...