Re: [RFC] Ext3 online defrag

2006-11-15 Thread Takashi Sato
Hi Alex, Thank you for your information. I have sent the defragmentation patches for extent-based files on ext3, built on your multi-block allocation patches. I would be happy if you had time to review my patches. [RFC][PATCH 0/3] Extent base online defrag

Re: [RFC] Ext3 online defrag

2006-10-27 Thread Eric Sandeen
Alex Tomas wrote: 3) scalable reservation required for delayed allocation to avoid -ENOSPC at flush time. current version uses per-sb spinlock. Can you elaborate on this issue? Shouldn't delayed allocation decrement free space immediately, and only the actual block location choice is

Re: [RFC] Ext3 online defrag

2006-10-27 Thread Alex Tomas
Eric Sandeen (ES) writes: ES Alex Tomas wrote: 3) scalable reservation required for delayed allocation to avoid -ENOSPC at flush time. current version uses per-sb spinlock. ES Can you elaborate on this issue? Shouldn't delayed allocation ES decrement free space immediately, and only

Re: [RFC] Ext3 online defrag

2006-10-26 Thread David Chinner
On Wed, Oct 25, 2006 at 11:33:16PM -0400, Theodore Tso wrote: On Thu, Oct 26, 2006 at 11:40:20AM +1000, David Chinner wrote: We don't need to expose anything filesystem specific to userspace to implement this. Online data movement (i.e. the defrag mechanism) becomes something like:

Re: [RFC] Ext3 online defrag

2006-10-26 Thread Andreas Dilger
On Oct 25, 2006 16:54 +0200, Jan Kara wrote: I've just not yet decided how to handle indirect blocks in case of relocation in the middle of the file. Should they be relocated or shouldn't they? Probably they should be relocated at least in case they are fully contained in relocated interval

Re: [RFC] Ext3 online defrag

2006-10-26 Thread Jan Kara
On Wed, Oct 25, 2006 at 01:00:52PM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 06:11:37PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 02:01:42AM -0400, Jeff Garzik wrote: So how do you then get the generic interface to allocate blocks specified by userspace race free?

Re: [RFC] Ext3 online defrag

2006-10-26 Thread Theodore Tso
On Thu, Oct 26, 2006 at 04:36:48PM +1000, David Chinner wrote: Remember, I'm not just talking about defrag - I'm talking about an interface that is actually useful to apps that might care about how data is laid out on disk but the application writers don't know anything about how

Re: [RFC] Ext3 online defrag

2006-10-26 Thread Dave Kleikamp
On Thu, 2006-10-26 at 09:37 -0400, Theodore Tso wrote: On Thu, Oct 26, 2006 at 04:36:48PM +1000, David Chinner wrote: Remember, I'm not just talking about defrag - I'm talking about an interface that is actually useful to apps that might care about how data is laid out on disk but the

Re: [RFC] Ext3 online defrag

2006-10-26 Thread Jörn Engel
On Wed, 25 October 2006 14:41:18 -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 08:36:56PM +0200, Jan Kara wrote: Yes, but there's a question of the interface to this operation. How to specify which indirect block I mean? Obviously we could introduce separate call for remapping

Re: [RFC] Ext3 online defrag

2006-10-26 Thread David Chinner
On Thu, Oct 26, 2006 at 01:37:22PM +0200, Jan Kara wrote: On Wed, Oct 25, 2006 at 01:00:52PM -0400, Jeff Garzik wrote: We don't need to expose anything filesystem specific to userspace to implement this. Online data movement (i.e. the defrag mechanism) becomes something like: do

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 02:27:53PM +1000, David Chinner wrote: But it is a race that is _easily_ handled, and applications only need to implement one interface, not a

Re: [RFC] Ext3 online defrag

2006-10-25 Thread David Chinner
On Wed, Oct 25, 2006 at 02:01:42AM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote: So why are you arguing that an interface is no good because it is fundamentally racy? ;) My point was

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jan Kara
On Oct 24, 2006 15:44 -0400, Theodore Tso wrote: First of all, we would need a way of allowing userspace to specify which blocks should be used in the preallocation. Presumably it could do this in the same way it will be specifying which blocks to relocate in the defragger - by passing

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 06:11:37PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 02:01:42AM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote: So why are you arguing that an interface

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 04:54:50PM +0200, Jan Kara wrote: Yes, this sounds feasible. We could split the defrag ioctl into two pieces (addition of given extent to a file and swapping of extents), which can have generic interface... An ioctl is UGLY. This was discussed years ago. Google for

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jan Kara
On Wed, Oct 25, 2006 at 04:54:50PM +0200, Jan Kara wrote: Yes, this sounds feasible. We could split the defrag ioctl into two pieces (addition of given extent to a file and swapping of extents), which can have generic interface... An ioctl is UGLY. Agreed. This was discussed years

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jan Kara
On Wed, Oct 25, 2006 at 07:58:51PM +0200, Jan Kara wrote: I've briefly looked at this and this kind of interface has some appeal. On the other hand it's not obvious to me, how to implement in this interface *atomic* operation copy data from file F to given set of blocks and rewrite

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 08:25:30PM +0200, Jan Kara wrote: I see. So you mean that in our ext3meta filesystem we'd have a file named add_this_extent_to_inode and a file reloc_inode_interval and they'd be fed essentially the same info as the current ioctl interface and do the same thing as we

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jan Kara
On Oct 23, 2006 18:03 +0200, Jan Kara wrote: Andreas Dilger wrote: I would in fact go so far as to allow only a single extent to be specified per call. This is to avoid the passing of any pointers as part of the interface (hello ioctl police :-), and also makes the kernel code

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 08:36:56PM +0200, Jan Kara wrote: Yes, but there's a question of the interface to this operation. How to specify which indirect block I mean? Obviously we could introduce separate call for remapping indirect blocks but I find this solution kind of clumsy... Agreed...

Re: [RFC] Ext3 online defrag

2006-10-25 Thread David Chinner
On Wed, Oct 25, 2006 at 01:00:52PM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 06:11:37PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 02:01:42AM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 03:38:23PM +1000, David Chinner wrote: On Wed, Oct 25, 2006 at 12:48:44AM

Re: [RFC] Ext3 online defrag

2006-10-25 Thread Theodore Tso
On Thu, Oct 26, 2006 at 11:40:20AM +1000, David Chinner wrote: We don't need to expose anything filesystem specific to userspace to implement this. Online data movement (i.e. the defrag mechanism) becomes something like: do { get_free_list(dst_fd, location, len, list)

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Tue, Oct 24, 2006 at 12:14:33AM -0400, Jeff Garzik wrote: On Mon, Oct 23, 2006 at 06:31:40PM +0400, Alex Tomas wrote: isn't that a kernel responsibility to find/allocate target blocks? wouldn't it be better to specify desirable target group and minimal acceptable chunk of free blocks? The

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Eric Sandeen
David Chinner wrote: The allocation interface, OTOH, is anything but simple and is really a filesystem specific interface. Seems logical to me to separate the two. And ext[234] preallocation would be a very nice feature in its own right. -Eric

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote: On Tue, 2006-10-24 at 23:59 +1000, David Chinner wrote: On Tue, Oct 24, 2006 at 12:14:33AM -0400, Jeff Garzik wrote: On Mon, Oct 23, 2006 at 06:31:40PM +0400, Alex Tomas wrote: isn't that a kernel responsibility to

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Dave Kleikamp
On Wed, 2006-10-25 at 02:01 +1000, David Chinner wrote: On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote: On Tue, 2006-10-24 at 23:59 +1000, David Chinner wrote: That's the wrong way to look at it. if you want the userspace process to specify a location, then you should

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Theodore Tso
On Tue, Oct 24, 2006 at 11:59:28PM +1000, David Chinner wrote: That's the wrong way to look at it. if you want the userspace process to specify a location, then you should preallocate it first before doing anything else. There is no need to clutter a simple data mover interface with all sorts

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Russell Cattelan
On Tue, 2006-10-24 at 15:44 -0400, Theodore Tso wrote: On Tue, Oct 24, 2006 at 11:59:28PM +1000, David Chinner wrote: That's the wrong way to look at it. if you want the userspace process to specify a location, then you should preallocate it first before doing anything else. There is no

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Tue, Oct 24, 2006 at 11:26:26AM -0500, Dave Kleikamp wrote: On Wed, 2006-10-25 at 02:01 +1000, David Chinner wrote: On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote: On Tue, 2006-10-24 at 23:59 +1000, David Chinner wrote: That's the wrong way to look at it. if you want

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Tue, Oct 24, 2006 at 03:44:16PM -0400, Theodore Tso wrote: On Tue, Oct 24, 2006 at 11:59:28PM +1000, David Chinner wrote: That's the wrong way to look at it. if you want the userspace process to specify a location, then you should preallocate it first before doing anything else. There is

RE: [RFC] Ext3 online defrag

2006-10-24 Thread Barry Naujok
On Wed, 25 Oct 2006 11:19 AM, David Chinner wrote: On Tue, Oct 24, 2006 at 11:26:26AM -0500, Dave Kleikamp wrote: On Wed, 2006-10-25 at 02:01 +1000, David Chinner wrote: On Tue, Oct 24, 2006 at 09:51:41AM -0500, Dave Kleikamp wrote: The allocation interface needs to be able to be

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 12:30:02PM +1000, Barry Naujok wrote: Could we have a more abstract method for asking the filesystem where the free blocks are and then using the same block addressing to tell the fs where to allocate/move the file's data to? That's fundamentally racy, so you might as

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Tue, Oct 24, 2006 at 10:42:57PM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 12:30:02PM +1000, Barry Naujok wrote: Could we have a more abstract method for asking the filesystem where the free blocks are and then using the same block addressing to tell the fs where to allocate/move

Re: [RFC] Ext3 online defrag

2006-10-24 Thread Jeff Garzik
On Wed, Oct 25, 2006 at 02:27:53PM +1000, David Chinner wrote: But it is a race that is _easily_ handled, and applications only need to implement one interface, not a different method for every filesystem that requires deep filesystem knowledge. Besides, you still have to handle the case where

Re: [RFC] Ext3 online defrag

2006-10-24 Thread David Chinner
On Wed, Oct 25, 2006 at 12:48:44AM -0400, Jeff Garzik wrote: On Wed, Oct 25, 2006 at 02:27:53PM +1000, David Chinner wrote: But it is a race that is _easily_ handled, and applications only need to implement one interface, not a different method for every filesystem that requires deep

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Theodore Tso
On Mon, Oct 23, 2006 at 02:27:10PM +0200, Jan Kara wrote: Hello, I've written a simple patch implementing ext3 ioctl for file relocation. Basically you call ioctl on a file, give it list of blocks and it relocates the file into given blocks (provided they are still free). The idea is to

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jan Kara
Hello, I've written a simple patch implementing ext3 ioctl for file relocation. Basically you call ioctl on a file, give it list of blocks and it relocates the file into given blocks (provided they are still free). The idea is to use it as a kernel part of ext3 online defragmenter

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Andreas Dilger
On Oct 23, 2006 18:31 +0400, Alex Tomas wrote: isn't that a kernel responsibility to find/allocate target blocks? wouldn't it be better to specify desirable target group and minimal acceptable chunk of free blocks? In some cases this is useful (e.g. if file has small fragments after being written

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jan Kara
Theodore Tso (TT) writes: TT On Mon, Oct 23, 2006 at 02:27:10PM +0200, Jan Kara wrote: Hello, I've written a simple patch implementing ext3 ioctl for file relocation. Basically you call ioctl on a file, give it list of blocks and it relocates the file into given blocks

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jan Kara
On Oct 23, 2006 18:31 +0400, Alex Tomas wrote: I would make this interface optionally allow the target extent to be specified, but if target block == 0 then the kernel is free to do its own allocation. That's a good idea! I'll change the handling so that if block==0 we just allocate blocks

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Eric Sandeen
Alex Tomas wrote: Theodore Tso (TT) writes: TT On Mon, Oct 23, 2006 at 02:27:10PM +0200, Jan Kara wrote: Hello, I've written a simple patch implementing ext3 ioctl for file relocation. Basically you call ioctl on a file, give it list of blocks and it relocates the file into given

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Andreas Dilger
On Oct 23, 2006 10:16 -0400, Theodore Tso wrote: As a suggestion, I would pass the inode number and inode generation number into the ext3_file_move_data array: struct ext3_file_move_data { int extents; struct ext3_reloc_extent __user *ext_array; }; This will be much more

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jan Kara
On Oct 23, 2006 10:16 -0400, Theodore Tso wrote: As a suggestion, I would pass the inode number and inode generation number into the ext3_file_move_data array: struct ext3_file_move_data { int extents; struct ext3_reloc_extent __user *ext_array; }; This will be much

Re: [RFC] Ext3 online defrag

2006-10-23 Thread Jeff Garzik
On Mon, Oct 23, 2006 at 06:31:40PM +0400, Alex Tomas wrote: isn't that a kernel responsibility to find/allocate target blocks? wouldn't it be better to specify desirable target group and minimal acceptable chunk of free blocks? The kernel doesn't have enough knowledge to know whether or not the