Re: DVD blockdevice buffers

2001-05-23 Thread Stephen C. Tweedie
Hi, On Wed, May 23, 2001 at 11:12:00AM -0700, Linus Torvalds wrote: > > On Wed, 23 May 2001, Stephen C. Tweedie wrote: > No, you can actually do all the "prepare_write()"/"commit_write()" stuff > that the filesystems already do. And you can do it a lot _better_

Re: DVD blockdevice buffers

2001-05-23 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 07:36:07PM -0700, Linus Torvalds wrote: > Right now we don't try to aggressively drop streaming pages, but it's > possible. Using raw devices is a silly work-around that should not be > needed, and this load shows a real problem in current Linux (one soon to > be

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Tue, May 15, 2001 at 04:37:01PM +1200, Chris Wedgwood wrote: > On Sun, May 13, 2001 at 08:39:23PM -0600, Richard Gooch wrote: > > Yeah, we need a decent unfragmenter. We can do that now with > bmap(). > > SCT wrote a defragger for ext2 but it only handles 1k blocks :( Actually,

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Fri, May 18, 2001 at 09:55:14AM +0200, Rogier Wolff wrote: > The "boot quickly" was an example. "Load netscape quickly" on some > systems is done by dd-ing the binary to /dev/null. This is one of the reasons why some filesystems use extent maps instead of inode indirection trees. The

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 12:47:15PM -0700, Linus Torvalds wrote: > > On Sat, 19 May 2001, Pavel Machek wrote: > > > > > Don't get _too_ hung up about the power-management kind of "invisible > > > suspend/resume" sequence where you resume the whole kernel state. > > > > Ugh. Now I'm

Re: Ext2, fsync() and MTA's?

2001-05-22 Thread Stephen C. Tweedie
Hi, On Tue, May 22, 2001 at 11:54:55AM -0500, Oliver Xymoron wrote: > > > > That's probably the right thing to add. > > > > > > I'd vote for an async flag instead. > > > > Why??? Why change the default behaviour to be something much slower? > > I was suggesting an async flag _in addition_ to

Re: Ext2, fsync() and MTA's?

2001-05-22 Thread Stephen C. Tweedie
Hi, On Tue, May 22, 2001 at 10:50:51AM -0500, Oliver Xymoron wrote: > On Mon, 21 May 2001, Theodore Tso wrote: > > > On Mon, May 21, 2001 at 06:47:58PM +0100, Stephen C. Tweedie wrote: > > > > > Just set chattr +S on the spool dir. That's what the flag is for

Re: Ext2, fsync() and MTA's?

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sun, May 13, 2001 at 12:53:37AM +1000, Andrew McNamara wrote: > I seem to recall that in 2.2, fsync behaved like fdatasync, and that > it's only in 2.4 that it also syncs metadata - is this correct? No, fsync should be safe on 2.2. There was a problem with O_SYNC not syncing all

Re: Ext2, fsync() and MTA's?

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sat, May 12, 2001 at 03:13:55PM +0100, Alan Cox wrote: > fsync guarantees the inode data is up to date, fdatasync just the data. fdatasync guarantees "important" inode data too. The only thing that fdatasync is allowed to skip is the timestamps. --Stephen
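
The distinction drawn in this reply can be illustrated with a minimal C sketch, assuming a POSIX system. The helper name `append_durably` and its `data_only` parameter are hypothetical and do not come from the thread; only `fsync()` and `fdatasync()` are the calls under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append a record and force it to stable storage.
 * With data_only != 0 it calls fdatasync(), which must still flush the
 * metadata needed to read the data back (for example a changed file
 * size) but may skip timestamp-only updates. Otherwise it calls fsync().
 * Returns 0 on success, -1 on any error. */
int append_durably(const char *path, const char *record, int data_only)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(record);
    ssize_t written = write(fd, record, len);
    int rc = (written == (ssize_t)len) ? 0 : -1;

    if (rc == 0)
        rc = data_only ? fdatasync(fd) : fsync(fd);

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

An appending write changes the file size, so even the `fdatasync()` path has to push that part of the inode to disk; only a timestamp-only change may be skipped.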

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sun, May 20, 2001 at 07:04:31AM -0300, Rik van Riel wrote: > On Sun, 20 May 2001, Mike Galbraith wrote: > > > > Looking at the locking and trying to think SMP (grunt) though, I > > don't like the thought of taking two locks for each page until > > > 100%. The data in that block is

Re: LANANA: To Pending Device Number Registrants

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 04:20:11PM -0400, Michael Meissner wrote: > On Fri, May 18, 2001 at 03:17:50PM +0100, Stephen C. Tweedie wrote: > Presumably, a new UUID is created each time format a partition, which means it > is a slight bit of hassle if you have to reload a partition from a dump

Re: LANANA: To Pending Device Number Registrants

2001-05-19 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 05:29:32PM +1200, Chris Wedgwood wrote: > > Or you can fall back to mounting by UUID, which is globally > unique and still avoids referencing physical location. You also > don't need to manually set LABELs for UUID to work: all e2fsprogs > over the

Re: Linux 2.4.4-ac10

2001-05-18 Thread Stephen C. Tweedie
Hi, On Fri, May 18, 2001 at 07:44:39PM -0300, Rik van Riel wrote: > This is the core of why we cannot (IMHO) have a discussion > of whether a patch introducing new VM tunables can go in: > there is no clear overview of exactly what would need to be > tunable and how it would help. It's worse

Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-05-18 Thread Stephen C. Tweedie
Hi, On Fri, May 11, 2001 at 04:54:44PM +0200, Daniel Phillips wrote: > The only reasonable way I can think of getting a block-coherent view > underneath a mounted fs is to have a reverse map, and update it each > time we map block into the page cache or unmap it. It's called the "buffer

Re: LANANA: To Pending Device Number Registrants

2001-05-18 Thread Stephen C. Tweedie
Hi, On Wed, May 16, 2001 at 12:18:15PM -0400, Michael Meissner wrote: > With the current LABEL= support, you won't be able to mount the disks with > duplicate labels, but you can still mount them via /dev/sd<xxx>. Or you can fall back to mounting by UUID, which is globally unique and still avoids

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 03:49:05PM -0300, Marcelo Tosatti wrote: > Back to the main discussion --- I guess we could make __GFP_FAIL (with > __GFP_WAIT set :)) allocations actually fail if "try_to_free_pages()" does > not make any progress (ie returns zero). But maybe thats a bit too >
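
The rule proposed in the quoted text (fail a `__GFP_FAIL` allocation once reclaim makes no progress) can be sketched as a small user-space model. This is an illustration only, not kernel code: `toy_zone`, `toy_try_to_free_pages`, `toy_alloc_page`, the flag constants, and the result codes are all hypothetical stand-ins, and the reply to the proposal is truncated in this listing.

```c
/* Hypothetical stand-ins for __GFP_WAIT and __GFP_FAIL. */
#define GFP_WAIT_FLAG 0x1u
#define GFP_FAIL_FLAG 0x2u

enum alloc_result { ALLOC_OK, ALLOC_FAILED, ALLOC_WOULD_RETRY };

/* Toy model of allocator state; not a kernel structure. */
struct toy_zone {
    int free_pages;         /* pages available right now */
    int reclaimable_pages;  /* pages a reclaim pass could still free */
};

/* Stand-in for try_to_free_pages(): returns nonzero on progress. */
static int toy_try_to_free_pages(struct toy_zone *z)
{
    if (z->reclaimable_pages == 0)
        return 0;
    z->reclaimable_pages--;
    z->free_pages++;
    return 1;
}

enum alloc_result toy_alloc_page(struct toy_zone *z, unsigned int flags)
{
    for (;;) {
        if (z->free_pages > 0) {
            z->free_pages--;
            return ALLOC_OK;
        }
        if (!(flags & GFP_WAIT_FLAG))
            return ALLOC_FAILED;        /* caller cannot wait for reclaim */
        if (!toy_try_to_free_pages(z)) {
            if (flags & GFP_FAIL_FLAG)
                return ALLOC_FAILED;    /* proposed rule: no progress, fail */
            /* Without the fail flag a real allocator would keep trying;
             * the model reports that instead of looping forever. */
            return ALLOC_WOULD_RETRY;
        }
    }
}
```

The model only separates the three outcomes: success after reclaim, failure when reclaim stalls for a caller that allows failure, and a retry loop for everyone else.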

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote: > Initially I thought about __GFP_FAIL to be used by writeout routines which > want to cluster pages until they can allocate memory without causing any > pressure to the system. Something like this: > > while ((page =

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote: > No. __GFP_FAIL can to try to reclaim pages from inactive clean. > > We just want to avoid __GFP_FAIL allocations from going to > try_to_free_pages(). Why? __GFP_FAIL is only useful as an indication that the caller has

Re: Swap space deallocation speed. (fwd)

2001-05-04 Thread Stephen C. Tweedie
Hi, On Thu, May 03, 2001 at 12:03:39AM -0400, Dave Mielke wrote: > unresponsive. The relevant line in the log, as you can find in the attached > "crash.log" file, appears to be: > > Unable to handle kernel paging request at virtual address 00020024 > Apr 16 11:23:06 dave kernel: esi:

Re: 2.4 and 2GB swap partition limit

2001-05-02 Thread Stephen C. Tweedie
Hi, On Wed, May 02, 2001 at 01:49:16PM +0100, Hugh Dickins wrote: > On Wed, 2 May 2001, Stephen C. Tweedie wrote: > > > > So the aim is more complex. Basically, once we are short on VM, we > > want to eliminate redundant copies of swap data. That implies two > >

Re: 2.4 and 2GB swap partition limit

2001-05-02 Thread Stephen C. Tweedie
Hi, On Wed, May 02, 2001 at 12:54:15PM +0200, Rogier Wolff wrote: > > first: Thanks for clearing this up for me. > > So, there are in fact some more "states" a swap-page can be in: > > -(0) free > -(1) allocated, not in mem. > -(2) on swap, valid copy of memory. >

Re: 2.4 and 2GB swap partition limit

2001-05-02 Thread Stephen C. Tweedie
Hi, On Tue, May 01, 2001 at 06:14:54PM +0200, Rogier Wolff wrote: > Shouldn't the algorithm be: > > - If (current_access == write ) > free (swap_page); > else > map (page, READONLY) > > and > when a write access happens, we fault again, and map free the > swap-page as it
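
The algorithm asked about in the quoted question can be written out as a small user-space model. This is a sketch of the question only, under the assumption that a swapped-in page keeps its swap slot until first written; `toy_page` and `toy_handle_fault` are hypothetical names, and the reply (truncated in this listing) may well describe different behaviour.

```c
#include <stdbool.h>

enum access_type { ACCESS_READ, ACCESS_WRITE };

/* Toy model of one page that currently has a copy on swap. */
struct toy_page {
    bool mapped;
    bool writable;
    bool has_swap_slot;   /* a copy of the page still occupies swap */
};

/* Fault handler following the quoted proposal. */
void toy_handle_fault(struct toy_page *p, enum access_type access)
{
    if (access == ACCESS_WRITE) {
        /* The page is about to change, so the swap copy goes stale:
         * free the slot and map the page writable. */
        p->has_swap_slot = false;
        p->mapped = true;
        p->writable = true;
    } else if (!p->mapped) {
        /* Read access: keep the swap copy and map read-only, so a
         * later write faults again and takes the branch above. */
        p->mapped = true;
        p->writable = false;
    }
}
```

A read fault followed by a write fault ends in the same state as a single write fault, but the swap slot stays valid for as long as the page is only read.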

Re: [Patch] deadlock on write in tmpfs

2001-05-02 Thread Stephen C. Tweedie
hi, On Tue, May 01, 2001 at 03:39:47PM +0200, Christoph Rohland wrote: > > tmpfs deadlocks when writing into a file from a mapping of the same > file. > > So I see two choices: > > 1) Do not serialise the whole of shmem_getpage_locked but protect >critical pathes with the spinlock and

Re: 2.4 and 2GB swap partition limit

2001-05-01 Thread Stephen C. Tweedie
Hi, On Mon, Apr 30, 2001 at 07:12:12PM +0100, Alan Cox wrote: > > paging in just released 2.4.4, but in previuos kernel, a page that was > > paged-out, reserves its place in swap even if it is paged-in again, so > > once you have paged-out all your ram at least once, you can't get any > > more

Re: generic_osync_inode/ext2_fsync_inode still not safe

2001-04-20 Thread Stephen C. Tweedie
Hi, On Wed, Apr 18, 2001 at 06:45:40AM -0300, Marcelo Tosatti wrote: > As far as I can see, you cannot guarantee that an inode which is unlocked > _and_ clean (accordingly to the inode->i_state) is safely on disk. > > The reason for that are calls to sync_one() which write the inode >

Re: RFC: pageable kernel-segments

2001-04-20 Thread Stephen C. Tweedie
Hi, On Fri, Apr 20, 2001 at 03:49:30PM +0100, Alan Cox wrote: > There is a proposal (several it seems) to make 2.5 replace the conventional > unix swap with a filesystem of backing store for anonymous objects. That will > mean each object has its own vm area and inode and thus we can start

Re: RFC: pageable kernel-segments

2001-04-20 Thread Stephen C. Tweedie
Hi, On Tue, Apr 17, 2001 at 12:21:17PM -0700, H. Peter Anvin wrote: > > Certain parts of drivers could get the __pageable prefix or so > > (like the __init parts of drivers which get removed) for letting > > the paging-code know that it can be discared if memory-pressure > > demands it. > >

Re: Asynchronous IO

2001-04-20 Thread Stephen C. Tweedie
Hi, On Fri, Apr 13, 2001 at 04:45:07AM -0400, Dan Maas wrote: > IIRC the problem with implementing asynchronous *disk* I/O in Linux today is > that the filesystem code assumes synchronous I/O operations that block the > whole process/thread. So implementing "real" asynch I/O (without the >

Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3

2001-04-17 Thread Stephen C. Tweedie
Hi, On Sat, Apr 14, 2001 at 08:31:07PM -0300, Marcelo Tosatti wrote: > On Sat, 14 Apr 2001, Rik van Riel wrote: > > On Sat, 14 Apr 2001, Marcelo Tosatti wrote: > > > > > There is a nasty race between shmem_getpage_locked() and > > > swapin_readahead() with the new shmem code (introduced in > >

Re: generic_osync_inode/ext2_fsync_inode still not safe

2001-04-17 Thread Stephen C. Tweedie
Hi, On Sat, Apr 14, 2001 at 07:24:42AM -0300, Marcelo Tosatti wrote: > > As described earlier, code which wants to write an inode cannot rely on > the I_DIRTY bits (on inode->i_state) being clean to guarantee that the > inode and its dirty pages, if any, are safely synced on disk. Indeed ---

Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-25 Thread Stephen C. Tweedie
Hi, On Sat, Mar 24, 2001 at 10:05:18PM -0300, Rik van Riel wrote: > On Sun, 25 Mar 2001, Stephen C. Tweedie wrote: > > > Rik, do you think it is really necessary to take the page lock and > > release it inside lookup_swap_cache? I may be overlooking something, > > but

[PATCH] 2.4.2-ac24 buffer.c oops on highmem

2001-03-24 Thread Stephen C. Tweedie
Hi, We've just seen a buffer.c oops in:
>>EIP; c013ae4b <__block_prepare_write+2bb/300> <=
Trace; c013b732 <block_prepare_write+22/70>
Trace; c015dbba <ext2_get_block+a/4e0>
Trace; c012a67e <generic_file_write+3ee/710>
Trace; c015dbba <ext2_get_block+a/4e0>
Trace; c01281c0 <file_read_actor+0/f0>
Trace; c01384a6
Trace; c010910b
__block_prepare_write()'s "out:" error handler tries to do a

Re: [PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-24 Thread Stephen C. Tweedie
Hi, On Fri, Mar 23, 2001 at 11:58:50AM -0800, Linus Torvalds wrote: > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea. Uggh --- the shmem code already does, see: shmem_truncate->shmem_truncate_part->shmem_free_swp-> lookup_swap_cache->find_lock_page It looks messy:

Re: [linux-lvm] EXT2-fs panic (device lvm(58,0)):

2001-03-22 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 01:35:05PM -0700, Andreas Dilger wrote: > The only remote possibility is in ext2_free_blocks() if block+count > overflows a 32-bit unsigned value. Only 2 places call ext2_free_blocks() > with a count != 1, and ext2_free_data() looks to be OK. The other >
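
The overflow mentioned in the quoted text (`block+count` wrapping a 32-bit unsigned value) can be shown with a short C sketch. Neither function below is the ext2 code; `block_range_is_valid` and `naive_range_check` are hypothetical helpers, and `first_block` / `total_blocks` are assumed bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: check that [block, block + count) lies inside
 * [first_block, total_blocks) without ever computing block + count,
 * which could wrap around. */
bool block_range_is_valid(uint32_t block, uint32_t count,
                          uint32_t first_block, uint32_t total_blocks)
{
    if (block < first_block || block >= total_blocks)
        return false;
    /* total_blocks - block cannot underflow: block < total_blocks here. */
    return count <= total_blocks - block;
}

/* The naive form: block + count wraps modulo 2^32 for large inputs,
 * so the comparison can pass for a range that is far out of bounds. */
bool naive_range_check(uint32_t block, uint32_t count, uint32_t total_blocks)
{
    return block + count <= total_blocks;
}
```

With `block = 0xFFFFFFF0` and `count = 0x20` the sum wraps to `0x10`, so the naive check accepts the range while the subtraction-based check rejects it.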

[PATCH] Fix races in 2.4.2-ac22 SysV shared memory

2001-03-22 Thread Stephen C. Tweedie
Hi, The patch below is for two races in sysV shared memory. The first (minor) one is in shmem_free_swp:
    swap_free (entry);
    *ptr = (swp_entry_t){0};
    freed++;
    if (!(page = lookup_swap_cache(entry)))
        continue;

Re: 2.4.2 fs/inode.c

2001-03-22 Thread Stephen C. Tweedie
Hi, On Thu, Mar 22, 2001 at 01:42:15PM -0500, Jan Harkes wrote: > > I found some code that seems wrong and didn't even match it's comment. > Patch is against 2.4.2, but should go cleanly against 2.4.3-pre6 as well. Patch looks fine to me. Have you tested it? If this goes wrong, things break

Re: Thinko in kswapd?

2001-03-22 Thread Stephen C. Tweedie
Hi, On Thu, Mar 22, 2001 at 09:36:48AM -0800, Linus Torvalds wrote: > On Thu, 22 Mar 2001, Stephen C. Tweedie wrote: > > > > There is what appears to be a simple thinko in kswapd. We really > > ought to keep kswapd running as long as there is either a free space > > or an inactive page shortfall

Thinko in kswapd?

2001-03-22 Thread Stephen C. Tweedie
Hi, There is what appears to be a simple thinko in kswapd. We really ought to keep kswapd running as long as there is either a free space or an inactive page shortfall; but right now we only keep going if _both_ are short. Diff below. With this change, I've got a 64MB box running Applix and
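
The described thinko is a both-versus-either mix-up in the loop condition. A minimal C sketch of the two conditions follows; the diff itself is not included in this listing, so the function and parameter names are hypothetical and stand in for whatever the real shortage checks are.

```c
#include <stdbool.h>

/* Behaviour described as the thinko: kswapd keeps going only when
 * both shortfalls are present at once. */
bool keep_running_buggy(bool free_shortfall, bool inactive_shortfall)
{
    return free_shortfall && inactive_shortfall;
}

/* Intended behaviour: keep going while either shortfall exists. */
bool keep_running_fixed(bool free_shortfall, bool inactive_shortfall)
{
    return free_shortfall || inactive_shortfall;
}
```

The two versions differ exactly when only one kind of shortfall is present, which is the case where the buggy form lets kswapd stop too early.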

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-19 Thread Stephen C. Tweedie
Hi, On Sun, Mar 18, 2001 at 10:34:38AM +0100, Manfred Spraul wrote: > > The problem is that mmap_sem seems to be protecting the list > > of VMAs, so taking _only_ the page_table_lock could let a VMA > > change under us while a page fault is underway ... > > No, that can't happen. It can.

Re: [PATCH]: Only one memory zone for sparc64

2001-03-16 Thread Stephen C. Tweedie
Hi, On Thu, Mar 15, 2001 at 07:13:52PM +1100, Anton Blanchard wrote: > > On sparc64 we dont care about the different memory zones and iterating > through them all over the place only serves to waste CPU. I suspect this > would be the case with some other architectures but for the moment I >

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie
Hi, On Fri, Mar 16, 2001 at 08:50:25AM -0300, Rik van Riel wrote: > On Fri, 16 Mar 2001, Stephen C. Tweedie wrote: > > > > Write locks would be used in the code where we actually want > > > to change the VMA list and page faults would use an extra lock > > > to protect against each other (possibly a per

Re: O_DSYNC flag for open

2001-03-16 Thread Stephen C. Tweedie
Hi, On Wed, Mar 14, 2001 at 10:26:42PM -0500, Tom Vier wrote: > fdatasync() is the same as fsync(), in linux. No, in 2.4 fdatasync does the right thing and skips the inode flush if only the timestamps have changed. > until fdatasync() is > implimented (ie, syncs the data only) fdatasync is

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie
Hi, On Thu, Mar 15, 2001 at 09:24:59AM -0300, Rik van Riel wrote: > On Wed, 14 Mar 2001, Rik van Riel wrote: > The mmap_sem is used in procfs to prevent the list of VMAs > from changing. In the page fault code it seems to be used > to prevent other page faults to happen at the same time with >

Re: magic device renumbering was -- Re: Linux 2.4.2ac20

2001-03-16 Thread Stephen C. Tweedie
Hi, On Wed, Mar 14, 2001 at 02:11:57PM -0500, Lars Kellogg-Stedman wrote: > > Put LABEL=<label set with e2label> in you fstab in place of the device name. > > Which is great, for filesystems that support labels. Unfortunately, > this isn't universally available -- for instance, you cannot mount > a swap partition by

Re: BUG? race between kswapd and ptrace (access_process_vm )

2001-03-12 Thread Stephen C. Tweedie
Hi, On Thu, Mar 08, 2001 at 09:12:52PM +0100, Manfred Spraul wrote: > > > Fixing the bug is more difficult than I thought: > > Initially I assumed it would be a two-liner (lock, unlock) but kmap() > can sleep. > > Can I reuse a kmap_atomic() type or should I add a new type? I've just tried

Re: 64-bit capable block device layer

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:53:23PM +0100, Jens Axboe wrote: > > > > OTOH, I'm not sure what problems it could give to make this > > a compile-time option... > > Plus compile time options are nasty :-). It would probably make > bigger sense to completely skip all the merging etc for low end

Re: scsi vs ide performance on fsync's

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 10:36:38AM -0800, Linus Torvalds wrote: > On Wed, 7 Mar 2001, Jeremy Hansen wrote: > > > > So in the meantime as this gets worked out on a lower level, we've decided > > to take the fsync() out of berkeley db for mysql transaction logs and > > mount the filesystem -o

Re: scsi vs ide performance on fsync's

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 10:36:38AM -0800, Linus Torvalds wrote: On Wed, 7 Mar 2001, Jeremy Hansen wrote: So in the meantime as this gets worked out on a lower level, we've decided to take the fsync() out of berkeley db for mysql transaction logs and mount the filesystem -o sync.

Re: 64-bit capable block device layer

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:53:23PM +0100, Jens Axboe wrote: OTOH, I'm not sure what problems it could give to make this a compile-time option... Plus compile time options are nasty :-). It would probably make bigger sense to completely skip all the merging etc for low end

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote: > On Wed, Mar 07 2001, Stephen C. Tweedie wrote: > > > > For most fs'es, that's not an issue. The fs won't start writeback on > > the primary disk at all until the journal commit has been acknowledge

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote: > On Wed, Mar 07 2001, Stephen C. Tweedie wrote: > > My bigger concern is when the journalled fs has a log on a different > queue. For most fs'es, that's not an issue. The fs won't start writeback on the primary disk

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 03:12:41PM +0100, Jens Axboe wrote: > > Yep, it's much harder than it seems. Especially because for the barrier > to be really useful, having inter-request dependencies becomes a > requirement. So you can say something like 'flush X and Y, but don't > flush Y before

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Tue, Mar 06, 2001 at 09:37:20PM +0100, Jens Axboe wrote: > > SCSI has ordered tag, which fit the model Alan described quite nicely. > I've been meaning to implement this for some time, it would be handy > for journalled fs to use such a barrier. Since ATA doesn't do queueing > (at least

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Tue, Mar 06, 2001 at 10:44:34AM -0800, Linus Torvalds wrote: > On Tue, 6 Mar 2001, Alan Cox wrote: > > You want a write barrier. Write buffering (at least for short intervals) in > > the drive is very sensible. The kernel needs to be able to send drivers a write > > barrier which will not

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Tue, Mar 06, 2001 at 10:44:34AM -0800, Linus Torvalds wrote: On Tue, 6 Mar 2001, Alan Cox wrote: You want a write barrier. Write buffering (at least for short intervals) in the drive is very sensible. The kernel needs to be able to send drivers a write barrier which will not be

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 03:12:41PM +0100, Jens Axboe wrote: Yep, it's much harder than it seems. Especially because for the barrier to be really useful, having inter-request dependencies becomes a requirement. So you can say something like 'flush X and Y, but don't flush Y before X is

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote: On Wed, Mar 07 2001, Stephen C. Tweedie wrote: My bigger concern is when the journalled fs has a log on a different queue. For most fs'es, that's not an issue. The fs won't start writeback on the primary disk at all until
