Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-10 Thread Stephen C. Tweedie
Hi, On Tue, Jan 09, 2001 at 02:25:43PM -0800, Linus Torvalds wrote: In article [EMAIL PROTECTED], Stephen C. Tweedie [EMAIL PROTECTED] wrote: Jes has also got hard numbers for the performance advantages of jumbograms on some of the networks he's been using, and you ain't going to get udp

Re: Subtle MM bug

2001-01-11 Thread Stephen C. Tweedie
Hi, On Wed, Jan 10, 2001 at 12:11:16PM -0800, Linus Torvalds wrote: That said, we can easily support the notion of CLONE_CRED if we absolutely have to (and sane people just shouldn't use it), so if somebody wants to work on this for 2.5.x... But is it really worth the pain? I'd hate to

Re: Subtle MM bug

2001-01-11 Thread Stephen C. Tweedie
Hi, On Thu, Jan 11, 2001 at 02:12:05PM +0100, Trond Myklebust wrote: What's wrong with copy-on-write style semantics? IOW, anyone who wants to change the credentials needs to make a private copy of the existing structure first. Because COW only solves the problem if each task is only

Re: Subtle MM bug

2001-01-11 Thread Stephen C. Tweedie
Hi, On Thu, Jan 11, 2001 at 11:50:21AM -0500, Albert D. Cahalan wrote: Stephen C. Tweedie writes: But is it really worth the pain? I'd hate to have to audit the entire VFS to make sure that it works if another thread changes our credentials in the middle of a syscall, so we either end

Re: Subtle MM bug

2001-01-11 Thread Stephen C. Tweedie
Hi, On Thu, Jan 11, 2001 at 02:03:48PM -0500, Alexander Viro wrote: On Thu, 11 Jan 2001, Stephen C. Tweedie wrote: On Thu, Jan 11, 2001 at 02:12:05PM +0100, Trond Myklebust wrote: What's wrong with copy-on-write style semantics? IOW, anyone who wants to change the credentials

Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1

2001-01-11 Thread Stephen C. Tweedie
Hi, On Tue, Jan 09, 2001 at 11:14:54AM -0800, Linus Torvalds wrote: In article [EMAIL PROTECTED], kiobufs are crap. Face it. They do NOT allow proper multi-page scatter gather, regardless of what the kiobuf PR department has said. It's not surprising, since they were designed to solve a

Re: inode->i_dirty_buffers redundant ?

2001-01-24 Thread Stephen C. Tweedie
Hi, On Wed, Jan 24, 2001 at 03:25:16PM +0530, V Ganesh wrote: now that we have inode->i_mapping->dirty_pages, what do we need inode->i_dirty_buffers for ? Metadata. Specifically, directory contents and indirection blocks. --Stephen

Re: ioremap_nocache problem?

2001-01-25 Thread Stephen C. Tweedie
Hi, On Tue, Jan 23, 2001 at 10:53:51AM -0600, Timur Tabi wrote: My problem is that it's very easy to map memory with ioremap_nocache, but if you use iounmap() to un-map it, the entire system will crash. No one has been able to explain that one to me, either. ioremap*() is only supposed

Re: Largefile support in 2.4

2001-01-25 Thread Stephen C. Tweedie
Hi, On Wed, Jan 24, 2001 at 02:38:00PM -0500, Mike Black wrote: How do normal users get to create/maintain large files (i.e. 2G) in Linux 2.4 on i386? The root user can make filesize unlimited but a non-root user cannot. They come up with the same limits in both tcsh and bash (i.e.

Re: limit on number of kmapped pages

2001-01-25 Thread Stephen C. Tweedie
Hi, On Wed, Jan 24, 2001 at 12:35:12AM +0000, David Wragg wrote: And why do the pages need to be kmapped? They only need to be kmapped while data is being copied into them. But you only need to kmap one page at a time during the copy. There is absolutely no need to copy the whole chunk

Re: ioremap_nocache problem?

2001-01-26 Thread Stephen C. Tweedie
Hi, On Thu, Jan 25, 2001 at 09:56:32AM -0600, Timur Tabi wrote: ioremap*() is only supposed to be used on IO regions or reserved pages. If you haven't marked the pages as reserved, then iounmap will do the wrong thing, so it's up to you to reserve the pages. Au contraire! I mark the

Re: ioremap_nocache problem?

2001-01-26 Thread Stephen C. Tweedie
Hi, On Thu, Jan 25, 2001 at 10:49:50AM -0600, Timur Tabi wrote: set_bit(PG_reserved, page->flags); ioremap(); ... iounmap(); clear_bit(PG_reserved, page->flags); The problem with this is that between the ioremap and iounmap, the page is reserved. What happens if

Re: ioremap_nocache problem?

2001-01-26 Thread Stephen C. Tweedie
Hi, On Thu, Jan 25, 2001 at 11:53:01AM -0600, Timur Tabi wrote: As in an MMIO aperture? If its MMIO on the bus you should be able to just call ioremap with the bus address. By nature of it being outside of real ram, it should automatically be uncached (unless you've set an MTRR

Re: inode->i_dirty_buffers redundant ?

2001-01-26 Thread Stephen C. Tweedie
Hi, On Thu, Jan 25, 2001 at 09:05:54PM +0100, Daniel Phillips wrote: "Stephen C. Tweedie" wrote: We also maintain the per-page buffer lists as caches of the virtual-to-physical mapping to avoid redundant bmap()ping. Could you clarify that one, please? The buffer contains

Re: inode->i_dirty_buffers redundant ?

2001-01-26 Thread Stephen C. Tweedie
Hi, On Thu, Jan 25, 2001 at 07:11:01PM -0200, Marcelo Tosatti wrote: We probably want another kind of "IO buffer" abstraction for 2.5 which can support buffer's bigger than PAGE_SIZE. Do you have any thoughts on that, Stephen? XFS is already doing this, with pagebufs being used in

Re: Renaming lost+found

2001-01-27 Thread Stephen C. Tweedie
Hi, On Fri, Jan 26, 2001 at 06:05:54PM -0200, Rodrigo Barbosa (aka morcego) wrote: I think JFS indeed doesn't have it. And ReiserFS doesn't too. This should be common place for journaling filesystems. No, it's nothing to do with journaling or not. Even journaling filesystems can suffer IO

Re: [PATCH] vma limited swapin readahead

2001-01-31 Thread Stephen C. Tweedie
Hi, On Wed, Jan 31, 2001 at 01:05:02AM -0200, Marcelo Tosatti wrote: However, the pages which are contiguous on swap are not necessarily contiguous in the virtual memory area where the fault happened. That means the swapin readahead code may read pages which are not related to the process

Re: [Kiobuf-io-devel] Re: RFC: Kernel mechanism: Compound event wait/notify + callback chains

2001-01-31 Thread Stephen C. Tweedie
Hi, On Wed, Jan 31, 2001 at 04:12:11PM +0530, [EMAIL PROTECTED] wrote: Thanks for mentioning this. I didn't know about it earlier. I've been going through the 4/00 kqueue patch on freebsd ... Linus has already denounced them as massively over-engineered... --Stephen

Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait/notify + callback chains

2001-01-31 Thread Stephen C. Tweedie
Hi, On Tue, Jan 30, 2001 at 10:15:02AM +0530, [EMAIL PROTECTED] wrote: Comments, suggestions, advice, feedback solicited ! My first comment is that this looks very heavyweight indeed. Isn't it just over-engineered? We _do_ need the ability to stack completion events, but as far as the

Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

2001-01-31 Thread Stephen C. Tweedie
Hi, On Wed, Jan 31, 2001 at 07:28:01PM +0530, [EMAIL PROTECTED] wrote: Do the following modifications to your wait queue extension sound reasonable ? 1. Change add_wait_queue to add elements to the end of queue (fifo, by default) and instead have an add_wait_queue_lifo() routine that

Re: [PATCH] vma limited swapin readahead

2001-02-01 Thread Stephen C. Tweedie
Hi, On Wed, Jan 31, 2001 at 04:24:24PM -0800, David Gould wrote: I am skeptical of the argument that we can win by replacing "the least desirable" pages with pages were even less desireable and that we have no recent indication of any need for. It seems possible under heavy swap to discard

Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

2001-02-01 Thread Stephen C. Tweedie
Hi, On Thu, Feb 01, 2001 at 10:25:22AM +0530, [EMAIL PROTECTED] wrote: We _do_ need the ability to stack completion events, but as far as the kiobuf work goes, my current thoughts are to do that by stacking lightweight "clone" kiobufs. Would that work with stackable filesystems ? Only

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote: No. __GFP_FAIL can to try to reclaim pages from inactive clean. We just want to avoid __GFP_FAIL allocations from going to try_to_free_pages(). Why? __GFP_FAIL is only useful as an indication that the caller has some

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote: Initially I thought about __GFP_FAIL to be used by writeout routines which want to cluster pages until they can allocate memory without causing any pressure to the system. Something like this: while ((page =

Re: [PATCH] allocation looping + kswapd CPU cycles

2001-05-10 Thread Stephen C. Tweedie
Hi, On Thu, May 10, 2001 at 03:49:05PM -0300, Marcelo Tosatti wrote: Back to the main discussion --- I guess we could make __GFP_FAIL (with __GFP_WAIT set :)) allocations actually fail if try_to_free_pages() does not make any progress (ie returns zero). But maybe thats a bit too extreme.

Re: LANANA: To Pending Device Number Registrants

2001-05-18 Thread Stephen C. Tweedie
Hi, On Wed, May 16, 2001 at 12:18:15PM -0400, Michael Meissner wrote: With the current LABEL= support, you won't be able to mount the disks with duplicate labels, but you can still mount them via /dev/sdxxx. Or you can fall back to mounting by UUID, which is globally unique and still avoids

Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-05-18 Thread Stephen C. Tweedie
Hi, On Fri, May 11, 2001 at 04:54:44PM +0200, Daniel Phillips wrote: The only reasonable way I can think of getting a block-coherent view underneath a mounted fs is to have a reverse map, and update it each time we map block into the page cache or unmap it. It's called the buffer cache,

Re: Linux 2.4.4-ac10

2001-05-18 Thread Stephen C. Tweedie
Hi, On Fri, May 18, 2001 at 07:44:39PM -0300, Rik van Riel wrote: This is the core of why we cannot (IMHO) have a discussion of whether a patch introducing new VM tunables can go in: there is no clear overview of exactly what would need to be tunable and how it would help. It's worse than

Re: LANANA: To Pending Device Number Registrants

2001-05-19 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 05:29:32PM +1200, Chris Wedgwood wrote: Or you can fall back to mounting by UUID, which is globally unique and still avoids referencing physical location. You also don't need to manually set LABELs for UUID to work: all e2fsprogs over the past

Re: LANANA: To Pending Device Number Registrants

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 04:20:11PM -0400, Michael Meissner wrote: On Fri, May 18, 2001 at 03:17:50PM +0100, Stephen C. Tweedie wrote: Presumably, a new UUID is created each time format a partition, which means it is a slight bit of hassle if you have to reload a partition from a dump

Re: [RFC][PATCH] Re: Linux 2.4.4-ac10

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sun, May 20, 2001 at 07:04:31AM -0300, Rik van Riel wrote: On Sun, 20 May 2001, Mike Galbraith wrote: Looking at the locking and trying to think SMP (grunt) though, I don't like the thought of taking two locks for each page until 100%. The data in that block is toast anyway.

Re: Ext2, fsync() and MTA's?

2001-05-21 Thread Stephen C. Tweedie
Hi, On Sun, May 13, 2001 at 12:53:37AM +1000, Andrew McNamara wrote: I seem to recall that in 2.2, fsync behaved like fdatasync, and that it's only in 2.4 that it also syncs metadata - is this correct? No, fsync should be safe on 2.2. There was a problem with O_SYNC not syncing all metadata

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 12:47:15PM -0700, Linus Torvalds wrote: On Sat, 19 May 2001, Pavel Machek wrote: Don't get _too_ hung up about the power-management kind of invisible suspend/resume sequence where you resume the whole kernel state. Ugh. Now I'm confused. How do you do

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Tue, May 15, 2001 at 04:37:01PM +1200, Chris Wedgwood wrote: On Sun, May 13, 2001 at 08:39:23PM -0600, Richard Gooch wrote: Yeah, we need a decent unfragmenter. We can do that now with bmap(). SCT wrote a defragger for ext2 but it only handles 1k blocks :( Actually, I

Re: Getting FS access events

2001-05-23 Thread Stephen C. Tweedie
Hi, On Fri, May 18, 2001 at 09:55:14AM +0200, Rogier Wolff wrote: The boot quickly was an example. Load netscape quickly on some systems is done by dd-ing the binary to /dev/null. This is one of the reasons why some filesystems use extent maps instead of inode indirection trees. The

Re: DVD blockdevice buffers

2001-05-23 Thread Stephen C. Tweedie
Hi, On Sat, May 19, 2001 at 07:36:07PM -0700, Linus Torvalds wrote: Right now we don't try to aggressively drop streaming pages, but it's possible. Using raw devices is a silly work-around that should not be needed, and this load shows a real problem in current Linux (one soon to be fixed,

Re: DVD blockdevice buffers

2001-05-23 Thread Stephen C. Tweedie
Hi, On Wed, May 23, 2001 at 11:12:00AM -0700, Linus Torvalds wrote: On Wed, 23 May 2001, Stephen C. Tweedie wrote: No, you can actually do all the prepare_write()/commit_write() stuff that the filesystems already do. And you can do it a lot _better_ than the current buffer-cache-based

Re: O_TRUNC problem on a full filesystem

2001-05-24 Thread Stephen C. Tweedie
On Wed, May 23, 2001 at 07:55:48PM +1000, Andrew Morton wrote: When you truncated your file, the blocks remained preallocated on behalf of the file, and were hence considered used. For some reason, a subsequent attempt to allocate blocks for the same file failed to use that file's

Re: Ext2, fsync() and MTA's?

2001-05-22 Thread Stephen C. Tweedie
Hi, On Tue, May 22, 2001 at 11:54:55AM -0500, Oliver Xymoron wrote: That's probably the right thing to add. I'd vote for an async flag instead. Why??? Why change the default behaviour to be something much slower? I was suggesting an async flag _in addition_ to the sync flag,

Re: Ext2, fsync() and MTA's?

2001-05-22 Thread Stephen C. Tweedie
Hi, On Tue, May 22, 2001 at 10:50:51AM -0500, Oliver Xymoron wrote: On Mon, 21 May 2001, Theodore Tso wrote: On Mon, May 21, 2001 at 06:47:58PM +0100, Stephen C. Tweedie wrote: Just set chattr +S on the spool dir. That's what the flag is for. The biggest problem

Re: O_TRUNC problem on a full filesystem

2001-05-24 Thread Stephen C. Tweedie
Hi, On Thu, May 24, 2001 at 11:24:10AM -0600, Andreas Dilger wrote: How have you done the ext3 preallocation code? Preallocation is currently disabled in ext3. Eventually I'll probably get it going by adding a journal prepare-commit callback to allow the filesystem to flush preallocation

Re: O_TRUNC problem on a full filesystem

2001-05-25 Thread Stephen C. Tweedie
Hi, On Fri, May 25, 2001 at 10:24:49AM +1000, Andrew Morton wrote: For example, when we miss the goal block we search forward up to 63 blocks for a *single* free block, and use that. Perhaps we shouldn't? The reasoning here is that it's much cheaper to go to a single block which is very

Re: DVD blockdevice buffers

2001-05-25 Thread Stephen C. Tweedie
Hi, On Fri, May 25, 2001 at 09:09:37AM -0600, Eric W. Biederman wrote: The case we don't get quite right are partial reads that hit cached data, on a page that doesn't have PG_Uptodate set. We don't actually need to do the I/O on the surrounding page to satisfy the read request. But we do

Re: DVD blockdevice buffers

2001-05-25 Thread Stephen C. Tweedie
Hi, On Fri, May 25, 2001 at 02:24:52PM -0400, Alexander Viro wrote: If you are OK with adding two extra arguments to ->readpage() I could submit a patch replacing that with plain and simple page cache by tomorrow. It should not be a problem to port, but I want to get some sleep before

Re: ext3 message if FS is not ext3

2001-05-28 Thread Stephen C. Tweedie
Hi, On Sat, May 26, 2001 at 10:54:39AM +0100, Steve Dodd wrote: On Wed, May 23, 2001 at 01:06:16PM +0100, Stephen C. Tweedie wrote: On Wed, May 23, 2001 at 02:00:13PM +0200, Florian Lohoff wrote: i think this message should be removed ;) [..] VFS: Can't find an ext3 filesystem on dev

Re: 2T for i386 OT

2000-09-04 Thread Stephen C. Tweedie
Hi, On Sun, Sep 03, 2000 at 11:36:25PM +0200, Andrea Ferraris wrote: I used to think that. Im planning on deploying a 1Tb IDE raid using 3ware kit for an ftp site very soon. Its very cheap and its very fast. UDMA with one disk per channel and the controller doing some of the work.

Re: thread rant

2000-09-04 Thread Stephen C. Tweedie
Hi, On Sat, Sep 02, 2000 at 09:41:03PM +0200, Ingo Molnar wrote: On Sat, 2 Sep 2000, Alexander Viro wrote: unlink() and the last munmap()/exit() will get rid of it... yep - and this isnt possible with traditional SysV shared memory, and isnt possible with traditional SysV semaphores.

Re: zero-copy TCP

2000-09-04 Thread Stephen C. Tweedie
Hi, On Sun, Sep 03, 2000 at 07:29:56PM +0200, Ingo Molnar wrote: On Sun, 3 Sep 2000, Andi Kleen wrote: I did the same for fragment RX some months ago (simple fragment lists that were copy-checksummed to user space). Overall it is probably better to use a kiovec, because that can be

Two VM problems for the 2.4 TODO list

2000-09-04 Thread Stephen C. Tweedie
Hi Ted, To be fixed for 2.4: 1) Non-atomic pte updates The page aging code and mprotect both modify existing ptes non-atomically. That can stomp on the VM hardware on other CPUs setting the dirty bit on mmaped pages when using threads. 2.2 is vulnerable too. 2) RSS locking

Re: [patch] All the fs patches resulting from updating mark_buffer_dirty

2000-09-04 Thread Stephen C. Tweedie
Hi, On Mon, Sep 04, 2000 at 11:29:56PM +0200, Rasmus Andersen wrote: I have changed the interface to mark_buffer_dirty (as per Tigran Aivazian's suggestion). This impacts a lot of places in the kernel (trivially), noticeably the file systems. The URL below points a big patch for all

Re: [PATCH] Useless inode semaphore locking in 2.4.0-test8

2000-09-19 Thread Stephen C. Tweedie
Hi, On Fri, Sep 15, 2000 at 08:31:43AM -0400, Alexander Viro wrote: Also truncate inode locking is needed to get a halfway reliable loopback device (unlike the current one) ? I'm afraid that I've lost you here - what do you mean? loop does a bmap() and then submits block IO. You don't

Re: [patch] vmfixes-2.4.0-test9-B2

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 04:02:30AM +0200, Andrea Arcangeli wrote: On Sun, Sep 24, 2000 at 09:27:39PM -0400, Alexander Viro wrote: So help testing the patches to them. Arrgh... I think I'd better fix the bugs that I know about before testing patches that tries to remove the

Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 12:36:50AM +0200, bert hubert wrote: On Mon, Sep 25, 2000 at 12:13:42AM +0200, Andrea Arcangeli wrote: On Sun, Sep 24, 2000 at 10:43:03PM +0100, Stephen C. Tweedie wrote: any form of serialisation on the quota file). This feels like rather a lot of new

Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 06:05:00PM +0200, Andrea Arcangeli wrote: On Mon, Sep 25, 2000 at 04:42:49PM +0100, Stephen C. Tweedie wrote: Progress is made, clean pages are discarded and dirty ones queued for How can you make progress if there isn't swap available and all the freeable page

Re: refill_inactive()

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 09:17:54AM -0700, Linus Torvalds wrote: On Mon, 25 Sep 2000, Rik van Riel wrote: Hmmm, doesn't GFP_BUFFER simply imply that we cannot allocate new buffer heads to do IO with?? No. New buffer heads would be ok - recursion is fine in theory, as long as

Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 07:03:47PM +0200, Andrea Arcangeli wrote: This really seems to be the biggest difference between the two approaches right now. The FreeBSD folks believe fervently that one of [ aging cache and mapped pages in the same cycle ] Right. And since you move

Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 08:09:31PM +0100, Alan Cox wrote: Indeed. But we wont fail the kmalloc with a NULL return Isn't that the preferred behaviour, though? If we are completely out of VM on a no-swap machine, we should be killing one of the existing processes rather than

Re: [patch] vmfixes-2.4.0-test9-B2 - fixing deadlocks

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 09:32:42PM +0200, Andrea Arcangeli wrote: Having shrink_mmap that browse the mapped page cache is useless as having shrink_mmap browsing kernel memory and anonymous pages as it does in 2.2.x as far I can tell. It's an algorithm complexity problem and it will

Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 02:04:19PM -0600, [EMAIL PROTECTED] wrote: Right, but if the alternative is spurious ENOMEM when we can satisfy An ENOMEM is not spurious if there is not enough memory. UNIX does not ask the OS to do impossible tricks. Yes, but the ENOMEM _is_ spurious if you

Re: the new VMt

2000-09-26 Thread Stephen C. Tweedie
Hi, On Tue, Sep 26, 2000 at 09:17:44AM -0600, [EMAIL PROTECTED] wrote: Operating systems cannot make more memory appear by magic. The question is really about the best strategy for dealing with low memory. In my opinion, the OS should not try to out-think physical limitations. Instead, the

Re: Can ext3 or ReiserFS w/ journalling be made on /dev/loop?

2000-10-03 Thread Stephen C. Tweedie
Hi, On Thu, Sep 28, 2000 at 07:59:21PM +0000, Marc Mutz wrote: I was asked a question lately that I was unable to answer: Assume you want to make a (encrypted, but that's not the issue here) filesystem on a loopback block device (/dev/loop*). Can this be a journalling one? In other words,

Re: Soft-Updates for Linux ?

2000-10-03 Thread Stephen C. Tweedie
Hi, On Mon, Oct 02, 2000 at 03:13:07AM +0200, Daniel Phillips wrote: What I've seen proposed is a mechanism where the VM can say 'flush this page' to a filesystem and the filesystem can then go ahead and do what it wants, including flushing the page, flushing some other page, or not doing

Re: the new VMt

2000-09-26 Thread Stephen C. Tweedie
Hi, On Tue, Sep 26, 2000 at 11:02:48AM -0600, Erik Andersen wrote: Another approach would be to let user space turn off overcommit. No. Overcommit only applies to pageable memory. Beancounter is really needed for non-pageable resources such as page tables and mlock()ed pages. Cheers,

Re: [ANNOUNCE] Withdrawl of Open Source NDS Project/NTFS/M2FS forLinux

2000-09-07 Thread Stephen C. Tweedie
Hi, On Wed, Sep 06, 2000 at 09:44:54AM -0600, Jeff V. Merkey wrote: KDB is a user mode debugger designed to debug user space apps that's been hacked to run with a driver. Absolutely not true. You're probably thinking about kgdb, the gdb stub for remote kernel source level debugging. kdb

Re: the new VMt

2000-09-25 Thread Stephen C. Tweedie
Hi, On Mon, Sep 25, 2000 at 12:13:15PM -0600, [EMAIL PROTECTED] wrote: Definitely not. GFP_ATOMIC is reserved for things that really can't swap or schedule right now. Use GFP_ATOMIC indiscriminately and you'll have to increase the number of atomic-allocatable pages. Process 1,2 and 3

Re: Quota fixes and a few questions

2000-10-06 Thread Stephen C. Tweedie
Hi Jan, On Wed, Sep 27, 2000 at 02:56:20PM +0200, Jan Kara wrote: So I've been thinking about fixes in quota (and also writing some parts). While we're at it, I've attached a patch which I was sent which simply teaches quota about ext3 as a valid fs type in fstab. It appears to work fine

Re: Quota fixes and a few questions

2000-10-20 Thread Stephen C. Tweedie
Hi, On Thu, Oct 19, 2000 at 07:03:54PM +0200, Jan Kara wrote: I stumbled into another problem: When using ext3 with quotas the kjournald process stops responding and stays in DW state when the filesystem gets under heavy load. It is easy to reproduce: Just extract two or three larger

Re: Quota fixes and a few questions

2000-10-24 Thread Stephen C. Tweedie
Hi, On Fri, Oct 20, 2000 at 05:02:28PM +0200, Juri Haberland wrote: As I wrote in my original mail I used 0.0.2f. Is there a version called 0.0.3 yet and if so where can I find it? In ftp.uk.linux.org (which is currently not reachable as well as vger.kernel.org) I found only 0.0.2f. I must

Re: ext3 fsck question

2001-03-01 Thread Stephen C. Tweedie
Hi, On Wed, Feb 28, 2001 at 08:03:21PM -0600, Neal Gieselman wrote: I applied the libs and other utilites from e2fsprogs by hand. I ran fsck.ext3 on my secondary partition and it ran fine. The boot fsck on / was complaining about something but I could not catch it. I then went single user

Re: Writing on raw device with software RAID 0 is slow

2001-03-01 Thread Stephen C. Tweedie
Hi, On Thu, Mar 01, 2001 at 10:44:38AM -0500, Ben LaHaise wrote: On Thu, 1 Mar 2001, Stephen C. Tweedie wrote: Raw IO is always synchronous: it gets flushed to disk before the write returns. You don't get any write-behind with raw IO, so the smaller the blocksize you write

Re: Writing on raw device with software RAID 0 is slow

2001-03-01 Thread Stephen C. Tweedie
Hi, On Thu, Mar 01, 2001 at 11:08:13AM -0500, Ben LaHaise wrote: On Thu, 1 Mar 2001, Stephen C. Tweedie wrote: Actually, how about making it a sysctl? That's probably the most reasonable approach for now since the optimal size depends on hardware. Fine with me. --Stephen

Re: [patch] set kiobuf io_count once, instead of increment

2001-03-02 Thread Stephen C. Tweedie
On Tue, Feb 27, 2001 at 04:22:22PM -0800, Robert Read wrote: Currently in brw_kiovec, iobuf->io_count is being incremented as each bh is submitted, and decremented in the bh->b_end_io(). This means io_count can go to zero before all the bhs have been submitted, especially during a large
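
The counting problem described in this message can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the brw_kiovec code: the function names are hypothetical, the counter is a plain int rather than the kernel's field, and the worst case is assumed, in which each completion handler runs before the next buffer head is submitted. Each function returns how many buffers had been submitted when the counter first reached zero.

```c
#include <assert.h>

/*
 * Hypothetical model: the counter is incremented as each buffer is
 * submitted and decremented when that buffer completes. If a completion
 * runs before the next submit, the counter touches zero early.
 */
int first_zero_incremental(int nr_bh)
{
    assert(nr_bh > 0);
    int io_count = 0;
    for (int submitted = 1; submitted <= nr_bh; submitted++) {
        io_count++;             /* counted at submit time */
        io_count--;             /* worst case: completion runs immediately */
        if (io_count == 0)
            return submitted;   /* the I/O already looks finished here */
    }
    return nr_bh;
}

/*
 * Hypothetical model of the approach named in the subject line: the
 * counter is set once to the total before anything is submitted, so only
 * the last completion can bring it to zero.
 */
int first_zero_preset(int nr_bh)
{
    assert(nr_bh > 0);
    int io_count = nr_bh;       /* set once, up front */
    for (int submitted = 1; submitted <= nr_bh; submitted++) {
        io_count--;             /* completion of this buffer */
        if (io_count == 0)
            return submitted;
    }
    return nr_bh;               /* not reached for nr_bh > 0 */
}
```

In this model the incremental scheme reports completion after the first buffer, while the preset scheme reports it only after the last one.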

Re: [patch] set kiobuf io_count once, instead of increment

2001-03-02 Thread Stephen C. Tweedie
Hi, On Wed, Feb 28, 2001 at 09:18:59AM -0800, Robert Read wrote: On Tue, Feb 27, 2001 at 10:50:54PM -0300, Marcelo Tosatti wrote: This is true, but it looks like the brw_kiovec allocation failure handling is broken already; it's calling __put_unused_buffer_head on bhs without waiting for

Raw IO fixes for 2.4.2-ac8

2001-03-02 Thread Stephen C. Tweedie
Hi, I've just uploaded the current raw IO fixes as kiobuf-2.4.2-ac8-A0.tar.gz on ftp.uk.linux.org:/pub/linux/sct/fs/raw-io/ and ftp.*.kernel.org:/pub/linux/kernel/people/sct/raw-io/ This includes: 00-movecode.diff: move kiobuf code from mm/memory.c to fs/iobuf.c

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Tue, Mar 06, 2001 at 10:44:34AM -0800, Linus Torvalds wrote: On Tue, 6 Mar 2001, Alan Cox wrote: You want a write barrier. Write buffering (at least for short intervals) in the drive is very sensible. The kernel needs to able to send drivers a write barrier which will not be

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 03:12:41PM +0100, Jens Axboe wrote: Yep, it's much harder than it seems. Especially because for the barrier to be really useful, having inter-request dependencies becomes a requirement. So you can say something like 'flush X and Y, but don't flush Y before X is

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote: On Wed, Mar 07 2001, Stephen C. Tweedie wrote: My bigger concern is when the journalled fs has a log on a different queue. For most fs'es, that's not an issue. The fs won't start writeback on the primary disk at all until

Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote: On Wed, Mar 07 2001, Stephen C. Tweedie wrote: For most fs'es, that's not an issue. The fs won't start writeback on the primary disk at all until the journal commit has been acknowledged as firm on disk. But do you

Re: scsi vs ide performance on fsync's

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 10:36:38AM -0800, Linus Torvalds wrote: On Wed, 7 Mar 2001, Jeremy Hansen wrote: So in the meantime as this gets worked out on a lower level, we've decided to take the fsync() out of berkeley db for mysql transaction logs and mount the filesystem -o sync.

Re: 64-bit capable block device layer

2001-03-08 Thread Stephen C. Tweedie
Hi, On Wed, Mar 07, 2001 at 07:53:23PM +0100, Jens Axboe wrote: OTOH, I'm not sure what problems it could give to make this a compile-time option... Plus compile time options are nasty :-). It would probably make bigger sense to completely skip all the merging etc for low end

Re: BUG? race between kswapd and ptrace (access_process_vm )

2001-03-12 Thread Stephen C. Tweedie
Hi, On Thu, Mar 08, 2001 at 09:12:52PM +0100, Manfred Spraul wrote: Fixing the bug is more difficult than I thought: Initially I assumed it would be a two-liner (lock, unlock) but kmap() can sleep. Can I reuse a kmap_atomic() type or should I add a new type? I've just tried with the

Re: magic device renumbering was -- Re: Linux 2.4.2ac20

2001-03-16 Thread Stephen C. Tweedie
Hi, On Wed, Mar 14, 2001 at 02:11:57PM -0500, Lars Kellogg-Stedman wrote: Put LABEL=label set with e2label in you fstab in place of the device name. Which is great, for filesystems that support labels. Unfortunately, this isn't universally available -- for instance, you cannot mount a

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie
Hi, On Thu, Mar 15, 2001 at 09:24:59AM -0300, Rik van Riel wrote: On Wed, 14 Mar 2001, Rik van Riel wrote: The mmap_sem is used in procfs to prevent the list of VMAs from changing. In the page fault code it seems to be used to prevent other page faults to happen at the same time with the

Re: O_DSYNC flag for open

2001-03-16 Thread Stephen C. Tweedie
Hi, On Wed, Mar 14, 2001 at 10:26:42PM -0500, Tom Vier wrote: fdatasync() is the same as fsync(), in linux. No, in 2.4 fdatasync does the right thing and skips the inode flush if only the timestamps have changed. until fdatasync() is implimented (ie, syncs the data only) fdatasync is
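
For readers unfamiliar with the call being discussed, here is a minimal user-space sketch of how an application might use fdatasync() when appending to a log file. The helper name append_durably and its error handling are assumptions made for illustration; only open(), write(), fdatasync() and close() are standard POSIX calls, and the sketch says nothing about how any particular kernel version implements them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append len bytes to path and ask the kernel to
 * push the file data to stable storage before returning.
 * Returns 0 on success and -1 on any failure.
 */
int append_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, buf, len);   /* may write fewer bytes than asked */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }

    /* fdatasync() rather than fsync(): data must reach disk, but metadata
     * that is not needed to read the data back may be left unflushed. */
    if (fdatasync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd) < 0 ? -1 : 0;
}
```

A caller that also needs every metadata update flushed would call fsync() at the same point instead.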

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-16 Thread Stephen C. Tweedie
Hi, On Fri, Mar 16, 2001 at 08:50:25AM -0300, Rik van Riel wrote: On Fri, 16 Mar 2001, Stephen C. Tweedie wrote: Write locks would be used in the code where we actually want to change the VMA list and page faults would use an extra lock to protect against each other (possibly a per

Re: [PATCH]: Only one memory zone for sparc64

2001-03-16 Thread Stephen C. Tweedie
Hi, On Thu, Mar 15, 2001 at 07:13:52PM +1100, Anton Blanchard wrote: On sparc64 we dont care about the different memory zones and iterating through them all over the place only serves to waste CPU. I suspect this would be the case with some other architectures but for the moment I have

Re: changing mm->mmap_sem (was: Re: system call for process information?)

2001-03-19 Thread Stephen C. Tweedie
Hi, On Sun, Mar 18, 2001 at 10:34:38AM +0100, Manfred Spraul wrote: The problem is that mmap_sem seems to be protecting the list of VMAs, so taking _only_ the page_table_lock could let a VMA change under us while a page fault is underway ... No, that can't happen. It can. Page faults

Thinko in kswapd?

2001-03-22 Thread Stephen C. Tweedie
Hi, There is what appears to be a simple thinko in kswapd. We really ought to keep kswapd running as long as there is either a free space or an inactive page shortfall; but right now we only keep going if _both_ are short. Diff below. With this change, I've got a 64MB box running Applix and
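
The diff itself is not included in this listing, but the logic error described above can be illustrated with a small sketch. The two predicate names and their boolean parameters are hypothetical stand-ins, not the kernel's functions; the sketch only contrasts "continue when both are short" with "continue when either is short".

```c
#include <stdbool.h>

/* Behaviour described as the thinko: keep going only if BOTH a free-space
 * shortfall and an inactive-page shortfall exist. (Hypothetical name.) */
bool keep_running_if_both(bool free_short, bool inactive_short)
{
    return free_short && inactive_short;
}

/* Behaviour the message argues for: keep going as long as EITHER
 * shortfall exists. (Hypothetical name.) */
bool keep_running_if_either(bool free_short, bool inactive_short)
{
    return free_short || inactive_short;
}
```

The two predicates differ exactly when one shortfall exists without the other, which is the case where the first version stops too early.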

Re: Thinko in kswapd?

2001-03-22 Thread Stephen C. Tweedie
Hi, On Thu, Mar 22, 2001 at 09:36:48AM -0800, Linus Torvalds wrote: On Thu, 22 Mar 2001, Stephen C. Tweedie wrote: There is what appears to be a simple thinko in kswapd. We really ought to keep kswapd running as long as there is either a free space or an inactive page shortfall

Re: 2.4.2 fs/inode.c

2001-03-22 Thread Stephen C. Tweedie
Hi, On Thu, Mar 22, 2001 at 01:42:15PM -0500, Jan Harkes wrote: I found some code that seems wrong and didn't even match its comment. Patch is against 2.4.2, but should go cleanly against 2.4.3-pre6 as well. Patch looks fine to me. Have you tested it? If this goes wrong, things break

[patch] O_SYNC patch 2/3, add per-inode dirty buffer lists

2000-11-22 Thread Stephen C. Tweedie
Hi, This is the second part of my old O_SYNC diffs patched up for 2.4.0-test11. It adds support for per-inode dirty buffer lists. In 2.4, we are now generating dirty buffers on a per-page basis for every write. For large O_SYNC writes (often databases use around 128K per write), we obviously

[patch] O_SYNC patch 3/3, add inode dirty buffer list support to ext2

2000-11-22 Thread Stephen C. Tweedie
Hi, This final part of the O_SYNC patches adds calls to ext2, and to generic_commit_write, to record dirty buffers against the owning inode. It also removes most of fs/ext2/fsync.c, which now simply calls the generic sync code. --Stephen 2.4.0test11.02.ext2-osync.diff : ---

Re: e2fs performance as function of block size

2000-11-24 Thread Stephen C. Tweedie
Hi, On Wed, Nov 22, 2000 at 11:28:12PM +0100, Michael Marxmeier wrote: If the files get somewhat bigger (eg. 1G) having a bigger block size also greatly reduces the ext2 overhead. Especially fsync() used to be really bad on big file but choosing a bigger block size changed a lot. 2.4

Re: [PATCH] blindingly stupid 2.2 VM bug

2000-11-30 Thread Stephen C. Tweedie
Hi, On Tue, Nov 28, 2000 at 04:35:32PM -0800, John Kennedy wrote: On Wed, Nov 29, 2000 at 01:04:16AM +0100, Andrea Arcangeli wrote: On Tue, Nov 28, 2000 at 03:36:15PM -0800, John Kennedy wrote: No, it is all ext3fs stuff that is touching the same areas your Ok this now makes sense.

Re: corruption

2000-12-01 Thread Stephen C. Tweedie
Hi, On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote: I bet this'll catch it: static __inline__ void list_del(struct list_head *entry) { __list_del(entry->prev, entry->next); + entry->next = entry->prev = 0; } No, because the buffer hash list is never referenced

Re: Updated: raw I/O patches (v2.2)

2000-12-01 Thread Stephen C. Tweedie
Hi, On Tue, Nov 21, 2000 at 11:18:15AM -0500, Eric Lowe wrote: I have updated raw I/O patches with Andrea's and my fixes against 2.2. They check for CONFIG_BIGMEM so they can be applied and compiled without the bigmem patch. I've just posted an assembly of all of the outstanding raw IO

[patch] O_SYNC patch 1/3: Fix fdatasync

2000-11-22 Thread Stephen C. Tweedie
Hi, This is the first patch out of 3 to fix O_SYNC and fdatasync for 2.4.0-test11. The patch below fixes fdatasync (at least for ext2) so that it does not flush the inode to disk for purely timestamp updates. It splits I_DIRTY into two bits, one bit (I_DIRTY_DATASYNC) which is set only for

[testcase] fsync/O_SYNC simple test cases

2000-11-22 Thread Stephen C. Tweedie
Hi, The code below may be useful for doing simple testing of the O_SYNC and f[data]sync code in the kernel. It times various combinations of updates-in-place and appends under various synchronisation mechanisms, making it possible to see clearly whether fdatasync is skipping inode updates for

Re: [patch] O_SYNC patch 3/3, add inode dirty buffer list support to ext2

2000-11-23 Thread Stephen C. Tweedie
Hi, On Wed, Nov 22, 2000 at 11:54:24AM -0700, Jeff V. Merkey wrote: I have not implemented O_SYNC in NWFS, but it looks like I need to add it before posting the final patches. This patch appears to force write-through of only dirty inodes, and allow reads to continue from cache. Is this

Re: corruption

2000-12-04 Thread Stephen C. Tweedie
Hi, On Sat, Dec 02, 2000 at 10:33:36AM -0500, Alexander Viro wrote: On Sun, 3 Dec 2000, Andrew Morton wrote: It appears that this problem is not fixed. Sure, it isn't. Place where the shit hits the fan: fs/buffer.c::unmap_buffer(). Add the call of remove_inode_queue(bh) there and see if
