Hi,
Sorry for the delay, I've been on holiday for a couple of weeks.
On Thu, Jul 27, 2000 at 07:36:34PM -0400, Jeremy Hansen wrote:
ok ... to clarify ... ext3 _guarantees_ consistent file system metadata
or empirically, it tends to be robust about maintaining consistent file system
Hi,
On Fri, Jul 21, 2000 at 11:54:20PM -0600, Andreas Dilger wrote:
Note that you should not make the journals so large that they are a
major fraction of your RAM, as you will not gain anything by this.
A few megabytes is fine, 1024 disk blocks is the minimum.
Yep. The main drawbacks to a
Hi,
On Thu, Jul 27, 2000 at 01:41:54PM -0400, Jeremy Hansen wrote:
We're really itching to use ext3 in a production environment. Can you
give any clues on how things are going?
The ext3-0.0.2f appears to be rock solid. Andreas has prototype
code for e2fsck log replay, and I've got
Hi,
On Wed, Jul 26, 2000 at 02:05:11PM -0400, Alexander Viro wrote:
Here is one more for you:
Suppose we grow the last fragment/tail/whatever. Do you copy the
data out of that shared block? If so, how do you update buffer_heads in
pages that cover the relocated data? (Same goes for
Hi,
On Wed, Jul 26, 2000 at 02:56:01PM -0400, Alexander Viro wrote:
Not. Data normally is in page. Buffer_heads are not included into buffer
cache. They are referred from the struct page and their ->b_data just
points to appropriate pieces of page. You can not get them via bread().
At all.
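The arrangement Viro describes can be modelled outside the kernel. The following is a toy userspace sketch (assumed names, not the actual kernel structures) of the relationship: page-cache buffer_heads are not in the buffer cache hash, they hang in a ring off the struct page, and each b_data simply points into the page's memory with no copy.

```c
/* Toy userspace model (NOT kernel code): buffer_heads hang off the
 * page in a circular list, and b_data points INTO the page's memory. */
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE  4096
#define BLOCK_SIZE 1024                   /* 4 buffers per page */

struct buffer_head {
    struct buffer_head *b_this_page;      /* ring of buffers in the page */
    char *b_data;                         /* points into the page, no copy */
};

struct page {
    char data[PAGE_SIZE];
    struct buffer_head *buffers;          /* head of the per-page ring */
};

/* Attach one buffer_head per block-sized piece of the page. */
static void create_buffers(struct page *pg)
{
    struct buffer_head *head = NULL, *prev = NULL;
    for (int i = 0; i < PAGE_SIZE / BLOCK_SIZE; i++) {
        struct buffer_head *bh = malloc(sizeof(*bh));
        bh->b_data = pg->data + i * BLOCK_SIZE;
        bh->b_this_page = NULL;
        if (prev)
            prev->b_this_page = bh;
        else
            head = bh;
        prev = bh;
    }
    prev->b_this_page = head;             /* close the ring */
    pg->buffers = head;
}
```

Since b_data is only a pointer into the page, such buffers cannot be found via bread(): they were never entered in the buffer cache hash.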
Hi,
On Wed, Jul 26, 2000 at 02:41:44PM -0400, Alexander Viro wrote:
For tail writes, I'd imagine we would just end up using the page cache
as a virtual cache as NFS uses it, and doing plain copy into the
buffer cache pages.
Ouch. I _really_ don't like it - we end up with special
Hi,
On Mon, Jul 24, 2000 at 04:34:11PM -0400, Jeremy Hansen wrote:
I build some customized kernel rpm's for our inhouse distro and I've
incorporated the profiling patches to use the sard utility. I'm just
curious if there is any downside to using this patch for any reason and if
there are
to the group
+ * descriptors.
+ * Stephen C. Tweedie ([EMAIL PROTECTED]), 1999
+ *
*/
#include <linux/config.h>
@@ -16,7 +23,6 @@
#include <linux/locks.h>
#include <linux/quotaops.h>
-
/*
* balloc.c contains the blocks allocation and deallocation routines
*/
@@ -70,42 +76,33
Resent-From: "Stephen C. Tweedie" [EMAIL PROTECTED]
Resent-Message-ID: [EMAIL PROTECTED]
Resent-Date: Mon, 1 May 2000 18:18:14 +0100 (BST)
Resent-To: [EMAIL PROTECTED]
X-Authentication-Warning: worf.scot.redhat.com: sct set sender to
[EMAIL PROTECTED] using -f
MIME-Version: 1
Hi all,
The following patch fully implements O_SYNC, fsync and fdatasync,
at least for ext2. The infrastructure it includes should make it
trivial for any other filesystem to do likewise.
The basic changes are:
Include a per-inode list of dirty buffers
Pass a "datasync"
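The per-inode dirty buffer list at the heart of the patch can be sketched as follows. This is a minimal userspace model with assumed names, not the patch itself: the point is that fsync() walks only the inode's own list instead of scanning the global buffer cache.

```c
/* Sketch (assumed names): keep dirty buffers on a per-inode list so
 * fsync() writes back just that inode's buffers. */
#include <assert.h>
#include <stddef.h>

struct buffer {
    struct buffer *next;            /* link in the inode's dirty list */
    int dirty;
};

struct inode {
    struct buffer *dirty_buffers;   /* per-inode dirty list */
};

static void mark_buffer_dirty_inode(struct inode *ino, struct buffer *bh)
{
    if (!bh->dirty) {               /* already listed: don't double-add */
        bh->dirty = 1;
        bh->next = ino->dirty_buffers;
        ino->dirty_buffers = bh;
    }
}

/* fsync: write back and unlink everything on this inode's list only. */
static int fsync_inode_buffers(struct inode *ino)
{
    int written = 0;
    while (ino->dirty_buffers) {
        struct buffer *bh = ino->dirty_buffers;
        ino->dirty_buffers = bh->next;
        bh->dirty = 0;              /* stands in for submitting the I/O */
        written++;
    }
    return written;
}
```

The same list is what makes O_SYNC cheap to implement: after each write, flush the inode's list rather than searching for the file's buffers.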
Hi,
On Fri, Jun 09, 2000 at 02:53:19PM -0700, Ulrich Drepper wrote:
If I don't preallocate the file, then even fdatasync is slow, [...]
This might be a good argument to implement posix_fallocate() in the
kernel.
No. If we do posix_fallocate(), then there are only two choices:
we either
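Whatever the kernel ends up doing, posix_fallocate() already exists at the C library level. A minimal usage sketch (the helper name `prealloc_file` is invented for illustration): reserve space up front so later writes need no block allocation, which is exactly the case Ulrich found fast.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temp file, reserve `len` bytes with posix_fallocate(), and
 * return the resulting file size (or -1 on failure). */
static long prealloc_file(long len)
{
    char tmpl[] = "/tmp/fallocXXXXXX";
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    unlink(tmpl);                        /* file dies with the fd */

    /* Returns 0 on success, an errno value (not -1) on failure. */
    if (posix_fallocate(fd, 0, len) != 0) {
        close(fd);
        return -1;
    }

    struct stat st;
    long size = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;
    close(fd);
    return size;
}
```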
Hi,
On Fri, Jun 09, 2000 at 02:51:18PM -0700, Ulrich Drepper wrote:
Have you thought about O_RSYNC and whether it is possible/useful to
support it separately?
It would be possible and useful, but it's entirely separate from the
write path and probably doesn't make sense until we've got
Hi,
On Sun, May 21, 2000 at 04:27:29PM +, Ton Hospel wrote:
It delivers a realtime signal to tasks which have requested it. The task
can then call fstat to find out what changed.
A poll() notification mechanism should be at least as useful for e.g.
GUI's who generally prefer to
Hi,
On Thu, Apr 20, 2000 at 10:57:15AM +0200, Benno Senoner wrote:
I tried all combinations using my hdtest.c which I posted yesterday.
I tried O_SYNC and even O_DSYNC on the SGI (Origin 2k),
(D_SYNC syncs only data blocks but not metadata blocks)
Not quite. O_DSYNC syncs metadata too.
Hi,
On Wed, Apr 19, 2000 at 11:55:04AM -0400, Karl JH Millar wrote:
I've noticed that file writes with O_SYNC are very much slower than they should be.
How fast do you think they should be?
If you are doing small appends, then O_SYNC is _guaranteed_ to be dead
slow. Every write involves
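A back-of-envelope model makes the cost concrete. Each synchronous append must write at least the data block(s) it touches plus the updated inode (the model below ignores bitmaps and indirect blocks, so real costs are higher); tiny appends therefore pay several disk writes per handful of bytes. The numbers are illustrative assumptions, not measurements.

```c
/* Illustrative model: synchronous disk writes needed to append `total`
 * bytes in chunks of `chunk` bytes, with `blksize`-byte blocks.
 * Per append: one write per data block touched, plus one inode write.
 * (Real filesystems also write bitmaps/indirect blocks; omitted here.) */
#include <assert.h>

static long sync_writes(long total, long chunk, long blksize)
{
    long writes = 0, filesize = 0;
    while (filesize < total) {
        long first = filesize / blksize;
        long last  = (filesize + chunk - 1) / blksize;
        writes += (last - first + 1) + 1;   /* data block(s) + inode */
        filesize += chunk;
    }
    return writes;
}
```

With 4K blocks, appending 4K in 512-byte O_SYNC writes costs 16 synchronous writes in this model; one 4K write costs 2.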
Hi,
On Mon, Apr 17, 2000 at 05:58:48PM -0500, Steve Lord wrote:
O_DIRECT on Linux XFS is still a work in progress, we only have
direct reads so far. A very basic implementation was made available
this weekend.
Care to elaborate on how you are doing O_DIRECT?
It's something I've been
Hi,
On Mon, Apr 17, 2000 at 07:10:43PM +0200, Martin Schenk wrote:
If you are interested in a more efficient fsync (and a real fdatasync),
I have some patches that provide better performance for very large
files (where fsync is mostly busy scanning the page cache for changes),
and a
Hi,
On Tue, Apr 18, 2000 at 10:57:25AM -0400, Paul Barton-Davis wrote:
1) pre-allocation takes a *long* time. Allocating 24 203MB files on a
clean ext2 partition of 18GB takes many, many minutes, for example.
Presumably, the same overhead is being incurred when block
allocation
Hi,
On Tue, Apr 18, 2000 at 07:56:04AM -0500, Steve Lord wrote:
XFS is using the pagebuf code we wrote (or I should say are writing - it
needs a lot of work yet). This uses kiobufs to represent data in a set of
pages. So, we have the infrastructure to take a kiobuf and read or write
it
Hi,
On Tue, Apr 18, 2000 at 01:17:52PM -0500, Steve Lord wrote:
So I guess the question here is how do you plan on keeping track of the
origin of the pages?
You don't have to.
Which ones were originally part of the kernel cache
and thus need copying up to user space?
If the caller
Hi,
On Fri, Apr 14, 2000 at 06:15:09PM +1000, Andrew Clausen wrote:
Any comments?
Yes!
Date: Fri, 14 Apr 2000 08:10:10 -0400
Message-Id: [EMAIL PROTECTED]
From: Paul Barton-Davis [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [linux-audio-dev] info point on linux hdr
Sender: [EMAIL
Hi,
On Mon, Apr 17, 2000 at 04:50:05PM +0200, Benno Senoner wrote:
Stephen, I tried all possible combinations , in my hdrbench code.
...
I tried:
-fsync() on all write descriptors at regular intervals ranging from 1sec to
10sec
- fdatasync() on all write descriptors , same as above
-
Hi,
On Mon, Apr 17, 2000 at 07:21:31PM +0200, Benno Senoner wrote:
The only way you can get much better is to do non-writeback IO
asynchronously. Use O_SYNC for writes, and submit the IOs from multiple
threads, to let the kernel schedule the multiple IOs. Use large block
sizes for
Hi,
On Mon, Apr 17, 2000 at 01:45:15PM -0400, Paul Barton-Davis wrote:
2) Why am I not having any of these problems ? Unlike Benno's code, I
Seagate 4.5GB Cheetah U2W 10K rpm
IBM 9GB UltraStar U2W 10K rpm
Quantum 4.5GB Viking U2W 7.5K rpm
3 x IBM 18GB
Hi,
On Mon, Apr 17, 2000 at 01:05:12PM -0400, Paul Barton-Davis wrote:
2) Why am I not having any of these problems ? Unlike Benno's code, I
Seagate 4.5GB Cheetah U2W 10K rpm
IBM 9GB UltraStar U2W 10K rpm
Quantum 4.5GB Viking U2W 7.5K rpm
3 x IBM 18GB UltraStar
Ahh
Hi,
On Thu, 10 Feb 2000 10:27:29 -0500 (EST), Alexander Viro
[EMAIL PROTECTED] said:
Correct, but that's going to make design much more complex - you really
don't want to do it for anything other than sub-page stuff (probably even
sub-sector). Which leads to 3 levels - allocation block/IO
Hi,
On Wed, 9 Feb 2000 14:30:13 -0500 (EST), Alexander Viro
[EMAIL PROTECTED] said:
On Wed, 9 Feb 2000 [EMAIL PROTECTED] wrote:
with 2k blocks and 128 byte fragments, we get to really reduce wasted
space below any other system i've ever experienced.
Erm... I'm afraid that you are missing
Hi,
Benno Senoner writes:
wow, really good idea to journal to a RAID1 array !
do you think it is possible to do the following:
- N disks holding a soft RAID5 array.
- reserve a small partition on at least 2 disks of the array to hold a RAID1
array.
- keep the journal on this
Hi,
Chris Wedgwood writes:
This may affect data which was not being written at the time of the
crash. Only raid 5 is affected.
Long term -- if you journal to something outside the RAID5 array (ie.
to raid-1 protected log disks) then you should be safe against this
type of
Hi,
On Wed, 12 Jan 2000 22:09:35 +0100, Benno Senoner [EMAIL PROTECTED]
said:
Sorry for my ignorance I got a little confused by this post:
Ingo said we are 100% journal-safe, you said the contrary,
Raid resync is safe in the presence of journaling. Journaling is not
safe in the presence of
Hi,
On Tue, 11 Jan 2000 16:41:55 -0600, "Mark Ferrell"
[EMAIL PROTECTED] said:
Perhaps I am confused. How is it that a power outage while attached
to the UPS becomes "unpredictable"?
One of the most common ways to get an outage while on a UPS is somebody
tripping over, or otherwise
Hi,
This is a FAQ: I've answered it several times, but in different places,
so here's a definitive answer which will be my last one: future
questions will be directed to the list archives. :-)
On Tue, 11 Jan 2000 16:20:35 +0100, Benno Senoner [EMAIL PROTECTED]
said:
then raid can miscalculate
Hi,
On Tue, 11 Jan 2000 20:17:22 +0100, Benno Senoner [EMAIL PROTECTED]
said:
Assume all RAID code - FS interaction problems get fixed, since a
linux soft-RAID5 box has no battery backup, does this mean that we
will lose data ONLY if there is a power failure AND successive disk
failure ?
Hi,
On Fri, 07 Jan 2000 00:32:48 +0300, Hans Reiser [EMAIL PROTECTED] said:
Andrea Arcangeli wrote:
BTW, I thought Hans was talking about places that can't sleep (because of
some not schedule-aware lock) when he said "place that cannot call
balance_dirty()".
You were correct. I think
Hi,
On Thu, 23 Dec 1999 02:37:48 +0300, Hans Reiser [EMAIL PROTECTED]
said:
I completely agree to change mark_buffer_dirty() to call balance_dirty()
before returning.
How can we use a mark_buffer_dirty that calls balance_dirty in a
place where we cannot call balance_dirty?
It shouldn't
Hi,
On Wed, 22 Dec 1999 11:08:37 -0800, "sadri" [EMAIL PROTECTED] said:
Is there an archive of the emails posted in this list(linux-fsdevel)?
thanks
Searching for "linux-fsdevel archive" on www.google.com found several.
--Stephen
Hi,
On Tue, 21 Dec 1999 14:57:29 +0100 (CET), Andrea Arcangeli
[EMAIL PROTECTED] said:
So you are talking about replacing this line:
dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
with:
dirty = (size_buffers_type[BUF_DIRTY]+size_buffers_type[BUF_PINNED])
        >> PAGE_SHIFT;
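The quoted computation is byte-to-page arithmetic: size_buffers_type[] counts bytes per buffer list, and shifting by PAGE_SHIFT converts bytes to pages, so pinned (journal-held) buffers get counted against the dirty limit alongside ordinary dirty ones. A self-contained version of that arithmetic (enum names assumed for illustration):

```c
#include <assert.h>

#define PAGE_SHIFT 12                     /* 4 KiB pages, as on i386 */

enum { BUF_DIRTY, BUF_PINNED, NR_LIST };

/* Byte counts in, page count out: dirty plus pinned buffer memory. */
static unsigned long dirty_pages(const unsigned long size_buffers_type[NR_LIST])
{
    return (size_buffers_type[BUF_DIRTY] +
            size_buffers_type[BUF_PINNED]) >> PAGE_SHIFT;
}
```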
Hi,
All comments welcome: this is a first draft outline of what I _think_
Linus is asking for from journaling for mainline kernels.
On Wed, 15 Dec 1999 13:45:22 -0500, Chris Mason
[EMAIL PROTECTED] said:
What is your current plan for porting ext3 into 2.3/2.4? Are you still
going to be
Hi,
On Wed, 8 Dec 1999 17:28:49 -0500, "Theodore Y. Ts'o" [EMAIL PROTECTED]
said:
Never fear, there will be a very easy way to switch back and forth
between ext2 and ext3. A single mount command, or at most a single
tune2fs command, should be all that it takes, no matter how the
journal
Hi,
On Sat, 4 Dec 1999 08:44:46 -0800 (PST), Brion Vibber
[EMAIL PROTECTED] said:
Maybe at least stick a nice big warning in the docs along the lines of
"do not write to your journal file while mounted with journaling on,
you big dummy!" :) Not that I'd do so deliberately of course, but it
Hi,
On Sat, 4 Dec 1999 12:11:58 -0700, mike burrell [EMAIL PROTECTED] said:
couldn't you just make a new flag for the inode that journal.dat uses? i'm
guessing using S_IMMUTABLE will cause some problems, but something similar
to that?
The immutable flag will work fine: journaling bypasses
Hi,
On Wed, 3 Nov 1999 10:30:36 +0100 (MET), Ingo Molnar
[EMAIL PROTECTED] said:
OK... but raid resync _will_ block forever as it currently stands.
(not forever, but until the transaction is committed. (it's not even
necessary for the RAID resync to wait for locked buffers, it could as well
Hi,
On Mon, 1 Nov 1999 13:04:23 -0500 (EST), Ingo Molnar [EMAIL PROTECTED]
said:
On Mon, 1 Nov 1999, Stephen C. Tweedie wrote:
No, that's completely inappropriate: locking the buffer indefinitely
will simply cause jobs like dump() to block forever, for example.
i dont think dump should
Hi,
On Mon, 1 Nov 1999 15:03:54 -0600, Timothy Ball
[EMAIL PROTECTED] said:
I did my best to try to follow what the README for ext3 said. I made a
journal file in /var/local/journal/journal.dat. It has an inode # of
183669.
Then I did /sbin/lilo -R linux rw rootflags=journal=183669.
Hi,
On Tue, 2 Nov 1999 03:10:10 -0600, Timothy Ball
[EMAIL PROTECTED] said:
Here's the info from /var/log/dmesg. Could it be that my journal file
has a large inode number? And if you have more than one ext3 partition
can you have more than one journal file? How would you specify it...
must
Hi,
On Mon, 01 Nov 1999 15:53:29 -0500, Jeff Garzik
[EMAIL PROTECTED] said:
XFS delays allocation of user data blocks when possible to
make blocks more contiguous; holding them in the buffer cache.
This allows XFS to make extents large without requiring the user
to specify extent size, and
Hi,
On Tue, 02 Nov 1999 08:15:36 -0700, [EMAIL PROTECTED] said:
I'd like these pages to age a little before handing them over to the
"inode disk", because the "write_one_page" function called by
generic_file_write would incur significant latency if the inode disk is
"real", ie. not
Hi,
On Fri, 29 Oct 1999 14:06:24 -0400 (EDT), Ingo Molnar [EMAIL PROTECTED]
said:
On Fri, 29 Oct 1999, Stephen C. Tweedie wrote:
Fixing this in raid seems far, far preferable to fixing it in the
filesystems. The filesystem should be allowed to use the buffer cache
for metadata and should
Hi,
On Mon, 01 Nov 1999 15:58:33 -0600, [EMAIL PROTECTED] said:
I agree with this, it feels closer to the linux page cache, the
terminology in the XFS white paper is a little confusing here.
XFS on Irix caches file data in buffers, but not in the regular buffer
cache, they are cached off
Hi all,
There seems to be a conflict between journaling filesystem requirements
(both ext3 and reiserfs), and the current raid code when it comes to
write ordering in the buffer cache.
The current ext3 code adds debugging checks to ll_rw_block designed to
detect any cases where blocks are being
Hi,
On Thu, 28 Oct 1999 21:29:44 +0200, Marc Mutz [EMAIL PROTECTED] said:
Hi Stephen!
I just tried your journalling support with my old spare scsi disk
(240M). The things I tried were:
Oct 28 21:08:57 adam kernel: Journal length (768 blocks) too short.
Your journal is too short. The jfs
Hi,
On Tue, 26 Oct 1999 10:19:13 +0200, [EMAIL PROTECTED]
(Miklos Szeredi) said:
Hi,
Sorry, I forgot to say, that it was with 0.0.2b. Also I reproduced
this twice, so the second time, it _was_ a clean fs before converting
to ext3.
Are you sure you applied _both_ 0.0.2a and 0.0.2b, not just
Hi,
On Tue, 26 Oct 1999 14:56:50 +0200, [EMAIL PROTECTED]
(Miklos Szeredi) said:
I will try to make more tests with a cleaner configuration...
OK, thanks --- the more information you can provide, the better. A
reliable reproducer for any problems would be best of all.
--Stephen
Hi,
On Mon, 25 Oct 1999 18:41:09 +0200, [EMAIL PROTECTED]
(Miklos Szeredi) said:
5) boot, then mount ext3 filesystem - it says:
JFS DEBUG: (recovery.c, 411): journal_recover: JFS: recovery, exit status 0,
recovered transactions 130 to 133
6) unmount the fs, and with debugfs turn off
Hi,
On Tue, 19 Oct 1999 09:50:59 -0400, Daniel Veillard
[EMAIL PROTECTED] said:
The oops of the day :
Oct 19 05:42:50 fr kernel: Assertion failure in journal_get_write_access() at
transaction.c line 436: "handle->h_buffer_credits > 0"
...
Oct 19 05:42:50 fr kernel: Call Trace:
Hi,
On 19 Oct 1999 00:44:38 -0500, [EMAIL PROTECTED] (Eric
W. Biederman) said:
Meanwhile having the metadata in the page cache (where they would
have predictable offsets by file size)
Doesn't help --- you still need to look up the physical block numbers
in order to clear the allocation
Hi,
On Fri, 15 Oct 1999 14:04:48 +, Peter Rival [EMAIL PROTECTED]
said:
Well, I think I just uncovered the first, umm, detail for this
release ;) Got the following while trying to start an AIM VII fserver
run on an AlphaPC164. Disks are all 2GB narrow SCSI, hanging off of a
single
Hi,
On Sat, 16 Oct 1999 01:59:38 -0400 (EDT), Alexander Viro
[EMAIL PROTECTED] said:
a) to d), fine.
e) we might get out with just a dirty blocks lists, but I think
that we can do better than that: keep per-inode cache for metadata. It
is going to be separate from the data pagecache.
Hi,
On Mon, 18 Oct 1999 14:30:10 +0200 (CEST), Andrea Arcangeli
[EMAIL PROTECTED] said:
I can't see these bigmem issues. The buffer and page-cache memory is not
in bigmem anyway. And you can use bigmem _wherever_ you want as far as you
remember to fix all the involved code to kmap before
Hi,
On Mon, 18 Oct 1999 13:26:45 -0400 (EDT), Alexander Viro
[EMAIL PROTECTED] said:
You can't even know which is the inode Y that is using a block X without
reading all the inode metadata while the block X still belongs to the
inode Y (before the truncate).
WTF would we _need_ to know?
Hi,
On 18 Oct 1999 08:20:51 -0500, [EMAIL PROTECTED] (Eric W. Biederman)
said:
And I still can't see how you can find the stale buffer in a
per-object queue as the object can be destroyed as well after the
lowlevel truncate.
Yes but you can prevent the buffer from becoming a stale buffer
Hi,
On Wed, 13 Oct 1999 02:19:19 +0400, Hans Reiser [EMAIL PROTECTED] said:
I merely hypothesize that the maximum value of required
FLUSHTIME_NON_EXPANDING will usually be less than 1% of memory, and
therefore won't have an impact. It is not like keeping 1% of memory
around for use by text
Hi,
On Thu, 14 Oct 1999 14:31:23 +0400, Hans Reiser [EMAIL PROTECTED] said:
Ah, I see, the problem is that when you batch the commits they can be
truly huge, and they all have to commit for any of them to commit, and
none of them can be flushed until they all commit, is that it?
Exactly.
Hi,
On Mon, 11 Oct 1999 11:12:01 -0400 (EDT), Alexander Viro
[EMAIL PROTECTED] said:
I began screwing around the truncate() stuff and the following is
a status report/request for comments:
a) call of ->truncate() method (and vmtruncate()) had been moved
into the notify_change().
Hi,
On Sat, 9 Oct 1999 23:53:01 +0200 (CEST), Andrea Arcangeli
[EMAIL PROTECTED] said:
What I said about bforget in my old email is still true. The _only_ reason
for using bforget instead of brelse is to get buffer performances (that in
2.3.x are not so interesting as in 2.2.x as in 2.3.x
Hi,
On 11 Oct 1999 17:58:54 -0500, [EMAIL PROTECTED] (Eric W. Biederman)
said:
What about adding to the end of ext2_alloc_block:
bh = get_hash_table(inode->i_dev, result, inode->i_sb->s_blocksize);
/* something is playing with our fresh block, make them stop. ;-) */
if (bh) {
if
Hi,
On Tue, 12 Oct 1999 15:39:35 +0200 (CEST), Andrea Arcangeli
[EMAIL PROTECTED] said:
On Tue, 12 Oct 1999, Stephen C. Tweedie wrote:
changes. The ext2 truncate code is really, really careful to provide
I was _not_ talking about ext2 at all. I was talking about the bforget and
brelse
Hi,
On Sun, 10 Oct 1999 16:57:18 +0200 (CEST), Andrea Arcangeli
[EMAIL PROTECTED] said:
My point was that even being forced to do a lookup before creating
each empty buffer, will be still faster than 2.2.x as in 2.3.x the hash
will contain only metadata. Fewer elements means faster lookups.
Hi,
On Thu, 27 May 1999 22:18:50 -0700 (PDT), Linus Torvalds
[EMAIL PROTECTED] said:
I care not one whit what the interface is on a /dev level
Fine, I can live with that!
the only thing I care about is that the internal interfaces make sense
(ie are purely based on kernel physical
Hi,
On Thu, 27 May 1999 22:15:29 -0700 (PDT), Linus Torvalds [EMAIL PROTECTED] said:
On Fri, 28 May 1999, Stephen C. Tweedie wrote:
I have a patch I've been trying out to improve fsync performance by
maintaining per-inode dirty buffer lists, and to implement fdatasync
by tracking
Hi Linus,
I have a patch I've been trying out to improve fsync performance by
maintaining per-inode dirty buffer lists, and to implement fdatasync by
tracking "significant" and "insignificant" (ie. timestamp) dirty flags
in the inode separately. However, in doing this I found a serious
problem
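The "significant" vs "insignificant" split can be sketched in a few lines. This is a model with assumed names, not the patch: fdatasync() may skip the inode flush when only timestamps changed, while fsync() flushes on any dirt.

```c
/* Sketch (assumed names) of significant vs insignificant inode dirt:
 * fdatasync() skips a flush for timestamp-only changes; fsync() never
 * skips. */
#include <assert.h>

struct inode {
    int dirty_significant;    /* size, block pointers: data integrity */
    int dirty_timestamps;     /* atime/mtime only: fdatasync may skip */
    int flushes;              /* count of real disk flushes issued */
};

static void flush_inode(struct inode *ino)
{
    ino->flushes++;           /* stands in for the synchronous write */
    ino->dirty_significant = ino->dirty_timestamps = 0;
}

static void do_fsync(struct inode *ino)
{
    if (ino->dirty_significant || ino->dirty_timestamps)
        flush_inode(ino);
}

static void do_fdatasync(struct inode *ino)
{
    if (ino->dirty_significant)   /* timestamp-only dirt: no flush */
        flush_inode(ino);
}
```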
Hi,
On Sat, 20 Mar 1999 15:46:18 -0500 (EST), Alexander Viro
[EMAIL PROTECTED] said:
Folks, could somebody recall why the check for I_DIRTY had been
added to iput()? AFAICS it does nothing. If the inode is hashed and clean
it's already on inode_in_use, otherwise we are in *big* trouble