Ric Wheeler wrote:
>> FS waiting for completion of all the dependent writes isn't too good
>> latency and throughput-wise tho. It would be best if FS can indicate
>> dependencies between write commands and barrier so that barrier
>> doesn't have to empty the whole queue. Hmm... Can someone tell m
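The ordering problem being discussed (with the drive's write cache enabled, a completion ack does not mean the data is on media, so the filesystem waits for all dependent writes and then drains the queue with a barrier/cache flush) can be sketched from user space. A minimal illustration, not code from the thread: write the dependent data, force it to stable storage, and only then write the commit record that refers to it.

/*
 * Minimal userspace sketch of the ordering discipline discussed above:
 * data must be durable before the commit record that references it.
 * fdatasync() asks the kernel to push the data to the device and, on
 * setups that honor it, to flush the drive's volatile write cache --
 * an ack from write() alone proves nothing about persistence when
 * write-back caching is enabled.  File name is illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void write_all(int fd, const void *buf, size_t len)
{
        const char *p = buf;
        while (len) {
                ssize_t n = write(fd, p, len);
                if (n < 0) { perror("write"); exit(1); }
                p += n;
                len -= n;
        }
}

int main(void)
{
        char data[4096], commit[512];
        memset(data, 'D', sizeof(data));
        memset(commit, 'C', sizeof(commit));

        int fd = open("journal.img", O_CREAT | O_WRONLY, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* 1. dependent data writes */
        write_all(fd, data, sizeof(data));

        /* 2. barrier point: everything above must reach stable media */
        if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

        /* 3. only now is it safe to write the commit record */
        write_all(fd, commit, sizeof(commit));
        if (fdatasync(fd) < 0) { perror("fdatasync"); return 1; }

        close(fd);
        return 0;
}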
Eric Anopolsky wrote:
On Thu, 2008-10-23 at 01:14 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
Waiting for the target to ack an IO is not sufficient, since the target
ack does not (with write cache enabled) mean that it is on persistent
storage.
FS waiting for completion of all th
On Thu, 2008-10-23 at 01:14 +0900, Tejun Heo wrote:
> Ric Wheeler wrote:
> > Waiting for the target to ack an IO is not sufficient, since the target
> > ack does not (with write cache enabled) mean that it is on persistent
> > storage.
>
> FS waiting for completion of all the dependent writes isn'
Avi Kivity wrote:
Tejun Heo wrote:
For most SATA drives, disabling the write-back cache seems to take a high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
Depends on your
Paul P Komkoff Jr wrote:
Replying to Steven Pratt:
Steven Pratt wrote:
RAID data is now uploaded. The config used is 136 15k rpm fiber disks
in 8 arrays all striped together with DM. These results are not as
favorable to BTRFS, as there seem to be some major issues with random
write a
Replying to Steven Pratt:
> Steven Pratt wrote:
> RAID data is now uploaded. The config used is 136 15k rpm fiber disks
> in 8 arrays all striped together with DM. These results are not as
> favorable to BTRFS, as there seem to be some major issues with random
> write and mail server worklo
jim owens wrote:
For most SATA drives, disabling the write-back cache seems to take a high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads;
for mixed read/write workloads the throughput decrease is negligible.
Different tests on different hardware g
Avi Kivity wrote:
Tejun Heo wrote:
For most SATA drives, disabling the write-back cache seems to take a high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
Different tests on d
Tejun Heo wrote:
For most SATA drives, disabling the write-back cache seems to take a high
toll on write throughput. :-(
I measured this yesterday. This is true for pure write workloads; for
mixed read/write workloads the throughput decrease is negligible.
As long as the error status is sti
Ric Wheeler wrote:
For any given set of disks, you "just" need to do the math to compute
the utilized capacity, the expected rate of drive failure, the rebuild
time and then see whether you can recover from your first failure
before a 2nd disk dies.
Spare disks have the advantage of a fully
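Ric's "do the math" can be made concrete with a back-of-the-envelope model. A sketch assuming independent failures with a constant annualized failure rate; the AFR, drive count, and rebuild time below are invented inputs, not numbers from the thread.

/* Back-of-the-envelope version of "do the math": probability that a
 * second drive fails while the first one is being rebuilt, assuming
 * independent failures with a constant annualized failure rate (AFR).
 * All input numbers are illustrative, not taken from the thread. */
#include <math.h>
#include <stdio.h>

int main(void)
{
        double afr = 0.03;            /* 3% annualized failure rate per drive */
        int surviving_drives = 135;   /* e.g. a 136-drive pool minus the dead one */
        double rebuild_hours = 12.0;  /* time to rebuild into spare capacity */

        /* per-drive probability of failing within the rebuild window */
        double p_one = 1.0 - exp(-afr * rebuild_hours / (365.0 * 24.0));
        /* probability that at least one of the survivors fails in that window */
        double p_any = 1.0 - pow(1.0 - p_one, surviving_drives);

        printf("P(second failure during %.0fh rebuild) = %.4f%%\n",
               rebuild_hours, 100.0 * p_any);
        return 0;
}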
Michel Salim wrote:
Though it would be nice to have a tool that would provide enough
information to make a warranty claim -- does btrfs keep enough
information for such a tool to be written?
Failed device I/O (rather than bad checksums and other
fs-specific error detections) should be logged a
Tejun Heo wrote:
Ric Wheeler wrote:
Waiting for the target to ack an IO is not sufficient, since the target
ack does not (with write cache enabled) mean that it is on persistent
storage.
FS waiting for completion of all the dependent writes isn't too good
latency and throughput-wise th
Ric Wheeler wrote:
> Waiting for the target to ack an IO is not sufficient, since the target
> ack does not (with write cache enabled) mean that it is on persistent
> storage.
FS waiting for completion of all the dependent writes isn't too good
latency and throughput-wise tho. It would be best if
On Wed, 2008-10-22 at 11:25 -0400, Ric Wheeler wrote:
> Avi Kivity wrote:
> > Ric Wheeler wrote:
> >>>
> >>> Well, btrfs is not about duplicating how most storage works today.
> >>> Spare capacity has significant advantages over spare disks, such as
> >>> being able to mix disk sizes, RAID level
On Wed, Oct 22, 2008 at 9:52 AM, Stephan von Krawczynski
<[EMAIL PROTECTED]> wrote:
> On Wed, 22 Oct 2008 09:15:45 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
>> On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
>> > On Tue, 21 Oct 2008 13:31:37 -0400
>> > Ric Wheeler <[EMAIL P
Avi Kivity wrote:
Chris Mason wrote:
One problem with the spare
capacity model is the general trend where drives from the same batch
that get hammered on in the same way tend to die at the same time. Some
shops will sleep better knowing there's a hot spare and that's fine by
me.
How does h
On Wed, 2008-10-22 at 10:45 -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > On Wed, 2008-10-22 at 10:00 -0500, Steven Pratt wrote:
> >
> >> Steven Pratt wrote:
> >>
> >>> As discussed on the BTRFS conference call, myself and Kevin Corry have
> >>> set up some test machines for the purp
Chris Mason wrote:
On Wed, 2008-10-22 at 10:00 -0500, Steven Pratt wrote:
Steven Pratt wrote:
As discussed on the BTRFS conference call, myself and Kevin Corry have
set up some test machines for the purpose of doing performance testing
on BTRFS. The intent is to have a semi permanent
Chris Mason wrote:
One problem with the spare
capacity model is the general trend where drives from the same batch
that get hammered on in the same way tend to die at the same time. Some
shops will sleep better knowing there's a hot spare and that's fine by
me.
How does hot sparing help? A
Ric Wheeler wrote:
I think that the btrfs plan is still to push more complicated RAID
schemes off to MD (RAID6, etc) so this is an issue even with a JBOD.
It will be interesting to map out the possible ways to use built in
mirroring, etc vs the external RAID and actually measure the utilized c
Avi Kivity wrote:
Ric Wheeler wrote:
Well, btrfs is not about duplicating how most storage works today.
Spare capacity has significant advantages over spare disks, such as the
ability to mix disk sizes and RAID levels, and better performance.
Sure, there are advantages that go in favour of one
On Wed, 2008-10-22 at 10:00 -0500, Steven Pratt wrote:
> Steven Pratt wrote:
> > As discussed on the BTRFS conference call, myself and Kevin Corry have
> > set up some test machines for the purpose of doing performance testing
> > on BTRFS. The intent is to have a semi permanent setup that we ca
Ric Wheeler wrote:
Well, btrfs is not about duplicating how most storage works today.
Spare capacity has significant advantages over spare disks, such as the
ability to mix disk sizes and RAID levels, and better performance.
Sure, there are advantages that go in favour of one or the other
appr
Avi Kivity wrote:
Ric Wheeler wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
That is a different model (and one that makes sense, we used that in
Centera for object level
Steven Pratt wrote:
As discussed on the BTRFS conference call, myself and Kevin Corry have
set up some test machines for the purpose of doing performance testing
on BTRFS. The intent is to have a semi permanent setup that we can
use to test new features and code drops in BTRFS as well as to do
Ric Wheeler wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
That is a different model (and one that makes sense, we used that in
Centera for object level protection schemes)
Avi Kivity wrote:
Ric Wheeler wrote:
One key is not to replace the drives too early - you often can
recover significant amounts of data from a drive that is on its last
legs. This can be useful even in RAID rebuilds since with today's
enormous drive capacities, you might hit a latent error dur
Ric Wheeler wrote:
Matthias Wächter wrote:
On 10/22/2008 3:50 PM, Chris Mason wrote:
Let me reword my answer ;). The next write will always succeed unless
the drive is out of remapping sectors. If the drive is out, it is only
good for reads and holding down paper on your desk.
I hav
Chris Mason wrote:
You want to have spare capacity, enough for one or two (or fifteen)
drives' worth of data. When a drive goes bad, you rebuild into the
spare capacity you have.
You want spare capacity that does not degrade your raid levels if you
move the data onto it. In some confi
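Chris's point about not degrading RAID levels translates into a simple utilization bound. A sketch assuming two-way mirroring across N equal drives (all numbers invented for illustration): after one drive dies, re-mirroring succeeds only if twice the user data still fits on the survivors.

/* How much data can a 2-way mirrored pool of N equal drives hold and
 * still re-mirror everything after losing one drive?  Both copies of a
 * block must land on different drives, so after a failure the survivors
 * must hold 2x the user data.  Numbers below are illustrative only. */
#include <stdio.h>

int main(void)
{
        int n_drives = 8;
        double drive_tb = 1.0;

        double raw = n_drives * drive_tb;
        double usable_normal = raw / 2.0;  /* 2 copies of everything */
        double usable_rebuildable = (n_drives - 1) * drive_tb / 2.0;

        printf("raw capacity:                 %.1f TB\n", raw);
        printf("usable (2-way mirror):        %.1f TB\n", usable_normal);
        printf("usable w/ 1-failure headroom: %.1f TB (%.0f%% utilization cap)\n",
               usable_rebuildable,
               100.0 * usable_rebuildable / usable_normal);
        return 0;
}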
On Wed, 2008-10-22 at 16:32 +0200, Avi Kivity wrote:
> Ric Wheeler wrote:
> > One key is not to replace the drives too early - you often can recover
> > significant amounts of data from a drive that is on its last legs.
> > This can be useful even in RAID rebuilds since with today's enormous
>
Concerning this discussion, I'd like to put up some "requests" which
strongly oppose those brought up initially:
- if you run into an error in the fs structure or any IO error that prevents
you from bringing the fs into a consistent state, please simply oops. If a
user feels that availabili
Matthias Wächter wrote:
On 10/22/2008 3:50 PM, Chris Mason wrote:
Let me reword my answer ;). The next write will always succeed unless
the drive is out of remapping sectors. If the drive is out, it is only
good for reads and holding down paper on your desk.
I have a fairly new SATA
Ric Wheeler wrote:
One key is not to replace the drives too early - you often can recover
significant amounts of data from a drive that is on its last legs.
This can be useful even in RAID rebuilds since with today's enormous
drive capacities, you might hit a latent error during the rebuild on
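The latent-error risk during a rebuild can also be estimated. A rough sketch assuming a spec-sheet style unrecoverable-read-error rate of one per 10^14 bits; the rate and capacity are assumptions, not figures from the thread.

/* Rough estimate of hitting a latent (unrecoverable) read error while
 * reading a whole surviving drive during a RAID rebuild.  The URE rate
 * and capacity are assumed spec-sheet numbers, not from the thread. */
#include <math.h>
#include <stdio.h>

int main(void)
{
        double ure_per_bit = 1.0e-14; /* one unrecoverable error per 1e14 bits read */
        double capacity_tb = 1.0;     /* terabytes that must be read for the rebuild */

        double bits = capacity_tb * 1e12 * 8.0;
        /* P(at least one URE) = 1 - (1 - p)^bits, computed in log space */
        double p = 1.0 - exp(bits * log1p(-ure_per_bit));

        printf("P(>=1 latent read error over %.1f TB) = %.1f%%\n",
               capacity_tb, 100.0 * p);
        return 0;
}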
jim owens wrote:
Avi Kivity wrote:
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only
Chris Mason wrote:
On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote:
Chris Mason wrote:
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able t
Avi Kivity wrote:
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only a problem if there
On Wed, 2008-10-22 at 08:53 -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > On Tue, Oct 21, 2008 at 05:20:03PM -0500, Steven Pratt wrote:
> >
> >> As discussed on the BTRFS conference call, myself and Kevin Corry have
> >> set up some test machines for the purpose of doing performance test
On 10/22/2008 3:50 PM, Chris Mason wrote:
> Let me reword my answer ;). The next write will always succeed unless
> the drive is out of remapping sectors. If the drive is out, it is only
> good for reads and holding down paper on your desk.
I have a fairly new SATA disk with about 3000 hours of
On Wed, 22 Oct 2008 05:48:30 -0700
"Jeff Schroeder" <[EMAIL PROTECTED]> wrote:
> > NFS is a good example of a fs that never got redesigned for the modern world. I
> > hope it will, but currently it's like a Model T on a highway.
> > You have an NFS server with clients. Your NFS server dies, your backup
On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote:
> Chris Mason wrote:
> > On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
> >
> >> Ric Wheeler wrote:
> >>
> >>> I think that we do handle a failure in the case that you outline above
> >>> since the FS will be able to notice the erro
Chris Mason wrote:
On Tue, Oct 21, 2008 at 05:20:03PM -0500, Steven Pratt wrote:
As discussed on the BTRFS conference call, myself and Kevin Corry have
set up some test machines for the purpose of doing performance testing
on BTRFS. The intent is to have a semi permanent setup that we can
On Wed, 22 Oct 2008 09:15:45 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
> > On Tue, 21 Oct 2008 13:31:37 -0400
> > Ric Wheeler <[EMAIL PROTECTED]> wrote:
> >
> > > [...]
> > > If you have remapped a big chunk of the sectors (sa
On Wed, 2008-10-22 at 14:19 +0200, Stephan von Krawczynski wrote:
> On Tue, 21 Oct 2008 13:49:43 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
> > On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
> >
> > > > > 2. general requirements
> > > > > - fs errors without file/dir
Chris Mason wrote:
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able to notice the error before it sends a commit
down (and that commit is wrapped in the barrier flush c
Chris Mason wrote:
On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
On Tue, 21 Oct 2008 13:31:37 -0400
Ric Wheeler <[EMAIL PROTECTED]> wrote:
[...]
If you have remapped a big chunk of the sectors (say more than 10%), you
should grab the data off the disk asap and repl
Tejun Heo wrote:
Ric Wheeler wrote:
I think that we do handle a failure in the case that you outline above
since the FS will be able to notice the error before it sends a commit
down (and that commit is wrapped in the barrier flush calls). This is
the easy case since we still have the context
On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote:
> Ric Wheeler wrote:
> > I think that we do handle a failure in the case that you outline above
> > since the FS will be able to notice the error before it sends a commit
> > down (and that commit is wrapped in the barrier flush calls). This is
>
Ric Wheeler wrote:
Scrubbing is key for many scenarios since errors can "grow" even in
places where previous IO has been completed without flagging an error.
Some neat tricks are:
(1) use block level scrubbing to detect any media errors. If you
can map that sector level error into a file s
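A minimal user-space sketch of the block-level scrub in (1): read the device end to end and report the offsets of regions that fail with an I/O error, which is the information the filesystem would then map back to a file. The device path is a placeholder.

/* Minimal block-level scrub: read a device (or image) end to end and
 * report the offsets of regions that fail with an I/O error.  Mapping
 * those offsets back to filesystem objects is the fs-specific part
 * discussed above.  The device path below is just a placeholder. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1 << 20)   /* scrub in 1 MiB chunks */

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/sdX"; /* placeholder */
        static char buf[CHUNK];
        off_t off = 0;

        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        for (;;) {
                ssize_t n = pread(fd, buf, sizeof(buf), off);
                if (n == 0)
                        break;                    /* end of device */
                if (n < 0) {
                        printf("I/O error at byte offset %lld (%s)\n",
                               (long long)off, strerror(errno));
                        off += CHUNK;             /* skip the bad region */
                        continue;
                }
                off += n;
        }
        close(fd);
        return 0;
}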
Ric Wheeler wrote:
> I think that we do handle a failure in the case that you outline above
> since the FS will be able to notice the error before it sends a commit
> down (and that commit is wrapped in the barrier flush calls). This is
> the easy case since we still have the context for the IO.
I
On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote:
> On Tue, 21 Oct 2008 13:31:37 -0400
> Ric Wheeler <[EMAIL PROTECTED]> wrote:
>
> > [...]
> > If you have remapped a big chunk of the sectors (say more than 10%), you
> > should grab the data off the disk asap and replace it. Worry
On Wed, 2008-10-22 at 09:03 -0400, Ric Wheeler wrote:
> Avi Kivity wrote:
> > Stephan von Krawczynski wrote:
> >>
> >>>- filesystem autodetects, isolates, and (possibly) repairs errors
> >>>- online "scan, check, repair filesystem" tool initiated by admin
> >>>- Reliability so high that
Avi Kivity wrote:
Stephan von Krawczynski wrote:
- filesystem autodetects, isolates, and (possibly) repairs errors
- online "scan, check, repair filesystem" tool initiated by admin
- Reliability so high that they never run that check-and-fix tool
That is _wrong_ (to a certain e
Tejun Heo wrote:
Ric Wheeler wrote:
The cache flush command for ATA devices will block and wait until all of
the device's write cache has been written back.
What I assume Tejun was referring to here is that some IO might have
been written out to the device and an error happened when the devi
Hello,
This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch
of little locks. There is now a pinned_mutex, which is used when messing with
the pinned_extents extent io tree, and the extent_ins_mutex which is used with
the pending_del and extent_ins extent io trees. The l
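The locking change described here follows a standard pattern: split one coarse mutex into per-structure locks so unrelated operations stop contending. A toy pthreads sketch of that pattern; the tree and lock names are invented for the example and are not the actual btrfs structures.

/* Toy illustration of the locking change described above: instead of one
 * giant mutex covering every allocator structure, each extent io tree
 * gets its own lock, so work on unrelated trees no longer serializes.
 * Names here (pinned_tree, extent_ins_tree, ...) are invented for the
 * sketch and are not the actual btrfs structures. */
#include <pthread.h>
#include <stdio.h>

struct extent_io_tree {
        pthread_mutex_t lock;   /* was: one global alloc_mutex for everything */
        long entries;
};

static struct extent_io_tree pinned_tree     = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct extent_io_tree extent_ins_tree = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void tree_add(struct extent_io_tree *tree, long n)
{
        pthread_mutex_lock(&tree->lock);    /* only this tree is held up */
        tree->entries += n;
        pthread_mutex_unlock(&tree->lock);
}

static void *pin_worker(void *arg)
{
        for (int i = 0; i < 100000; i++)
                tree_add(&pinned_tree, 1);
        return arg;
}

static void *insert_worker(void *arg)
{
        for (int i = 0; i < 100000; i++)
                tree_add(&extent_ins_tree, 1);
        return arg;
}

int main(void)
{
        pthread_t a, b;
        /* the two workers touch different trees, so they never contend */
        pthread_create(&a, NULL, pin_worker, NULL);
        pthread_create(&b, NULL, insert_worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("pinned=%ld extent_ins=%ld\n",
               pinned_tree.entries, extent_ins_tree.entries);
        return 0;
}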
On Wed, Oct 22, 2008 at 5:19 AM, Stephan von Krawczynski
<[EMAIL PROTECTED]> wrote:
> On Tue, 21 Oct 2008 13:49:43 -0400
> Chris Mason <[EMAIL PROTECTED]> wrote:
>
>> On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
>>
>> > > > 2. general requirements
>> > > > - fs errors witho
On Tue, 21 Oct 2008 13:31:37 -0400
Ric Wheeler <[EMAIL PROTECTED]> wrote:
> [...]
> If you have remapped a big chunk of the sectors (say more than 10%), you
> should grab the data off the disk asap and replace it. Worry less about
> errors during reads; writes indicate more serious errors.
Ok, n
On Tue, 21 Oct 2008 13:49:43 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-10-21 at 18:27 +0200, Stephan von Krawczynski wrote:
>
> > > > 2. general requirements
> > > > - fs errors without file/dir names are useless
> > > > - errors in parts of the fs are no reason for a fs
Stephan von Krawczynski wrote:
- filesystem autodetects, isolates, and (possibly) repairs errors
- online "scan, check, repair filesystem" tool initiated by admin
- Reliability so high that they never run that check-and-fix tool
That is _wrong_ (to a certain extent). You _want t
On Tue, 21 Oct 2008 18:59:26 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> Stephan von Krawczynski <[EMAIL PROTECTED]> writes:
> >
> > Yes, we hear and say that all the time, name one linux fs doing it, please.
>
> ext[234] support it to some extent. It has some limitations
> (especially when the
On Tue, 21 Oct 2008 18:09:40 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:
> While that's true today, I'm not sure it has to be true always.
> I always thought traditional fsck user interfaces were a
> UI disaster and could be done much better with some simple tweaks.
> [...]
You are completely ri
On Tue, 21 Oct 2008 13:15:13 -0400
Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 21, 2008 at 07:01:36PM +0200, Stephan von Krawczynski wrote:
> > Sure, but what you say only reflects the ideal world. On a file service, you
> > never have that. In fact you do not even have good control
On Tue, 21 Oct 2008 11:34:20 -0400
jim owens <[EMAIL PROTECTED]> wrote:
> Hearing what users think they want is always good, but...
>
> Stephan von Krawczynski wrote:
> >
> > thanks for your feedback. Understand "minimum requirement" as "minimum
> > requirement to drop the current installation
Ric Wheeler wrote:
> The cache flush command for ATA devices will block and wait until all of
> the device's write cache has been written back.
>
> What I assume Tejun was referring to here is that some IO might have
> been written out to the device and an error happened when the device
> tried to
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote:
Eric Anopolsky wrote:
On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote:
- power loss at any time must not corrupt the fs (atomic fs modification)
(new-data loss is acceptable)
jim owens wrote:
Remember that the device bandwidth is the limiter so even
when each host has a dedicated path to the device (as in
dual port SAS or FC), that 2nd host cuts the throughput by
more than 1/2 with uncoordinated seeks and transfers.
That's only a problem if there is a single shared