Vault '19 Submission Deadline Approaching

2018-10-25 Thread Ric Wheeler
From: Ric Wheeler. Vault '19 talk and workshop proposals are due in three weeks. Vault '19, February 25–26, 2019, Boston, MA, USA.

[LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Ric Wheeler
One proposal for btrfs was that we should look at getting discard out of the synchronous path in order to minimize the slowdown associated with enabling discard at mount time. Seems like an obvious win for "hint" like operations like discard. I do wonder where we stand now with the cost of the
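The core idea in the proposal — getting discard out of the synchronous path — can be illustrated with a toy background worker: the fast path only enqueues discard ranges, and a separate thread batches and issues them. This is a sketch of the concept only, with hypothetical names; it is not the btrfs or block-layer implementation.

```python
import queue
import threading

class AsyncDiscarder:
    """Toy model: queue discard ranges and issue them off the write path.

    An illustration of the idea discussed in the thread, not the
    actual kernel implementation."""

    def __init__(self, issue_fn, batch=64):
        self.q = queue.Queue()
        self.issue_fn = issue_fn   # in real life this would wrap BLKDISCARD/FITRIM
        self.batch = batch
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def discard(self, start, length):
        # Fast path: enqueue only, never block on the device.
        self.q.put((start, length))

    def _run(self):
        pending = []
        while True:
            item = self.q.get()
            if item is None:           # shutdown sentinel: flush what is left
                if pending:
                    self.issue_fn(pending)
                return
            pending.append(item)
            if len(pending) >= self.batch:
                self.issue_fn(pending)  # one batched discard for many extents
                pending = []

    def close(self):
        self.q.put(None)
        self.worker.join()
```

Batching also addresses the mount-time-discard slowdown mentioned above: many small hints become one cheap bulk operation.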

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-17 Thread Ric Wheeler
On 2/17/19 4:09 PM, Dave Chinner wrote: On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote: One proposal for btrfs was that we should look at getting discard out of the synchronous path in order to minimize the slowdown associated with enabling discard at mount time. Seems like an

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-18 Thread Ric Wheeler
On 2/17/19 9:22 PM, Dave Chinner wrote: On Sun, Feb 17, 2019 at 06:42:59PM -0500, Ric Wheeler wrote: On 2/17/19 4:09 PM, Dave Chinner wrote: On Sun, Feb 17, 2019 at 03:36:10PM -0500, Ric Wheeler wrote: One proposal for btrfs was that we should look at getting discard out of the synchronous

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-27 Thread Ric Wheeler
On 2/22/19 11:45 AM, Keith Busch wrote: On Thu, Feb 21, 2019 at 09:51:12PM -0500, Martin K. Petersen wrote: Keith, With respect to fs block sizes, one thing making discards suck is that many high capacity SSDs' physical page sizes are larger than the fs block size, and a sub-page discard is wo

Linux Foundation's open source file & storage conference CFP

2017-01-05 Thread Ric Wheeler
Hi all, The CFP for the Linux Foundation's Vault conference is coming close to an end. The event is being held this year in Cambridge, Massachusetts on the days following the LSF/MM summit. The first two years' events have been solid, focused events in my (slightly biased) opinion, so worth

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-16 Thread Ric Wheeler
On 05/15/2013 04:03 PM, Zach Brown wrote: On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: Why introduce a new syscall instead of extending sys_splice? Personally, I think it's ugly to have different operations use the same syscall just because their arguments match. I agree with Za

Linux Plumbers IO & File System Micro-conference

2013-07-12 Thread Ric Wheeler
Linux Plumbers has approved a file and storage microconference. The overview page is here: http://wiki.linuxplumbersconf.org/2013:file_and_storage_systems I would like to start gathering ideas for topics. I have been approached already with a request to cover the copy_range work Zach Br

Re: Linux Plumbers IO & File System Micro-conference

2013-07-15 Thread Ric Wheeler
developers? thanks! Ric -Original Message- From: linux-fsdevel-ow...@vger.kernel.org [mailto:linux-fsdevel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler Sent: Friday, July 12, 2013 1:21 PM To: linux-s...@vger.kernel.org; Linux FS Devel; linux-...@vger.kernel.org; linux-btrfs; xfs-oss

Re: [PATCH 0/4] btrfs: offline dedupe v3

2013-07-26 Thread Ric Wheeler
On 07/26/2013 12:30 PM, Mark Fasheh wrote: Hi, The following series of patches implements in btrfs an ioctl to do offline deduplication of file extents. To be clear, "offline" in this sense means that the file system is mounted and running, but the dedupe is not done during file writes, but aft

Re: SSD optimizations

2010-12-13 Thread Ric Wheeler
On 12/13/2010 01:20 PM, Tomasz Torcz wrote: On Mon, Dec 13, 2010 at 05:17:51PM +, Paddy Steed wrote: So, no-one has any ideas on how to implement the cache. Would making it all swap work, does the OS cache files in swap? Quite the opposite. Too many people have ideas for SSD-as-cache in

Re: Offline Deduplication for Btrfs

2011-01-10 Thread Ric Wheeler
I think that dedup has a variety of use cases that are all very dependent on your workload. The approach you have here seems to be a quite reasonable one. I did not see it in the code, but it is great to be able to collect statistics on how effective your hash is and any counters for the extr
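The statistics asked for above — how effective the hash is, plus counters for collisions — can be sketched as a block-level scan. Names here are hypothetical; this is an illustration of the bookkeeping, not the patch's code:

```python
import hashlib
from collections import defaultdict

def dedup_stats(blocks):
    """Group blocks by hash and count true duplicates versus hash
    collisions (same digest, different bytes). A byte compare on each
    hash group is what separates the two."""
    by_hash = defaultdict(list)
    for b in blocks:
        by_hash[hashlib.sha256(b).digest()].append(b)
    duplicates = collisions = 0
    for group in by_hash.values():
        for b in group[1:]:
            if b == group[0]:
                duplicates += 1    # dedupe-able block
            else:
                collisions += 1    # digest matched but data differs
    return {"unique_hashes": len(by_hash),
            "duplicate_blocks": duplicates,
            "hash_collisions": collisions}
```

With a strong hash like SHA-256 the collision counter should stay at zero in practice, which is exactly the property worth instrumenting.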

Re: [PATCH 1/6] fs: add hole punching to fallocate

2011-01-28 Thread Ric Wheeler
On 01/12/2011 07:44 AM, Dave Chinner wrote: On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote: On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner wrote: The historical reason for such behaviour existing in XFS was that in 1997 the CPU and IO latency cost of unwritten extent convers

Re: [PATCH v2 0/6] btrfs: scrub

2011-03-11 Thread Ric Wheeler
On 03/11/2011 09:49 AM, Arne Jansen wrote: This series adds an initial implementation for scrub. It works quite straightforward. The usermode issues an ioctl for each device in the fs. For each device, it enumerates the allocated device chunks. For each chunk, the contained extents are enumerated

Re: BTRFS fsck tool

2011-03-12 Thread Ric Wheeler
On 03/12/2011 05:49 PM, Spelic wrote: On 03/10/2011 02:02 PM, Chris Mason wrote: Cutting the power isn't problem unless you're using something where cache flushes are not supported. Some disks lie about cache flush having completed. This is really not true for modern enterprise class drives.

Re: Btrfs wins Linux New Media Award

2011-03-23 Thread Ric Wheeler
On 03/23/2011 02:17 PM, Chris Mason wrote: Hi everyone, During the last Cebit conference, Btrfs was presented with an award for the most innovative open source project. I'd like to thank everyone at Linux magazine involved with selecting us, and since we have so many contributors I wanted to sh

Re: [GIT PULL] Btrfs

2013-09-13 Thread Ric Wheeler
On 09/12/2013 11:36 AM, Chris Mason wrote: Mark Fasheh's offline dedup work is also here. In this case offline means the FS is mounted and active, but the dedup work is not done inline during file IO. This is a building block where utilities are able to ask the FS to dedup a series of extents

Re: Triple parity and beyond

2013-11-19 Thread Ric Wheeler
On 11/19/2013 12:28 PM, Andrea Mazzoleni wrote: Hi Peter, Yes, 251 disks for 6 parity. To build a NxM Cauchy matrix you need to pick N+M distinct values in the GF(2^8) and we have only 2^8 == 256 available. This means that for every row we add for an extra parity level, we have to remove one of
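The trade-off described above — each extra parity row consumes one of the finitely many GF(2^8) elements, so every added parity level removes one possible data disk — can be tabulated. The sketch below is anchored to the figure quoted in the thread (251 data disks at 6 parity); the absolute constant depends on the exact Cauchy-matrix construction, so treat it as an assumption:

```python
def max_data_disks(parity, ref_parity=6, ref_data=251):
    """Each added parity row costs one data disk, because an N x M
    Cauchy matrix over GF(2^8) needs N + M distinct field elements.
    Anchored to the thread's quoted 251-disks-at-6-parity figure; the
    offset depends on the specific matrix construction."""
    return ref_data - (parity - ref_parity)

for p in range(3, 9):
    print(p, max_data_disks(p))
```

The limit is generous for any realistic array, which is the thread's practical conclusion: the field size is not the constraint that matters.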

[Lsf-pc] [CFP] Linux Storage, Filesystem and Memory Management Summit 2013

2013-01-18 Thread Ric Wheeler
se to improve this year's, please also send that to: lsf...@lists.linux-foundation.org Thank you on behalf of the program committee: Storage: Martin K. Petersen Jens Axboe James Bottomley Filesystems: Ric Wheeler Christoph Hellwig Jaeg

Ext4 (and xfs and btrfs?) meetings around LSF

2014-01-15 Thread Ric Wheeler
On 01/13/2014 12:45 AM, Jan Kara wrote: Hello, On Sat 11-01-14 13:05:20, Ted Tso wrote: On Fri, Jan 10, 2014 at 12:58:34PM +0100, Jan Kara wrote: do you plan to organize an ext4 meeting around LSF? And do you plan it during Collab summit or on Sunday before LSF? Thanks for answer in adva

Re: Mis-Design of Btrfs?

2011-06-29 Thread Ric Wheeler
On 06/27/2011 07:46 AM, NeilBrown wrote: On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius wrote: Good morning devs, I'm wondering whether the raid- and volume-management-builtin of btrfs is actually a sane idea or not. Currently we do have md/device-mapper support for raid already, btrfs

Re: TRIM support

2011-07-11 Thread Ric Wheeler
On 07/11/2011 06:53 AM, Chris Samuel wrote: On Mon, 11 Jul 2011 07:59:54 AM Fajar A. Nugraha wrote: Sandforce-based SSDs have their own way of reducing writes (e.g. by using internal compression), so you don't have to do anything special Not just compression, but also block level de-duplicatio

Re: Mis-Design of Btrfs?

2011-07-13 Thread Ric Wheeler
On 07/14/2011 06:56 AM, NeilBrown wrote: On Wed, 29 Jun 2011 10:29:53 +0100 Ric Wheeler wrote: On 06/27/2011 07:46 AM, NeilBrown wrote: On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius wrote: Good morning devs, I'm wondering whether the raid- and volume-management-builtin of

Re: Mis-Design of Btrfs?

2011-07-13 Thread Ric Wheeler
On 07/14/2011 07:38 AM, NeilBrown wrote: On Thu, 14 Jul 2011 07:02:22 +0100 Ric Wheeler wrote: I'm certainly open to suggestions and collaboration. Do you have in mind any particular way to make the interface richer?? NeilBrown Hi Neil, I know that Chris has a very specific set o

Re: Mis-Design of Btrfs?

2011-07-15 Thread Ric Wheeler
07/14/2011 07:38 AM, NeilBrown wrote: On Thu, 14 Jul 2011 07:02:22 +0100 Ric Wheeler wrote: I'm certainly open to suggestions and collaboration. Do you have in mind any particular way to make the interface richer?? NeilBrown Hi Neil, I know that Chris has a very specific set of use case

Re: Mis-Design of Btrfs?

2011-07-15 Thread Ric Wheeler
On 07/15/2011 02:20 PM, Chris Mason wrote: Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400: On 07/15/2011 12:34 PM, Chris Mason wrote: [ triggering IO retries on failed crc or other checks ] But, maybe the whole btrfs model is backwards for a generic layer. Instead of sending

Re: Mis-Design of Btrfs?

2011-07-15 Thread Ric Wheeler
On 07/15/2011 05:23 PM, da...@lang.hm wrote: On Fri, 15 Jul 2011, Chris Mason wrote: Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400: On 07/15/2011 12:34 PM, Chris Mason wrote: By bubble up I mean that if you have multiple layers capable of doing retries, the lowest levels w

Re: Mis-Design of Btrfs?

2011-07-15 Thread Ric Wheeler
On 07/15/2011 06:01 PM, da...@lang.hm wrote: On Fri, 15 Jul 2011, Ric Wheeler wrote: On 07/15/2011 05:23 PM, da...@lang.hm wrote: On Fri, 15 Jul 2011, Chris Mason wrote: Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400: On 07/15/2011 12:34 PM, Chris Mason wrote: By b

btrfs panic - BUG: soft lockup - CPU#0 stuck for 61s! [fs_mark:4573]

2008-06-02 Thread Ric Wheeler
I can reliably get btrfs to panic by running my fs_mark code on a newly created file system with lots of threads on an 8-way box. If this is too aggressive, let me know ;-) Here is a summary of the panic: EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. device

Re: btrfs panic - BUG: soft lockup - CPU#0 stuck for 61s! [fs_mark:4573]

2008-06-04 Thread Ric Wheeler
Chris Mason wrote: On Mon, Jun 02, 2008 at 01:52:47PM -0400, Ric Wheeler wrote: I can reliably get btrfs to panic by running my fs_mark code on a newly created file system with lots of threads on an 8-way box. If this is too aggressive, let me know ;-) Here is a summary of the panic

Re: btrfs panic - BUG: soft lockup - CPU#0 stuck for 61s! [fs_mark:4573]

2008-06-05 Thread Ric Wheeler
Chris Mason wrote: On Mon, Jun 02, 2008 at 01:52:47PM -0400, Ric Wheeler wrote: I can reliably get btrfs to panic by running my fs_mark code on a newly created file system with lots of threads on an 8-way box. If this is too aggressive, let me know ;-) Here is a summary of the panic

Re: btrfs panic - BUG: soft lockup - CPU#0 stuck for 61s! [fs_mark:4573]

2008-06-09 Thread Ric Wheeler
Chris Mason wrote: On Thu, 05 Jun 2008 13:43:48 -0400 Ric Wheeler <[EMAIL PROTECTED]> wrote: Chris Mason wrote: On Mon, Jun 02, 2008 at 01:52:47PM -0400, Ric Wheeler wrote: I can reliably get btrfs to panic by running my fs_mark code on a newly created file system wit

Re: FileSystem level compression support.

2008-06-11 Thread Ric Wheeler
Miguel Sousa Filipe wrote: Hi there, Providing compression and/or encryption at the file system layer has proven to be contentious and polemic, at least in the linux kernel world. However, I would like to probe you guys (specially Chris Mason) about your thoughts on the issue. I have seen quite

btrfs & SSD testing

2008-07-16 Thread Ric Wheeler
On the call, we talked briefly about trying to get some access to high speed SSD's for btrfs testing. I did not see online prices for STEC which is one high end part we mentioned, but I did see that Rocketdisk has 2.5 & 3.5 inch S-ATA drives that claim 19,000 IOPS. The affordable ones (say $

Re: New data=ordered code pushed out to btrfs-unstable

2008-07-18 Thread Ric Wheeler
Chris Mason wrote: Hello everyone, It took me much longer to chase down races in my new data=ordered code, but I think I've finally got it, and have pushed it out to the unstable trees. There are no disk format changes included. I need to make minor mods to the resizing and balancing code, but

Re: New data=ordered code pushed out to btrfs-unstable

2008-07-18 Thread Ric Wheeler
Chris Mason wrote: On Fri, 2008-07-18 at 16:09 -0400, Ric Wheeler wrote: Just to kick the tires, I tried the same test that I ran last week on ext4. Everything was going great, I decided to kill it after 6 million files or so and restart. The unmount has taken a very, very long time - seems

Re: New data=ordered code pushed out to btrfs-unstable

2008-07-20 Thread Ric Wheeler
Chris Mason wrote: On Fri, 2008-07-18 at 18:35 -0400, Ric Wheeler wrote: Chris Mason wrote: On Fri, 2008-07-18 at 16:09 -0400, Ric Wheeler wrote: Just to kick the tires, I tried the same test that I ran last week on ext4. Everything was going great, I decided to kill it after

Re: New data=ordered code pushed out to btrfs-unstable

2008-07-20 Thread Ric Wheeler
Chris Mason wrote: On Sun, 2008-07-20 at 08:19 -0400, Ric Wheeler wrote: Chris Mason wrote: On Fri, 2008-07-18 at 18:35 -0400, Ric Wheeler wrote: Chris Mason wrote: On Fri, 2008-07-18 at 16:09 -0400, Ric Wheeler wrote: Just to kick the tires

Re: New data=ordered code pushed out to btrfs-unstable

2008-07-21 Thread Ric Wheeler
Chris Mason wrote: On Mon, 2008-07-21 at 14:29 -0400, Ric Wheeler wrote: Chris Mason wrote: On Sun, 2008-07-20 at 09:46 -0400, Ric Wheeler wrote: Just to kick the tires, I tried the same test that I ran last week on ext4. Everything was going great

Re: single disk reed solomon codes

2008-08-04 Thread Ric Wheeler
Ahmed Kamal wrote: An experiment of applying RS codes for protecting data, worth a look http://ttsiodras.googlepages.com/rsbep.html He overwrites a series of 127 sectors and still manages to correctly recover his data. We all know disks give us unreadable sectors every now and then, so at least

Re: btrfs_tree_lock & trylock

2008-09-08 Thread Ric Wheeler
Christoph Hellwig wrote: On Mon, Sep 08, 2008 at 09:49:42AM -0700, Stephen Hemminger wrote: Not to mention the problem that developers seem to have faster machines than average user, but slower than the enterprise and future generation CPU's. So any tuning value seems to get out of date fast.

Re: Hang running fs_mark

2008-09-25 Thread Ric Wheeler
Josef Bacik wrote: On Thu, Sep 25, 2008 at 02:37:02PM -0400, Chris Mason wrote: On Thu, 2008-09-25 at 13:56 -0400, Josef Bacik wrote: Hello, Reporting this on behalf of ric. He was running the following fs_mark command ./fs_mark -d /mnt/test -s 20480 -D 64 -t 8 -F Seems it hung and

Re: Hang running fs_mark

2008-09-25 Thread Ric Wheeler
Chris Mason wrote: On Thu, 2008-09-25 at 14:34 -0400, Josef Bacik wrote: On Thu, Sep 25, 2008 at 02:37:02PM -0400, Chris Mason wrote: On Thu, 2008-09-25 at 13:56 -0400, Josef Bacik wrote: Hello, Reporting this on behalf of ric. He was running the following fs_mark command ./f

Re: Hang running fs_mark

2008-09-25 Thread Ric Wheeler
Chris Mason wrote: On Thu, 2008-09-25 at 16:37 -0400, Ric Wheeler wrote: Chris Mason wrote: On Thu, 2008-09-25 at 14:34 -0400, Josef Bacik wrote: On Thu, Sep 25, 2008 at 02:37:02PM -0400, Chris Mason wrote: On Thu, 2008-09-25 at 13:56 -0400, Josef Bacik wrote

Re: Hang running fs_mark

2008-09-25 Thread Ric Wheeler
Chris Mason wrote: On Thu, 2008-09-25 at 17:10 -0400, Ric Wheeler wrote: Ok, I have that fs_mark test running here. How far did yours get before it stopped? -chris I had gone (in heavy fsync mode) up to about 8 million files on a 1TB s-ata disk: 17 8064000

Re: Hang running fs_mark

2008-09-25 Thread Ric Wheeler
Chris Mason wrote: On Thu, 2008-09-25 at 18:58 -0400, Ric Wheeler wrote: I'm at 6.9 million files so far on a 500GB disk, and not surprisingly, I get 155 files/sec ;) My hope is that we're spinning around due to bad accounting on the reserved extents, and that Yan's latest

Re: Some very basic questions

2008-10-21 Thread Ric Wheeler
Christoph Hellwig wrote: On Tue, Oct 21, 2008 at 07:01:36PM +0200, Stephan von Krawczynski wrote: Sure, but what you say only reflects the ideal world. On a file service, you never have that. In fact you do not even have good control about what is going on. Lets say you have a setup that crea

Re: Some very basic questions

2008-10-21 Thread Ric Wheeler
Eric Anopolsky wrote: On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote: - power loss at any time must not corrupt the fs (atomic fs modification) (new-data loss is acceptable) Done. Btrfs already uses barriers as required for sata drives. Aren't there situations

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Eric Anopolsky wrote: On Tue, 2008-10-21 at 18:18 -0400, Ric Wheeler wrote: Eric Anopolsky wrote: On Tue, 2008-10-21 at 09:59 -0400, Chris Mason wrote: - power loss at any time must not corrupt the fs (atomic fs modification) (new-data loss is acceptable

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Tejun Heo wrote: Ric Wheeler wrote: The cache flush command for ATA devices will block and wait until all of the device's write cache has been written back. What I assume Tejun was referring to here is that some IO might have been written out to the device and an error happened whe

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Tejun Heo wrote: Ric Wheeler wrote: The cache flush command for ATA devices will block and wait until all of the device's write cache has been written back. What I assume Tejun was referring to here is that some IO might have been written out to the device and an error happened whe

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Stephan von Krawczynski wrote: - filesystem autodetects, isolates, and (possibly) repairs errors - online "scan, check, repair filesystem" tool initiated by admin - Reliability so high that they never run that check-and-fix tool That is _wrong_ (to a certain e

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Tejun Heo wrote: Ric Wheeler wrote: I think that we do handle a failure in the case that you outline above since the FS will be able to notice the error before it sends a commit down (and that commit is wrapped in the barrier flush calls). This is the easy case since we still have the

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Chris Mason wrote: On Wed, 2008-10-22 at 14:27 +0200, Stephan von Krawczynski wrote: On Tue, 21 Oct 2008 13:31:37 -0400 Ric Wheeler <[EMAIL PROTECTED]> wrote: [...] If you have remapped a big chunk of the sectors (say more than 10%), you should grab the data off the disk as

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Chris Mason wrote: On Wed, 2008-10-22 at 09:38 -0400, Ric Wheeler wrote: Chris Mason wrote: On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote: Ric Wheeler wrote: I think that we do handle a failure in the case that you outline above since the FS will be able

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Matthias Wächter wrote: On 10/22/2008 3:50 PM, Chris Mason wrote: Let me reword my answer ;). The next write will always succeed unless the drive is out of remapping sectors. If the drive is out, it is only good for reads and holding down paper on your desk. I have a fairly new SATA

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Chris Mason wrote: On Wed, 2008-10-22 at 22:15 +0900, Tejun Heo wrote: Ric Wheeler wrote: I think that we do handle a failure in the case that you outline above since the FS will be able to notice the error before it sends a commit down (and that commit is wrapped in the barrier flush

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Ric Wheeler wrote: One key is not to replace the drives too early - you often can recover significant amounts of data from a drive that is on its last legs. This can be useful even in RAID rebuilds since with today's enormous drive capacities, you might hit a latent

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Ric Wheeler wrote: You want to have spare capacity, enough for one or two (or fifteen) drives' worth of data. When a drive goes bad, you rebuild into the spare capacity you have. That is a different model (and one that makes sense, we used that in Centera for object

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Ric Wheeler wrote: Well, btrfs is not about duplicating how most storage works today. Spare capacity has significant advantages over spare disks, such as being able to mix disk sizes, RAID levels, and better performance. Sure, there are advantages that go in favour of

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Chris Mason wrote: One problem with the spare capacity model is the general trend where drives from the same batch that get hammered on in the same way tend to die at the same time. Some shops will sleep better knowing there's a hot spare and that's fine by me. How does h

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Tejun Heo wrote: Ric Wheeler wrote: Waiting for the target to ack an IO is not sufficient, since the target ack does not (with write cache enabled) mean that it is on persistent storage. FS waiting for completion of all the dependent writes isn't too good latency and throughput

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Avi Kivity wrote: Tejun Heo wrote: For most SATA drives, disabling write back cache seems to take high toll on write throughput. :-( I measured this yesterday. This is true for pure write workloads; for mixed read/write workloads the throughput decrease is negligible. Depends on your

Re: Some very basic questions

2008-10-22 Thread Ric Wheeler
Eric Anopolsky wrote: On Thu, 2008-10-23 at 01:14 +0900, Tejun Heo wrote: Ric Wheeler wrote: Waiting for the target to ack an IO is not sufficient, since the target ack does not (with write cache enabled) mean that it is on persistent storage. FS waiting for completion of all

Re: btrfs for enterprise raid arrays

2009-04-03 Thread Ric Wheeler
Erwin van Londen wrote: Dear all, While going through the archived mailing list and crawling along the wiki I didn't find any clues if there would be any optimizations in Btrfs to make efficient use of functions and features that today exist on enterprise class storage arrays. One exception

Re: btrfs for enterprise raid arrays

2009-04-03 Thread Ric Wheeler
Sander wrote: Dear Erwin, Erwin van Londen wrote (ao): Another thing is that some arrays have the capability to "thin-provision" volumes. In the back-end on the physical layer the array configures, let say, a 1 TB volume and virtually provisions 5TB to the host. On writes it dynamically allo
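The firmware behavior described — reclaiming space when it sees runs of consecutive zeroed blocks — can be mimicked with a toy scan that finds the zero runs a thin-provisioned array could unmap. A sketch under that assumption, not any vendor's actual logic:

```python
def zero_runs(data, block=4096, min_run=4):
    """Return (start_block, length_in_blocks) for each run of at
    least `min_run` consecutive all-zero blocks -- the pattern a
    thin-provisioning array could reclaim to its volume pool."""
    zero = bytes(block)
    runs, start = [], None
    nblocks = len(data) // block
    for i in range(nblocks):
        if data[i * block:(i + 1) * block] == zero:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    if start is not None and nblocks - start >= min_run:
        runs.append((start, nblocks - start))
    return runs
```

A filesystem that zeroes (or discards) freed extents makes this detection trivial for the array, which is the point raised in the thread.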

Re: btrfs for enterprise raid arrays

2009-04-03 Thread Ric Wheeler
David Woodhouse wrote: On Fri, 2009-04-03 at 12:43 +0100, Ric Wheeler wrote: New firmware/microcode versions are able to reclaim that space if it sees a certain number of consecutive zero's and will reclaim that space to the volume pool. Are there any thoughts on writing a low-pri

Re: Btrfs development plans

2009-04-20 Thread Ric Wheeler
Chris Mason wrote: Hello everyone, Just a quick note about the recently announced purchase of Sun by Oracle. This does not change Oracle's plans for Btrfs at all, and Btrfs is still a key project for us. Please, keep your btrfs contributions and testing coming ;) -chris Just to chime in o

Re: Data Deduplication with the help of an online filesystem check

2009-05-04 Thread Ric Wheeler
Thomas Glanzmann wrote: Hello Ric, (1) Block level or file level dedup? what is the difference between the two? (2) Inband dedup (during a write) or background dedup? I think inband dedup is way too intensive on resources (memory) and also would kill every performance benchmark. So I thin

Re: Data Deduplication with the help of an online filesystem check

2009-05-04 Thread Ric Wheeler
On 05/04/2009 10:39 AM, Tomasz Chmielewski wrote: Ric Wheeler schrieb: One thing in the above scheme that would be really interesting for all possible hash functions is maintaining good stats on hash collisions, effectiveness of the hash, etc. There has been a lot of press about MD5 hash

Re: Data Deduplication with the help of an online filesystem check

2009-05-04 Thread Ric Wheeler
On 04/28/2009 01:41 PM, Michael Tharp wrote: Thomas Glanzmann wrote: no, I just used the md5 checksum. And even if I have a hash escalation which is highly unlikely it still gives a good house number. I'd start with a crc32 and/or MD5 to find candidate blocks, then do a bytewise comparison be
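The two-stage scheme Michael Tharp describes — a cheap checksum to nominate candidate blocks, then a bytewise comparison before any dedupe — can be sketched directly. The helper name is hypothetical:

```python
import zlib

def find_dedup_candidates(blocks):
    """Stage 1: crc32 nominates candidate pairs cheaply.
    Stage 2: a bytewise compare confirms them, so a checksum
    collision can never merge unequal data."""
    seen = {}          # crc32 -> index of first block with that crc
    confirmed = []
    for i, b in enumerate(blocks):
        c = zlib.crc32(b)
        if c in seen:
            j = seen[c]
            if blocks[j] == b:        # bytewise verification
                confirmed.append((j, i))
        else:
            seen[c] = i
    return confirmed
```

This is why the "hash escalation" worry in the thread is moot: the verify step makes the checksum a filter, not the final arbiter.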

Re: [PATCH 1/4] md: Factor out RAID6 algorithms into lib/

2009-07-17 Thread Ric Wheeler
On 07/16/2009 01:38 PM, H. Peter Anvin wrote: Dan Williams wrote: On Mon, Jul 13, 2009 at 7:11 AM, David Woodhouse wrote: We'll want to use these in btrfs too. Signed-off-by: David Woodhouse Do you suspect that btrfs will also want to perform these operations asynchronously? I am preparing

Re: [PATCH 1/4] md: Factor out RAID6 algorithms into lib/

2009-07-17 Thread Ric Wheeler
On 07/17/2009 11:20 AM, H. Peter Anvin wrote: Ric Wheeler wrote: Worth sharing a pointer to a really neat set of papers that describe open source friendly RAID6 and erasure encoding algorithms that were presented last year and this at FAST: http://www.cs.utk.edu/~plank/plank/papers

Re: [PATCH 1/4] md: Factor out RAID6 algorithms into lib/

2009-07-17 Thread Ric Wheeler
On 07/17/2009 11:40 AM, H. Peter Anvin wrote: Ric Wheeler wrote: I have seen the papers; I'm not sure it really makes that much difference. One of the things that bugs me about these papers is that he compares to *his* implementation of my optimizations, but not to my code. In real

Re: [PATCH 1/4] md: Factor out RAID6 algorithms into lib/

2009-07-17 Thread Ric Wheeler
On 07/17/2009 11:49 AM, H. Peter Anvin wrote: Ric Wheeler wrote: The bottom line is pretty much this: the cost of changing the encoding would appear to outweigh the benefit. I'm not trying to claim the Linux RAID-6 implementation is optimal, but it is simple and appears to be fast enough

Re: [PATCH 1/4] md: Factor out RAID6 algorithms into lib/

2009-07-18 Thread Ric Wheeler
On 07/18/2009 07:53 AM, David Woodhouse wrote: On Fri, 2009-07-17 at 11:49 -0400, H. Peter Anvin wrote: Ric Wheeler wrote: The bottom line is pretty much this: the cost of changing the encoding would appear to outweigh the benefit. I'm not trying to claim the Linux R

Re: BTRFS file clone support for cp

2009-07-30 Thread Ric Wheeler
On 07/30/2009 04:40 AM, Pádraig Brady wrote: Jim Meyering wrote: Joel Becker wrote: On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote: At the moment we have these linking options: cp -l, --link #for hardlinks cp -s, --symbolic-link #for symlinks So perhaps we s

Re: grub-0.97: btrfs multidevice support [PATCH]

2009-09-25 Thread Ric Wheeler
On 09/25/2009 07:09 AM, Robert Millan wrote: On Fri, Sep 25, 2009 at 08:38:10AM +1000, Bron Gondwana wrote: On Thu, Sep 24, 2009 at 10:21:46PM +0200, Robert Millan wrote: Hi Edward, I'm sorry but GRUB Legacy is not maintained. At least not by us; we've deprecated it in favour of GRUB 2. It

Re: Benchmarking btrfs on HW Raid ... BAD

2009-09-30 Thread Ric Wheeler
On 09/28/2009 05:39 AM, Tobias Oetiker wrote: Hi Daniel, Today Daniel J Blueman wrote: On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer wrote: * Tobias Oetiker: Running this on a single disk, I get the quite acceptable results. When running on-top of a Areca HW Raid6 (lvm

Re: yum upgrade on btrfs very slow

2009-10-05 Thread Ric Wheeler
On 10/01/2009 05:04 AM, Jens Axboe wrote: On Wed, Sep 30 2009, Tomasz Torcz wrote: I wouldn't expect barriers to work here (reminder, this is PATA drive on ICH7 sata controller), but I will test tomorrow with nobarrier. Then I probably check his "yum upgrade" under seekwatcher on friday.

Linux Storage and Filesystems Summit 8-9 August

2010-02-07 Thread Ric Wheeler
This year we'll hold the Linux Storage and Filesystems summit jointly with the VM summit on the two days before LinuxCon in Boston (that's Sunday and Monday) at the Renaissance Hotel: http://events.linuxfoundation.org/events/linuxcon We're planning to hold some sessions jointly and split into t

Re: Content based storage

2010-03-20 Thread Ric Wheeler
On 03/19/2010 10:46 PM, Boyd Waters wrote: 2010/3/17 Hubert Kario: Read further, Sun did provide a way to enable the compare step by using "verify" instead of "on": zfs set dedup=verify I have tested ZFS deduplication on the same data set that I'm using to test btrfs. I used a 5-eleme

Re: Content based storage

2010-03-20 Thread Ric Wheeler
On 03/20/2010 05:24 PM, Boyd Waters wrote: On Mar 20, 2010, at 9:05 AM, Ric Wheeler wrote: My dataset reported a dedup factor of 1.28 for about 4TB, meaning that almost a third of the dataset was duplicated. It is always interesting to compare this to the rate you would get with old
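A dedup factor is logical bytes divided by physical bytes after dedup, so the fraction of the dataset that was duplicate follows directly as 1 - 1/factor. A quick sketch using the figure from the ZFS test above:

```python
def dedup_savings(factor):
    """factor = logical size / physical size after dedup;
    the duplicated fraction of the dataset is 1 - 1/factor."""
    return 1.0 - 1.0 / factor

print(round(dedup_savings(1.28), 3))   # factor reported by the ZFS test
```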

Re: Content based storage

2010-03-20 Thread Ric Wheeler
On 03/20/2010 06:16 PM, Ric Wheeler wrote: On 03/20/2010 05:24 PM, Boyd Waters wrote: On Mar 20, 2010, at 9:05 AM, Ric Wheeler wrote: My dataset reported a dedup factor of 1.28 for about 4TB, meaning that almost a third of the dataset was duplicated. It is always interesting to compare

Re: Rename+crash behaviour of btrfs - nearly ext3!

2010-05-17 Thread Ric Wheeler
On 05/17/2010 02:04 PM, Jakob Unterwurzacher wrote: Hi! Following Ubuntu's dpkg+ext4 problems I wanted to see if btrfs would solve them all. And it nearly does! Now I wonder if the remaining 0.2 seconds window of exposing 0-size files could be closed too. Nearly does not seem that reassuri

Re: Rename+crash behaviour of btrfs - nearly ext3!

2010-05-18 Thread Ric Wheeler
On 05/18/2010 09:13 AM, Chris Mason wrote: On Tue, May 18, 2010 at 02:03:49PM +0200, Jakob Unterwurzacher wrote: On 18/05/10 02:59, Chris Mason wrote: Ok, I upgraded to 2.6.34 final and switched to defconfig. I only did the rename test ( i.e. no overwrite ), the window is now 1.1s, bo

Re: Balancing leaves when walking from top to down (was Btrfs:...)

2010-06-18 Thread Ric Wheeler
On 06/18/2010 06:04 PM, Edward Shishkin wrote: Chris Mason wrote: On Fri, Jun 18, 2010 at 09:29:40PM +0200, Edward Shishkin wrote: Jamie Lokier wrote: Edward Shishkin wrote: If you decide to base your file system on some algorithms then please use the original ones from proper academic papers

Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

2010-06-25 Thread Ric Wheeler
On 06/24/2010 06:06 PM, Daniel Taylor wrote: -Original Message- From: mikefe...@gmail.com [mailto:mikefe...@gmail.com] On Behalf Of Mike Fedyk Sent: Wednesday, June 23, 2010 9:51 PM To: Daniel Taylor Cc: Daniel J Blueman; Mat; LKML; linux-fsde...@vger.kernel.org; Chris Mason; Ric

Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

2010-06-26 Thread Ric Wheeler
On 06/26/2010 01:18 AM, Michael Tokarev wrote: 25.06.2010 22:58, Ric Wheeler wrote: On 06/24/2010 06:06 PM, Daniel Taylor wrote: [] On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor wrote: Just an FYI reminder. The original test (2K files) is utterly pathological

Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

2010-06-26 Thread Ric Wheeler
On 06/26/2010 08:34 AM, Daniel Shiels wrote: 25.06.2010 22:58, Ric Wheeler wrote: On 06/24/2010 06:06 PM, Daniel Taylor wrote: [] On Wed, Jun 23, 2010 at 8:43 PM, Daniel Taylor wrote: Just an FYI reminder. The original test (2K files) is utterly

billion file testing of btrfs

2010-08-16 Thread Ric Wheeler
I decided to try btrfs on F13 (2.6.33.6-147.2.4.fc13.x86_64 kernel) with the following fs_mark command and a 1.5 TB Seagate S-ATA disk: # fs_mark -s 0 -S 0 -D 1000 -n 100 -L 1000 -d /test/ -l btrfs_log.txt btrfs starts off at a fantastic rate - roughly 3-4 times the speed of ext4: FSUse%
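At its core an fs_mark run like the one above times batched file creation and reports files/sec. A minimal Python sketch of that measurement (fs_mark itself adds threads, fsync modes, and per-loop logging):

```python
import os
import tempfile
import time

def create_files(directory, count, size=0):
    """Create `count` files of `size` bytes and return files/sec --
    the core measurement behind an fs_mark run. Single-threaded,
    no fsync: a sketch, not a replacement for fs_mark."""
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i:06d}"), "wb") as f:
            if payload:
                f.write(payload)
    elapsed = time.perf_counter() - start
    return count / elapsed

with tempfile.TemporaryDirectory() as d:
    rate = create_files(d, 1000)
    print(f"{rate:.0f} files/sec")
```

Watching how this rate decays as the directory tree grows is exactly the long-haul behavior the billion-file test probes.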

Re: billion file testing of btrfs

2010-08-16 Thread Ric Wheeler
On 08/16/2010 08:37 AM, Chris Mason wrote: On Mon, Aug 16, 2010 at 08:29:24AM -0400, Ric Wheeler wrote: I decided to try btrfs on F13 (2.6.33.6-147.2.4.fc13.x86_64 kernel) with the following fs_mark command and a 1.5 TB Seagate S-ATA disk: # fs_mark -s 0 -S 0 -D 1000 -n 100 -L 1000 -d

Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-29 Thread Ric Wheeler
On 09/29/2010 09:19 AM, Lennart Poettering wrote: On Tue, 28.09.10 20:08, Josef Bacik (jo...@redhat.com) wrote: On Tue, Sep 28, 2010 at 07:25:13PM -0400, Christoph Hellwig wrote: On Tue, Sep 28, 2010 at 04:53:16PM -0400, Josef Bacik wrote: This was a request from the systemd guys. They need

Re: [PATCH] Btrfs: add a disk info ioctl to get the disks attached to a filesystem

2010-09-29 Thread Ric Wheeler
On 09/29/2010 08:59 PM, Lennart Poettering wrote: On Wed, 29.09.10 16:25, Ric Wheeler (rwhee...@redhat.com) wrote: This in fact is how all current readahead implementations work, be it the fedora, the suse or ubuntu's readahead or Arjan's sreadahead. What's new is that in the