Re: [zfs-discuss] USB WD Passport 500GB zfs mirror bug
On Sun, 2009-09-13 at 11:01 -0700, Stefan Parvu wrote:
> 5. Disconnecting the other disk. Problems occur:
> # zpool status zones
>   pool: zones
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are
>         unaffected.
> action: Determine if the device needs to be replaced, and clear the
>         errors using 'zpool clear' or replace the device with 'zpool
>         replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: resilver completed after 0h0m with 0 errors on Sun Sep 13
>         20:58:02 2009
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zones         ONLINE       0     0     0
>           mirror      ONLINE       0     0     0
>             c7t0d0p0  ONLINE       0   167     0  294K resilvered
>             c7t0d0p0  ONLINE       0     0     0  208K resilvered
>
> errors: No known data errors
>
>
> # zpool status zones
>   pool: zones
>  state: DEGRADED
> status: One or more devices could not be used because the label is
>         missing or invalid.  Sufficient replicas exist for the pool to
>         continue functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed after 0h0m with 0 errors on Sun Sep 13
>         20:58:02 2009
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zones         DEGRADED     0     0     0
>           mirror      DEGRADED     0     0     0
>             c7t0d0p0  ONLINE       0   167     0  294K resilvered
>             c7t0d0p0  FAULTED      0   113     0  corrupted data
>
> errors: No known data errors
>
>
> I have disconnected c8t0d0p0 but zfs reports that c7t0d0p0 has been
> faulty !?

Both disks read c7t0d0p0, not c7t0d0p0 and c8t0d0p0 as you have in #1-4.
Typo?

--
Louis-Frédéric Feuillette

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs compression algorithm : jpeg ??
On Fri, 2009-09-04 at 13:41 -0700, Richard Elling wrote:
> On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:
> > We have groups generating terabytes a day of image data from lab
> > instruments and saving them to an X4500.
>
> Wouldn't it be easier to compress at the application, or between the
> application and the archiving file system?

Preamble: I am actively doing research into image set compression,
specifically JPEG 2000, so this is my point of reference.

I think it would be easier to compress at the application level. I would
suggest getting the image from the source, then using lossless JPEG 2000
compression on it and saving the result to an uncompressed ZFS pool.
JPEG 2000 uses arithmetic coding for its final compression step, and
arithmetic coding generally achieves a higher compression ratio than
gzip-9, lzjb or others. There is an open-source implementation of
JPEG 2000 called JasPer[1]. JasPer is the reference implementation for
JPEG 2000, meaning that all other JPEG 2000 programs must verify their
output against that of JasPer (roughly speaking).

Saving the JPEG 2000 image to an uncompressed ZFS file system will be
the fastest option. Since the JPEG 2000 data is already compressed,
trying to compress it again will not yield any reduction in storage
space; in fact it may _increase_ the amount of data stored on disk.
Good compression algorithms produce output that looks like random data,
so you can see why running on a compressed pool would be bad for
performance.

[1] http://www.ece.uvic.ca/~mdadams/jasper

On a side note, if you want to know how arithmetic coding works,
Wikipedia[2] has a really nice explanation. Suffice it to say that, in
theory (without considering implementation details), arithmetic coding
can encode _any_ data at the rate of
data_entropy * num_of_symbols + data_symbol_table. In practice this
doesn't happen due to floating point overflows and some other issues.
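The point that already-compressed data does not shrink further (and may
grow) is easy to demonstrate. A quick sketch, using random bytes as a
stand-in for JPEG 2000 output and zlib's DEFLATE at level 9 as a stand-in
for gzip-9:

```python
import os
import zlib

# 1 MiB of random bytes, standing in for already-compressed image data.
raw = os.urandom(1024 * 1024)

compressed = zlib.compress(raw, 9)  # DEFLATE, maximum effort (gzip-9 analogue)

# The "compressed" copy is not smaller; DEFLATE falls back to stored
# blocks and its framing actually adds a few bytes.
print(len(raw), len(compressed))

# By contrast, highly redundant data shrinks dramatically.
text = b"abcd" * (256 * 1024)  # also 1 MiB
print(len(zlib.compress(text, 9)))
```

The same effect is why a compressed ZFS dataset burns CPU on such data
for zero (or negative) space savings.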
[2] http://en.wikipedia.org/wiki/Arithmetic_coding

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] Books on File Systems and File System Programming
I did see this, thanks.

On Fri, 2009-08-14 at 10:51 -0400, Christine Tran wrote:
> 2009/8/14 Louis-Frédéric Feuillette
> > I am primarily interested in the theory of how to write a filesystem.
> > The kernel interface comes later, when I dive into OS-specific
> > details.
>
> Have you seen this?
>
> http://www.letterp.com/~dbg/practical-file-system-design.pdf
>
> I found this an excellent read. The author begins by explaining what's
> expected from an FS, he explains the design choices, some trade-offs,
> and how the design interfaces with the actual hardware. No specific OS
> details, no APIs, no performance numbers. Very solid fundamentals.

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] Books on File Systems and File System Programming
On Fri, 2009-08-14 at 12:34 +0200, Joerg Schilling wrote:
> Louis-Frédéric Feuillette wrote:
> > I saw this question on another mailing list, and I too would like to
> > know. And I have a couple questions of my own.
> >
> > == Paraphrased from other list ==
> > Does anyone have any recommendations for books on File Systems and/or
> > File Systems Programming?
> > == end ==
>
> Are you interested in how to write a filesystem or in how to write the
> filesystem/kernel interface part?

I am primarily interested in the theory of how to write a filesystem.
The kernel interface comes later, when I dive into OS-specific details.

--
Louis-Frédéric Feuillette
[zfs-discuss] Books on File Systems and File System Programming
I saw this question on another mailing list, and I too would like to
know. And I have a couple of questions of my own.

== Paraphrased from other list ==
Does anyone have any recommendations for books on File Systems and/or
File Systems Programming?
== end ==

I have some texts listed below, but are there books/journals/periodicals
that start from the kernel side of open(2), read(2), write(2), etc. and
progress down to disk transactions? With the advent of ZFS and other
transaction-based file systems, it seems to me that the line between
file systems and databases is beginning to blur (if it hasn't already
been doing so for some time).

Any pointers along the lines of "X from here, Y from there, Z from over
yonder, squished together like Q" are also welcome.

(Relevant) books I have:
  Understanding the Linux Kernel (the chapters about ext2 and VFS)
  Systems Programming in the UNIX Environment
  File Structures: An OO Approach Using C++
  Database System Concepts (more about SQL and how to implement joins)

Thanks in advance.

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] zfs fragmentation
On Tue, 2009-08-11 at 08:04 -0700, Richard Elling wrote:
> On Aug 11, 2009, at 7:39 AM, Ed Spencer wrote:
> > I suspect that if we 'rsync' one of these filesystems to a second
> > server/pool, we would also see a performance increase equal to what
> > we see on the development server. (I don't know how zfs send and
> > receive work, so I don't know if it would address this "Filesystem
> > Entropy" or specifically reorganize the files and directories.)
> > However, when we created a testfs filesystem in the zfs pool on the
> > production server, and copied data to it, we saw the same performance
> > as the other filesystems in the same pool.
>
> Directory walkers, like NetBackup or rsync, will not scale well as the
> number of files increases. It doesn't matter what file system you use,
> the scalability will look more-or-less similar. For millions of files,
> ZFS send/receive works much better. More details are in my paper.

Is there a link to this paper available?

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] Lundman home NAS
On Sat, 2009-08-01 at 22:31 +0900, Jorgen Lundman wrote:
> Some preliminary speed tests, not too bad for a pci32 card.
>
> http://lundman.net/wiki/index.php/Lraid5_iozone

I don't know anything about iozone, so the following may be NULL && void.

I find the results suspect: 1.2GB/s read and 500MB/s write! These are
impressive numbers indeed. I then looked at the file sizes that iozone
used... How much memory do you have? It seems like the files would fit
comfortably in memory, in which case you are measuring the cache rather
than the disks. I think this test needs to be re-run with large files
(i.e. > 2 * memory size) for it to give more accurate data.

Unrelated: what did you use to generate those graphs? They look good.
Also, do you have a hardware list on your site somewhere that I missed?
I'd like to know more about the hardware.

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] SSDs get faster and less expensive
On Tue, 2009-07-21 at 14:45 -0700, Richard Elling wrote:
> But to put this in perspective, you would have to *delete* 20 GBytes of
> data a day on a ZFS file system for 5 years (according to Intel) to
> reach the expected endurance.

Forgive my ignorance, but is this not exactly what an SSD ZIL does? A
ZIL would need to "delete" its data when it flushes to disk. I know this
thread is about consumer SSDs, but are the enterprise SSDs that much
better in terms of write cycles (not speed; I know they differ, in some
cases dramatically)?

Richard, do you have a blog post about SSDs that I missed in my travels?

--
Louis-Frédéric Feuillette
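For scale, the endurance figure quoted above works out like this. A
back-of-envelope sketch; the 80 GB drive size is an assumption for
illustration, not a number from the thread:

```python
GB_PER_DAY = 20        # the delete/rewrite rate quoted above
YEARS = 5
DAYS_PER_YEAR = 365

# Total data churned over the drive's quoted lifetime.
total_written_gb = GB_PER_DAY * DAYS_PER_YEAR * YEARS
print(total_written_gb)  # 36500 GB, i.e. about 36.5 TB

# Hypothetical 80 GB consumer drive: number of full-drive overwrites
# that represents (ignoring write amplification and wear leveling).
drive_gb = 80
full_overwrites = total_written_gb / drive_gb
print(full_overwrites)   # 456.25
```

Whether a busy ZIL device churns more or less than 20 GB/day is exactly
the question at issue; write amplification inside the SSD would make the
effective figure worse than this naive arithmetic.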
Re: [zfs-discuss] ZFS pegging the system
On Thu, 2009-07-16 at 10:51 -0700, Jeff Haferman wrote:
> We have an SGE array task that we wish to run with elements 1-7.
> Each task generates output and takes roughly 20 seconds to 4 minutes
> of CPU time. We're doing them on a cluster with about 144 8-core
> nodes, and we've divvied the job up to do about 500 at a time.
>
> So, we have 500 jobs at a time writing to the same ZFS partition.

Sorry, no answers, just some questions that first came to mind. Where is
your bottleneck: drive I/O or network? Are all nodes accessing/writing
via NFS? Is this an NFS sync issue? Might an SSD ZIL help?

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] Single disk parity
On Tue, 2009-07-07 at 17:42 -0700, Richard Elling wrote:
> Christian Auby wrote:
> > ZFS is able to detect corruption thanks to checksumming, but for
> > single drives (regular folk-pcs) it doesn't help much unless it can
> > correct them. I've been searching and can't find anything on the
> > topic, so here goes:
> >
> > 1. Can ZFS do parity data on a single drive? e.g. x% parity for all
> >    writes, recover on checksum error.
> > 2. If not, why not? I imagine it would have been a killer feature.
> >
> > I guess you could possibly do it by partitioning the single drive and
> > running raidz(2) on the partitions, but that would lose you way more
> > space than e.g. 10%. Also not practical for an OS drive.
>
> You are describing the copies parameter. It really helps to describe
> it in pictures, rather than words. So I did that.
> http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
>  -- richard

I think one solution to what Christian is asking for is copies. But I
think he is asking if there is a way to do something like a 'RAID' of
the blocks, so that your capacity isn't cut in half. For example, write
5 blocks to the disk, 4 data and one parity; then if any one of the
blocks gets corrupted or is unreadable, you can reconstruct the missing
block. In this example you would only lose 20% of your capacity, not
50%. I think this option would only really be useful for home users or
simple workstations. It also could have some performance implications.

-Jebnor

--
Louis-Frédéric Feuillette
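The 4-data-plus-1-parity scheme described above can be sketched with
plain XOR parity (this is an illustration of the idea only, not how ZFS
or RAID-Z actually lays out blocks):

```python
import os

BLOCK = 4096  # assumed block size in bytes

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Write "5 blocks, 4 data and one parity": 20% overhead instead of
# the 50% cost of copies=2.
data = [os.urandom(BLOCK) for _ in range(4)]
parity = xor_blocks(data)

# Simulate losing block 2 (detected, say, by a checksum mismatch):
# XOR of the four surviving blocks reconstructs the missing one.
survivors = [data[0], data[1], data[3], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[2]
```

This only tolerates a single bad block per parity group, and every write
must update the parity block too, which is the performance implication
mentioned above.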
[zfs-discuss] Ditto blocks on RAID-Z pool.
Hello all,

If you have copies=2 on a large enough raid-z(2) pool and 2 (3) disks
die, is it possible to recover that information despite the offline
state of the pool? I don't have this happening to me; it's just a
theoretical question.

So, if you can't recover the data, is there any advantage to using ditto
blocks on top of raid-z(2)?

Jebnor

--
Louis-Frédéric Feuillette
Re: [zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card
A couple of questions out of pure curiosity. Working on the assumption
that you are going to be adding more drives to your server, why not just
add the new drives to the Supermicro controller and keep the existing
pool (well, vdev) where it is? Reading your blog, it seems that you need
one (or two, if you are mirroring) SATA ports for your rpool. Why not
just migrate those two drives to the new controller and leave the others
where they are? OpenSolaris won't care where the drives are physically
connected, as long as you export/import.

-Jebnor

On Fri, 2009-06-19 at 16:21 -0700, Simon Breden wrote:
> Hi,
>
> I'm using 6 SATA ports from the motherboard but I've now run out of
> SATA ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8
> 8-port SATA controller card.
>
> What is the procedure for migrating the drives to this card?
> Is it a simple case of (1) issuing a 'zpool export pool_name' command,
> (2) shutdown, (3) insert card and move all SATA cables for drives from
> mobo to card, (4) boot and issue a 'zpool import pool_name' command?
>
> Thanks,
> Simon
>
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

--
Louis-Frédéric Feuillette