Re: [zfs-discuss] Finding corrupted files
Hi Edward, well that was exactly my point when I raised this question. If zfs send is able to identify corrupted files while it transfers a snapshot, why shouldn't scrub be able to do the same? zfs send quit with an I/O error and zpool status -v showed me the file that indeed had problems. Since I thought that zfs send also operates on the block level, I wondered whether scrub would basically do the same thing. On the other hand, scrub really doesn't care about what to read from the device - it simply reads all blocks, which is not the case when running zfs send. Maybe zfs send could just carry on instead of halting on an I/O error, and simply print out the errors… Cheers, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZPool creation brings down the host
On 7/10/10 03:46 PM, Ramesh Babu wrote: I am trying to create a ZPool using a single Veritas volume. The host goes down as soon as I issue the zpool create command. It looks like the command is crashing and bringing the host down. Please let me know what the issue might be. Below is the command used; textvol is the Veritas volume and testpool is the name of the pool which I am trying to create. zpool create testpool /dev/vx/dsk/dom/textvol That's not a configuration that I'd recommend - you're layering one volume management system on top of another. It seems that it's getting rather messy inside the kernel. Do you have the panic stack trace we can look at, and/or a crash dump? James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
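For reference, a rough sketch of how the panic stack and crash dump asked for above could be collected on OpenSolaris; the crash directory and dump file suffixes are assumptions and depend on how dumpadm is configured on the box:

  # dumpadm                                   (confirm a dump device and savecore directory are configured)
  # savecore                                  (after the panic reboot, extract the dump if it wasn't saved automatically)
  # cd /var/crash/`hostname`
  # echo "::status" | mdb unix.0 vmcore.0     (panic summary)
  # echo "::stack" | mdb unix.0 vmcore.0      (stack trace of the panicking thread)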
Re: [zfs-discuss] Bursty writes - why?
The NFS client that we're using always uses O_SYNC, which is why it was critical for us to use the DDRdrive X1 as the ZIL. I wasn't clear about the entire system we're using; my apologies. It is:

OpenSolaris snv_134
Motherboard: SuperMicro X8DAH
RAM: 72GB
CPU: dual Intel 5503 @ 2.0GHz
ZIL: DDRdrive X1 (two of these, independent and not mirrored)
Drives: 24 x Seagate 1TB SAS, 7200 RPM
Network: 3 x gigabit links as LACP + 1 gigabit backup, with IPMP on top of those

The output I posted is from zpool iostat, and I used that because it corresponds to what users are seeing. Whenever zpool iostat shows write activity, the file copies to the system are working as expected. As soon as zpool iostat shows no activity, the writes all pause. The simple test case is to copy a CD-ROM ISO image to the server while watching zpool iostat. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
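For anyone wanting to reproduce the observation, a rough sketch of the test described above; the pool name is taken from the iostat output elsewhere in this thread, while the share and ISO path are placeholders:

  On the server:
  # zpool iostat xpool 1        (one-second samples; watch the write ops and bandwidth columns)

  On an NFS client:
  $ cp install.iso /net/nfs-server/export/share/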
[zfs-discuss] ZPool creation brings down the host
I am trying to create a ZPool using a single Veritas volume. The host goes down as soon as I issue the zpool create command. It looks like the command is crashing and bringing the host down. Please let me know what the issue might be. Below is the command used; textvol is the Veritas volume and testpool is the name of the pool which I am trying to create. zpool create testpool /dev/vx/dsk/dom/textvol ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
On Wed, Oct 6 at 22:04, Edward Ned Harvey wrote: * Because ZFS automatically buffers writes in ram in order to aggregate as previously mentioned, the hardware WB cache is not beneficial. There is one exception. If you are doing sync writes to spindle disks, and you don't have a dedicated log device, then the WB cache will benefit you, approx half as much as you would benefit by adding dedicated log device. The sync write sort-of by-passes the ram buffer, and that's the reason why the WB is able to do some good in the case of sync writes. All of your comments made sense except for this one. Every N seconds when the system decides to burst writes to media from RAM, those writes are only sequential in the case where the underlying storage devices are significantly empty. Once you're in a situation where your allocations are scattered across the disk due to longer-term fragmentation, I don't see any way that a write cache would hurt performance on the devices, since it'd allow the drive to reorder writes to the media within that burst of data. Even though ZFS is issuing writes of ~256 sectors if it can, that is only a fraction of a revolution on a modern drive, so random writes of 128KB still have significant opportunity for reordering optimization. Granted, with NCQ or TCQ you can get back much of the cache-disabled performance loss; however, in any system that implements an internal queue depth greater than the protocol-allowed queue depth, there is opportunity for improvement, to an asymptotic limit driven by servo settle speed. Obviously this performance improvement comes with the standard WB risks, and YMMV, IANAL, etc. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Hi Edward, these are interesting points. I have considered a couple of them when I started playing around with ZFS. I am not sure whether I disagree with all of your points, but I conducted a couple of tests where I configured my raids as JBODs and mapped each drive out as a separate LUN, and I couldn't notice a difference in performance in any way. I'd love to discuss this in a separate thread, but first I will have to check the archives and Google. ;) Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Stephan Budach > > Now, scrub would reveal corrupted blocks on the devices, but is there a > way to identify damaged files as well? I saw a lot of people offering the same knee-jerk reaction that I had: "Scrub." And that is the only correct answer, to make a best effort at salvaging data. But I think there is a valid question here which was neglected. *Does* scrub produce a list of all the names of all the corrupted files? And if so, how does it do that? If scrub is operating at a block-level (and I think it is), then how can checksum failures be mapped to file names? For example, this is a long-requested feature of "zfs send" which is fundamentally difficult or impossible to implement. Zfs send operates at a block level. And there is a desire to produce a list of all the incrementally changed files in a zfs incremental send. But no capability of doing that. It seems, if scrub is able to list the names of files that correspond to corrupted blocks, then zfs send should be able to list the names of files that correspond to changed blocks, right? I am reaching the opposite conclusion of what's already been said. I think you should scrub, but don't expect file names as a result. I think if you want file names, then tar > /dev/null will be your best friend. I didn't answer anything at first, cuz I was hoping somebody would have that answer. I only know that I don't know, and the above is my best guess. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
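A minimal sketch of the tar-to-/dev/null approach mentioned above: it forces every file to be read (and therefore checksum-verified), so files with unreadable blocks surface as read errors. The dataset mountpoint is taken from the original post; the log path is a placeholder:

  $ cd /obelixData/JvMpreprint
  $ tar cf /dev/null . 2> /tmp/read-errors.log
  $ cat /tmp/read-errors.log      (any file tar could not read is a candidate for restore from backup)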
Re: [zfs-discuss] Finding corrupted files
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Stephan Budach > > Ian, > > yes, although these vdevs are FC raids themselves, so the risk is… uhm… > calculated. Whenever possible, you should always JBOD the storage and let ZFS manage the raid, for several reasons. (See below). Also, as counter-intuitive as this sounds (see below) you should disable hardware write-back cache (even with BBU) because it hurts performance in any of these situations: (a) Disable WB if you have access to SSD or other nonvolatile dedicated log device. (b) Disable WB if you know all of your writes to be async mode and not sync mode. (c) Disable WB if you've opted to disable ZIL. * Hardware raid blindly assumes the redundant data written to disk is written correctly. So later, if you experience a checksum error (such as you have) then it's impossible for ZFS to correct it. The hardware raid doesn't know a checksum error has occurred, and there is no way for the OS to read the "other side of the mirror" to attempt correcting the checksum via redundant data. * ZFS has knowledge of both the filesystem, and the block level devices, while hardware raid has only knowledge of block level devices. Which means ZFS is able to optimize performance in ways that hardware cannot possibly do. For example, whenever there are many small writes taking place concurrently, ZFS is able to remap the physical disk blocks of those writes, to aggregate them into a single sequential write. Depending on your metric, this yields 1-2 orders of magnitude higher IOPS. * Because ZFS automatically buffers writes in ram in order to aggregate as previously mentioned, the hardware WB cache is not beneficial. There is one exception. If you are doing sync writes to spindle disks, and you don't have a dedicated log device, then the WB cache will benefit you, approx half as much as you would benefit by adding dedicated log device. The sync write sort-of by-passes the ram buffer, and that's the reason why the WB is able to do some good in the case of sync writes. Ironically, if you have WB enabled, and you have a SSD log device, then the WB hurts you. You get the best performance with SSD log, and no WB. Because the WB "lies" to the OS, saying some tiny chunk of data has been written... then the OS will happily write another tiny chunk, and another, and another. The WB is only buffering a lot of tiny random writes, and in aggregate, it will only go as fast as the random writes. It undermines ZFS's ability to aggregate small writes into sequential writes. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
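For completeness, a sketch of adding the dedicated log device referred to in point (a) above; the pool name is from this thread, the SSD device names are hypothetical, and whether to mirror the slog is a separate decision:

  # zpool add obelixData log c5t0d0
  or, mirrored:
  # zpool add obelixData log mirror c5t0d0 c5t1d0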
Re: [zfs-discuss] Increase size of 2-way mirror
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Tony MacDoodle > > Is it possible to add 2 disks to increase the size of the pool below? > > NAME STATE READ WRITE CKSUM > testpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t2d0 ONLINE 0 0 0 > c1t3d0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t4d0 ONLINE 0 0 0 > c1t5d0 ONLINE 0 0 0 It's important that you know the difference between "add" and "attach" methods for increasing this size... If you "add" another mirror, then you'll have mirror-0, mirror-1, and mirror-2. You cannot remove any of the existing devices. If you "attach" a larger disk to mirror-0, and possibly fiddle with the autoexpand property and a little bit of additional futzing (pretty basic, including resilver & detach the old devices) then you can effectively replace the existing devices with larger devices. No need to consume extra disk bays. It's all a matter of which is the more desirable outcome for you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
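A sketch of the "attach" route for one of the mirrors above; the new, larger disks (c2t0d0 and c2t1d0) are hypothetical names:

  # zpool set autoexpand=on testpool
  # zpool attach testpool c1t2d0 c2t0d0
  # zpool attach testpool c1t3d0 c2t1d0
  (wait for the resilver to finish: zpool status testpool)
  # zpool detach testpool c1t2d0
  # zpool detach testpool c1t3d0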
Re: [zfs-discuss] Bursty writes - why?
On Wed, 6 Oct 2010, Marty Scholes wrote: If you think about it, this is far more sane than flushing to disk every time the write() system call is used. Yes, it dramatically diminishes the number of copy-on-write writes and improves the pool layout efficiency. It also saves energy. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Bursty writes - why?
I think you are seeing ZFS store up the writes, coalesce them, then flush to disk every 30 seconds. Unless the writes are synchronous, the ZIL won't be used, but the writes will be cached instead, then flushed. If you think about it, this is far more sane than flushing to disk every time the write() system call is used. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
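The flush interval is governed by the txg sync logic; on builds of this era it can be inspected (and, with care, changed) via mdb, though the tunable's default differs between releases. A sketch, not a tuning recommendation:

  # echo zfs_txg_timeout/D | mdb -k          (print the current sync interval, in seconds)
  # echo zfs_txg_timeout/W0t5 | mdb -kw      (example only: set it to 5 seconds)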
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side
On Wed, Oct 06, 2010 at 05:19:25PM -0400, Miles Nordin wrote: > > "nw" == Nicolas Williams writes: > > nw> *You* stated that your proposal wouldn't allow Windows users > nw> full control over file permissions. > > me: I have a proposal > > you: op! OP op, wait! DOES YOUR PROPOSAL blah blah WINDOWS blah blah > COMPLETELY AND EXACTLY LIKE THE CURRENT ONE. > > me: no, but what it does is... The correct quote is: "no, not under my proposal." That's from a post from you on September 30, 2010, with Message-Id: . That was a direct answer to a direct question. Now, maybe you wish to change your view. That'd be fine. Do not, however, imply that I'm a liar, not if you want to be taken seriously. Please re-write your proposal _clearly_ and refrain from personal attacks. Cheers, Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side
> "nw" == Nicolas Williams writes: nw> *You* stated that your proposal wouldn't allow Windows users nw> full control over file permissions. me: I have a proposal you: op! OP op, wait! DOES YOUR PROPOSAL blah blah WINDOWS blah blah COMPLETELY AND EXACTLY LIKE THE CURRENT ONE. me: no, but what it does is... you: well then I don't even have to read it. It's unacceptable because $BLEH. me: untrue. My proposal handles $BLEH just fine. you: you just said it didn't! me: well, it does. Please read it. you: I read it and I don't understand it. Anyway it doesn't handle $BLEH so it's no good. This is not really working, and concision is the problem. so, I now, today, state: My proposal allows Windows users full control over file permissions. nw> Yes, that may be. I encourage you to find a clearer way to nw> express your proposal. So far, it's just us talking. I think I'll wait and see if anyone besides you reads it. If so, maybe they can ask questions that help me clarify it. If no one does, it's probably not interesting here anyway. pgp4wuhrA1SzN.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
> "dd" == David Dyer-Bennet writes: dd> Richard Elling said ZFS handles the 4k real 512byte fake dd> drives okay now in default setups There are two steps to handling it well. one is to align the start of partitions to 4kB, and apparently on Solaris (thanks to all the cumbersome partitioning tools) that is done. On Linux you often have to really pay attention to make this happen, depending on the partitioning tool that happens to be built into your ``distro'' or whatever. The second step is to never write anything smaller than 4kB. ex., if you want to write 0.5kB, pad it with 3.5kB of zeroes to avoid the read-modify-write penalty. AIUI that is not done yet, and zfs does sometimes want to write 0.5kB. When it's writing 128kB of course there is no penalty. For this, I think XFS and NTFS are actually better and tend not to write the small blocks, but I could be wrong. pgpn3kSSlfThy.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Bursty writes - why?
I have a 24 x 1TB system being used as an NFS file server. Seagate SAS disks connected via an LSI 9211-8i SAS controller, disk layout 2 x 11 disk RAIDZ2 + 2 spares. I am using 2 x DDRdrive X1s as the ZIL. When we write anything to it, the writes are always very bursty, like this:

xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0    232      0  29.0M
xpool       488K  20.0T      0    101      0  12.7M
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0     50      0  6.37M
xpool       488K  20.0T      0    477      0  59.7M
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool       488K  20.0T      0      0      0      0
xpool      74.7M  20.0T      0    702      0  76.2M
xpool      74.7M  20.0T      0    577      0  72.2M
xpool      74.7M  20.0T      0    110      0  13.9M
xpool      74.7M  20.0T      0      0      0      0
xpool      74.7M  20.0T      0      0      0      0
xpool      74.7M  20.0T      0      0      0      0
xpool      74.7M  20.0T      0      0      0      0

Whenever you see 0, the write is just hanging. What I would like to see is at least some writing happening every second. What can I look at for this issue? Thanks -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
> "ag" == Andrew Gabriel writes: ag> Having now read a number of forums about these, there's a ag> strong feeling WD screwed up by not providing a switch to ag> disable pseudo 512b access so you can use the 4k native. this reporting lie is no different from SSD's which have 2 - 8 kB sectors on the inside and benefit from alignment. I think probably everything will report 512 byte sectors forever. If a device had a 4224-byte sector, it would make sense to report that, but I don't see a big downside to reporting 512 when it's really 4096. NAND flash often does have sectors with odd sizes like 4224, and (some of) Linux's NAND-friendly filesystems (ubifs, yaffs, nilfs) use this OOB area for filesystem structures, which are intermixed with the ECC. but in that case it's not a SCSI interface to the odd-sized sector---it's an ``mtd'' interface that supports operations like ``erase page'', ``suspend erasing'', ``erase some more''. that said I am in the ``ignore WD for now'' camp. but this isn't why. Ignore them (among other, better reasons) because they have 4k sectors at all which don't yet work well until we can teach ZFS to never write smaller than 4kB. but failure to report 4k as SCSI 4kB sector is not a problem, to my view. You can just align your partitions. pgp6jwIDoUJ9i.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
> Hi all > > I just discovered WD Black drives are rumored not to > be set to allow TLER. Yep: http://opensolaris.org/jive/message.jspa?messageID=501159#501159 > Enterprise drives will cost > about 60% more, and on a large install, that means a > lot of money... True, sometimes more than twice the price. If these are for a business, personally I would invest in TLER-capable drives like the WD REx models (RAID Edition). These allow for fast fails on read/write errors so that the data can be remapped. This prevents the possibility of the drive being kicked from the array. If these are for home and you don't have, or are not willing to spend a lot more on TLER-capable drives then go for something reliable. Forget WD Green drives (see links below). After WD removed TLER-setting on their non-enterprise drives, I have switched to Samsung HD203WI drives and so far these have been flawless. I believe it's a 4-platter model. Samsung have very recently (last month?) brought out a HD204UI model which is a 3-platter (667GB per platter) model, which should be even better -- check the newegg ratings for good/bad news etc. http://opensolaris.org/jive/thread.jspa?threadID=121871&tstart=0 http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/#drives http://jmlittle.blogspot.com/2010/03/wd-caviar-green-drives-and-zfs.html Cheers, Simon -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
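On drives that do support it, the ERC/CCTL timeout can sometimes be queried and set with newer smartmontools builds (time values are in tenths of a second); whether the command gets through depends on the drive, controller and driver, so treat this purely as a sketch, with the device path as a placeholder:

  # smartctl -l scterc <device>           (query the current read/write recovery time limits)
  # smartctl -l scterc,70,70 <device>     (set both limits to 7.0 seconds)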
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side
On Wed, Oct 06, 2010 at 04:38:02PM -0400, Miles Nordin wrote: > > "nw" == Nicolas Williams writes: > > nw> The current system fails closed > > wrong. > > $ touch t0 > $ chmod 444 t0 > $ chmod A0+user:$(id -nu):write_data:allow t0 > $ ls -l t0 > -r--r--r--+ 1 carton carton 0 Oct 6 20:22 t0 > > now go to an NFSv3 client: > $ ls -l t0 > -r--r--r-- 1 carton 405 0 2010-10-06 16:26 t0 > $ echo lala > t0 > $ > > wide open. The system does what the ACL says. The mode fails to accurately represent the actual access because... the mode can't. Now, we could have chosen (and still could choose to) represent the presence of ACEs for subjects other than owner@/group@/everyone@ by using the group bits of the mode to represent the maximal set of permissions granted. But I don't consider the above "failing open". > nw> You seem to be in denial. You continue to ignore the > nw> constraint that Windows clients must be able to fully control > nw> permissions in spite of their inability to perceive and modify > nw> file modes. > > You remain unshakably certain that this is true of my proposal in > spite of the fact that you've said clearly that you don't understand > my proposal. That's bad science. *You* stated that your proposal wouldn't allow Windows users full control over file permissions. > It may be my fault that you don't understand it: maybe I need to write > something shorter but just as expressive to fit within mailing list > attention spans, or maybe my examples are unclear. However that > doesn't mean that I'm in denial nor make you right---that just makes > me annoying. Yes, that may be. I encourage you to find a clearer way to express your proposal. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side
> "nw" == Nicolas Williams writes: nw> The current system fails closed wrong. $ touch t0 $ chmod 444 t0 $ chmod A0+user:$(id -nu):write_data:allow t0 $ ls -l t0 -r--r--r--+ 1 carton carton 0 Oct 6 20:22 t0 now go to an NFSv3 client: $ ls -l t0 -r--r--r-- 1 carton 405 0 2010-10-06 16:26 t0 $ echo lala > t0 $ wide open. NFSv3 and SMB sharing the same dataset is a use-case you claim to accomodate. This case fails open once Windows users start adding 'allow' ACL's. It's not a corner case; it's a design that fails open. >> ever had 777 it would send a SIGWTF to any AFS-unaware >> graybeards nw> A signal?! How would that work when the entity doing a chmod nw> is on a remote NFS client? please find SIGWTF under 'kill -l' and you might understand what I meant. nw> You seem to be in denial. You continue to ignore the nw> constraint that Windows clients must be able to fully control nw> permissions in spite of their inability to perceive and modify nw> file modes. You remain unshakably certain that this is true of my proposal in spite of the fact that you've said clearly that you don't understand my proposal. That's bad science. It may be my fault that you don't understand it: maybe I need to write something shorter but just as expressive to fit within mailing list attention spans, or maybe my examples are unclear. However that doesn't mean that I'm in denial nor make you right---that just makes me annoying. -- READ CAREFULLY. By reading this fortune, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. pgpvrZFYgaHat.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Ian, yes, although these vdevs are FC raids themselves, so the risk is… uhm… calculated. Unfortunately, one of the devices seems to have some issues, as stated in my previous post. I will, nevertheless, add redundancy to my pool asap. Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Hi Cindy, thanks for bringing that to my attention. I checked fmdump and found a lot of these entries:

Okt 06 2010 17:52:12.862812483 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x514dc67d57e1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /p...@0,0/pci8086,3...@7/pci1077,1...@0,1/f...@0,0/d...@w21d02305ff42,0
        (end detector)
        driver-assessment = retry
        op-code = 0x88
        cdb = 0x88 0x0 0x0 0x0 0x0 0x2 0xac 0xd4 0x3d 0x80 0x0 0x0 0x0 0x80 0x0 0x0
        pkt-reason = 0x3
        pkt-state = 0x0
        pkt-stats = 0x20
        __ttl = 0x1
        __tod = 0x4cac9b2c 0x336d7943

Okt 06 2010 17:52:12.862813713 ereport.io.scsi.cmd.disk.recovered
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.recovered
        ena = 0x514dc67d57e1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /p...@0,0/pci8086,3...@7/pci1077,1...@0,1/f...@0,0/d...@w21d02305ff42,0
                devid = id1,s...@n600d02310005ff42712ab96c
        (end detector)
        driver-assessment = recovered
        op-code = 0x88
        cdb = 0x88 0x0 0x0 0x0 0x0 0x2 0xac 0xd4 0x3d 0x80 0x0 0x0 0x0 0x80 0x0 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        __ttl = 0x1
        __tod = 0x4cac9b2c 0x336d7e11

Googling these errors brought me directly to this document: http://dsc.sun.com/solaris/articles/scsi_disk_fma2.html which talks about these SCSI errors. Since we're talking FC here, it seems to point to some FC issue I have not been aware of. Furthermore, it's always the same FC device that shows these errors, so I will try to check the device and its connections to the fabric first. Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] scrub doesn't finally finish?
Seems like it's really the case that scrub doesn't take into account the traffic that goes onto the zpool while it's scrubbing away. After some more time, the scrub finished and everything looks good so far. Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Increase size of 2-way mirror
On Wed, October 6, 2010 14:14, Tony MacDoodle wrote: > Is it possible to add 2 disks to increase the size of the pool below? > > NAME STATE READ WRITE CKSUM > testpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t2d0 ONLINE 0 0 0 > c1t3d0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t4d0 ONLINE 0 0 0 > c1t5d0 ONLINE 0 0 0 You have two ways to increase the size of this pool (sanely). First, you can add a third mirror vdev. I think that's what you're specifically asking about. You do this with the "zpool add ..." command, see man page. Second, you can add (zpool attach) two larger disks to one of the existing mirror vdevs, wait until the resilvers have finished, and then detach the two original (smaller) disks. At that point (with recent versions; with older versions you have to set a property) the vdev will expand to use the full capacity of the new larger disks, and that space will become available in the pool. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
On 10/ 6/10 09:52 PM, Stephan Budach wrote: Hi, I recently discovered some - or at least one corrupted file on one ofmy ZFS datasets, which caused an I/O error when trying to send a ZFDS snapshot to another host: zpool status -v obelixData pool: obelixData state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: none requested config: NAME STATE READ WRITE CKSUM obelixData ONLINE 4 0 0 c4t21D023038FA8d0 ONLINE 0 0 0 c4t21D02305FF42d0 ONLINE 4 0 0 Are you aware that this is a very dangerous configuration? Your pool lacks redundancy and you will lose it if one of the devices fails. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Increase size of 2-way mirror
On Wed, Oct 6, 2010 at 12:14 PM, Tony MacDoodle wrote: > Is it possible to add 2 disks to increase the size of the pool below? Yes. zpool add testpool mirror devname1 devname2 That will add a third mirror vdev to the pool. > NAME STATE READ WRITE CKSUM > testpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t2d0 ONLINE 0 0 0 > c1t3d0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t4d0 ONLINE 0 0 0 > c1t5d0 ONLINE 0 0 0 -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Budy, Your previous zpool status output shows a non-redundant pool with data corruption. You should use the fmdump -eV command to find out the underlying cause of this corruption. You can review the hardware-level monitoring tools, here: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide Thanks, Cindy On 10/06/10 13:09, Stephan Budach wrote: Well I think, that answers my question then: after a successful scrub, zpool status -v should then list all damaged files on an entire zpool. I only asked, because I read a thread in this forum that one guy had a problem with different files, aven after a successful scrub. Thanks, budy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
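A small sketch of narrowing down the fmdump output before wading through the full verbose report:

  # fmdump -e                                       (one line per error event, with timestamps)
  # fmdump -eV | grep class | sort | uniq -c        (rough count of which error classes are occurring)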
[zfs-discuss] Increase size of 2-way mirror
Is it possible to add 2 disks to increase the size of the pool below?

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Well I think that answers my question then: after a successful scrub, zpool status -v should list all damaged files on the entire zpool. I only asked because I read a thread in this forum where one guy had a problem with different files, even after a successful scrub. Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
- Original Message - > On Tue, October 5, 2010 17:20, Richard Elling wrote: > > On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote: > >> > >> On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote: > > >>> Well, here it's about 60% up and for 150 drives, that makes a wee > >>> difference... > > >> Understood on 1.6 times cost, especially for quantity 150 drives. > > > One service outage will consume far more in person-hours and > > downtime than > > this little bit of money. Penny-wise == Pound-foolish? > > That looks to be true, yes (going back to the actual prices, 150 > drives would cost $6000 extra for the enterprise versions). I somehow doubt a service outage will consume that much. The drives will be carefully distributed in smallish RAIDz2 VDEVs on two separate large systems and one small one, and all of them are dedicated backup targets (Bacula uses their drives for storing backups). We already have a 50TB setup on mostly Green drives, and although I now know that's a terrible idea, it's been running stably for about a year with quite constant load. So really, I believe the chance of non-TLER drives messing this up badly is a minor one (and perhaps more importantly, so does my boss). Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
> > TLER (the ability of the drive to timeout a command) > > I went and got what detailed documentation I could on a couple of the > Seagate drives last night, and I couldn't find anything on how they > behaved in that sort of error cases. (I believe TLER is a WD-specific > term, but I didn't just search, I read them through.) > > So that's inconvenient. How do we find out about that sort of thing? From http://en.wikipedia.org/wiki/TLER Similar technologies are called Error Recovery Control (ERC), used by competitor Seagate, and Command Completion Time Limit (CCTL), used by Samsung and Hitachi. I haven't checked which drives have those abilities, though... Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs build/test article on anandtech
http://www.anandtech.com/show/3963/zfs-building-testing-and-benchmarking I'm curious why nexenta did not perform as well as opensolaris. Both OS versions seem to be the same. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Write retry errors to SSD's on SAS backplane (mpt)
Hi, I came across this exact same problem when using Intel X25-E Extreme 32GB SSD disks as ZIL and L2ARC devices in a T5220 server. Since I didn't see a definitive solution here, I opened a support case with Oracle. They told me to upgrade the firmware on my SSD disks and LSI Expander, and the problem went away. Here's the solution they gave me after an analysis of my system:

1. Customer is using systemboard 540-7970 (showfru). According to SSH (http://sunsolve.central/handbook_internal/Devices/System_Board/SYSBD_SE_T5120_T5220.html#7970) this is a 1068E B2 board. Customer's LSI firmware is at the latest already (1.27.02.00). So, 140952-02 not needed.
2. SSD is at 8850 (diskinfo), needs to go to 8855->8862 (143211-01, obsoleted by 143211-02, rev 8862). No Cougar Card I can see, hence aac drive patch not needed. See README in 143211-02.
3. BTW, customer is already at KUP 139555-08 & 140796-01.

So, here is what I suggest. Most importantly, the SSD disk firmware patch is the most critical, as the README from patch 143211-02 states the possible bug fixes (CR 6918513 & 6827668). Follow the install instructions there. As for patch 141043-01 (LSI Expander firmware for the 16-disk backplane on Sun SPARC Enterprise T5220 and T5240 platforms), your disk backplane version may already be there, but there is no way to tell, as far as I know, until one goes through the update process. Perhaps the customer can skip it for now. Otherwise, they can go through it but skip steps 1-3 of Install.info from patch 141043-01:

1. # patchadd 126419-02
2. # patchadd 13-05
3. # reboot
   (After reboot, need to install the firmwareflash package for SPARC systems.)
4. # pkgadd -d
5. # firmwareflash -l
   (To list all available ses devices in the system.)
6. # firmwareflash -d -f LSI_X28EXPDR_16DISK_BootRec_REV5-SPARC_Enterprise_T5220+T5240.rxp
7. # reboot
8. You must now power cycle the system to run the new boot record firmware just loaded.

So the main thing you need to do is apply firmware upgrade 143211-02, which specifically addresses the issue of retryable writes on SSD disks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Budy, > No - not a trick question., but maybe I didn't make myself clear. > Is there a way to discover such bad files other than trying to actually read > from them one by one, say using cp or by sending a snapshot elsewhere? As noted in your original email, ZFS reports on any corruption using the "zpool status" command. ZFS detects corruption as part of its normal filesystem operations, which may be triggered by cp, send-recv, etc., or by a forced reading of the entire filesystem via scrub. > I am well aware that the file shown in zpool status -v is damaged and I have > already restored it, but I wanted to know, if there're more of them. Assuming that the ZFS filesystem in question is not degrading further (as in a disk going bad), upon completion of a successful scrub, zpool status reports the complete status of the filesystem being reported on. - Jim > > Regards, > budy > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
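In other words, roughly (pool name taken from the original post):

  # zpool scrub obelixData
  # zpool status -v obelixData     (once the scrub completes, any permanently damaged files are listed by name)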
Re: [zfs-discuss] scrub doesn't finally finish?
Yes - that may well be. There was data going onto the device while the scrub was running; especially, large zfs receives had been going on. It'd be odd if that were the case, though. Cheers, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Scrub? On Oct 6, 2010, at 6:48 AM, Stephan Budach wrote: > No - not a trick question., but maybe I didn't make myself clear. > Is there a way to discover such bad files other than trying to actually read > from them one by one, say using cp or by sending a snapshot elsewhere? > > I am well aware that the file shown in zpool status -v is damaged and I have > already restored it, but I wanted to know, if there're more of them. > > Regards, > budy > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
On Tue, October 5, 2010 16:47, casper@sun.com wrote: > > >>My immediate reaction to this is "time to avoid WD drives for a while"; >>until things shake out and we know what's what reliably. >> >>But, um, what do we know about say the Seagate Barracuda 7200.12 ($70), >>the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5" >>($70)? > > > I've seen several important features when selecting a drive for > a mirror: > > TLER (the ability of the drive to timeout a command) I went and got what detailed documentation I could on a couple of the Seagate drives last night, and I couldn't find anything on how they behaved in that sort of error cases. (I believe TLER is a WD-specific term, but I didn't just search, I read them through.) So that's inconvenient. How do we find out about that sort of thing? > sector size (native vs virtual) Richard Elling said ZFS handles the 4k real 512byte fake drives okay now in default setups; but somebody immediately asked for version info, so I'm still watching this one. > power use (specifically at home) Hadn't thought about that. But when I'm upgrading drives, I figure I'm always going to come out better on power than when I started. > performance (mostly for work) I can't bring myself to buy below 7200RPM, but it's probably foolish (except that other obnoxious features tend to come in the "green" drives). > price Yeah, well. I'm cheap. > I've heard scary stories about a mismatch of the native sector size and > unaligned Solaris partitions (4K sectors, unaligned cylinder). So have I. Sounds like you get read-modify-write actions for non-aligned accesses. I hope the next generation of drives admit to being 4k sectors, and that ZFS will be prepared to use them sensibly. But I'm not sure I'm willing to wait for that; the oldest drives in my box are now 4 years old, and I'm about ready for the next capacity upgrade. > I was pretty happen with the WD drives (except for the one with a > seriously > broken cache) but I see the reasons to not to pick WD drives over the 1TB > range. And the big ones are what pretty much everybody is using at home. Capacity and price are vastly more important than performance for most of us. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is there a way to limit ZFS File Data but maintain room for the ARC to cache metadata
Good idea. Provides options, but it would be nice to be able to set a low water mark on what can be taken away from the arc metadata cache without having to have something like an SSD. Dave On 10/01/10 14:02, Freddie Cash wrote: On Fri, Oct 1, 2010 at 11:46 AM, David Blasingame Oracle wrote: I'm working on this scenario in which file system activity appears to cause the arc cache to evict meta data. I would like to have a preference to keep the metadata in cache over ZFS File Data What I've notice on import of a zpool the arc_meta_used goes up significantly. ZFS meta data operations usually run pretty good. However over time with IO Operations the cache get's evicted and arc_no_grow get set. So, I would like to limit the amount of ZFS File Data that can be used and keep the arc cache warm with metadata. Any suggestions? Would adding a cache device (L2ARC) and setting primarycache=metadata and secondarycache=all on the root dataset do what you need? That way ARC is used strictly for metadata, and L2ARC is used for metadata+data. -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
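A sketch of the suggestion being discussed above; the pool and cache-device names are hypothetical:

  # zpool add tank cache c2t0d0           (add an SSD as L2ARC)
  # zfs set primarycache=metadata tank    (ARC keeps metadata only)
  # zfs set secondarycache=all tank       (L2ARC keeps both data and metadata)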
Re: [zfs-discuss] TLER and ZFS
On Tue, October 5, 2010 17:20, Richard Elling wrote: > On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote: >> >> On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote: >>> Well, here it's about 60% up and for 150 drives, that makes a wee >>> difference... >> Understood on 1.6 times cost, especially for quantity 150 drives. > One service outage will consume far more in person-hours and downtime than > this little bit of money. Penny-wise == Pound-foolish? That looks to be true, yes (going back to the actual prices, 150 drives would cost $6000 extra for the enterprise versions). It's still quite annoying to be jerked around by people charging 60% extra for changing a timeout in the firmware, and carefully making it NOT user-alterable. Also, the non-TLER versions are a constant threat to anybody running home systems, who might quite reasonably think they could put those in a home server. (Yeah, I know the enterprise versions have other differences. I'm not nearly so sure I CARE about the other differences, in the size servers I'm working with.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] scrub doesn't finally finish?
Have you had a lot of activity since the scrub started? I have noticed what appears to be extra I/O at the end of a scrub when activity took place during the scrub. It's as if the scrub estimator does not take the extra activity into account. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Kernel panic after upgrading from snv_138 to snv_140
Hi, my machine is an HP ProLiant ML350 G5 with 2 quad-core Xeons, 32GB RAM and an HP SmartArray E200i RAID controller with 3x160GB and 3x500GB SATA disks connected to it. Two of the 160GB disks build the mirrored root pool (rpool), the third serves as a temporary data pool called "tank", and the three 500GB disks form a RAIDZ1 pool called "daten". So far I successfully upgraded from OpenSolaris b134 to b138 by manually building ONNV. Recently I built b140, installed it, but unfortunately booting results in a kernel panic:

...
NOTICE: zfs_parse_bootfs: error 22
Cannot mount root on rpool/187 fstype zfs
panic[cpu0]/thread=fbc2f660: vfs_mountroot: cannot mount root
fbc71ba0 genunix:vfs_mountroot+32e ()
fbc71bd0 genunix:main+136 ()
fbc71be0 unix:_locore_start+92 ()
panic: entering debugger (no dump device, continue to reboot)
Welcome to kmdb
Loaded modules: [ scsi_vhci mac uppc sd unix zfs krtld genunix specfs pcplusmp cpu.generic ]
[0]>

Before the above attempt with b140, I tried to upgrade to OpenIndiana, but I have quite the same problem; OI doesn't boot either. See http://openindiana.org/pipermail/openindiana-discuss/2010-September/000504.html Any ideas what is causing this kernel panic? Regards Thorsten -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
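Not a diagnosis, but since zfs_parse_bootfs returned EINVAL (error 22), one cheap thing to check from a failsafe or live environment is whether the pool's bootfs property still points at a valid boot environment. A sketch, with the boot environment name being an assumption:

  # zpool import -f -R /a rpool
  # zpool get bootfs rpool
  # zfs list -r rpool/ROOT
  # zpool set bootfs=rpool/ROOT/snv_140 rpool    (only if bootfs points at a stale or missing dataset)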
Re: [zfs-discuss] Finding corrupted files
No - not a trick question., but maybe I didn't make myself clear. Is there a way to discover such bad files other than trying to actually read from them one by one, say using cp or by sending a snapshot elsewhere? I am well aware that the file shown in zpool status -v is damaged and I have already restored it, but I wanted to know, if there're more of them. Regards, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
On 06 October, 2010 - Stephan Budach sent me these 2,1K bytes: > Hi, > > I recently discovered some - or at least one corrupted file on one ofmy ZFS > datasets, which caused an I/O error when trying to send a ZFDS snapshot to > another host: > > > zpool status -v obelixData > pool: obelixData > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. >see: http://www.sun.com/msg/ZFS-8000-8A > scrub: none requested > config: > > NAME STATE READ WRITE CKSUM > obelixData ONLINE 4 0 0 > c4t21D023038FA8d0 ONLINE 0 0 0 > c4t21D02305FF42d0 ONLINE 4 0 0 > > errors: Permanent errors have been detected in the following files: > > <0x949>:<0x12b9b9> > > obelixData/jvmprepr...@2010-10-02_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in > CI vor ET 10.6.2010/13404_41_07008 Estate > HandelsMarketing/Dealer_Launch_Invitations > Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps > > obelixData/jvmprepr...@backupsnapshot_2010-10-05-08:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ > in CI vor ET 10.6.2010/13404_41_07008 Estate > HandelsMarketing/Dealer_Launch_Invitations > Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps > > obelixData/jvmprepr...@2010-09-24_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in > CI vor 6_210/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations > Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps > /obelixData/JvMpreprint/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor > ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations > Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps > > Now, scrub would reveal corrupted blocks on the devices, but is there a way > to identify damaged files as well? Is this a trick question or something? The filenames are right over your question..? /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] scrub doesn't finally finish?
Hi all, I have issued a scrub on a pool that consists of two independent FC raids. The scrub has been running for approx. 25 hrs and then showed 100%, but there's still incredible traffic on one of the FC raids going on, plus zpool status -v reports that the scrub is still running:

zpool status -v backupPool_01
  pool: backupPool_01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 26h45m, 100,00% done, 0h0m to go
config:

        NAME                   STATE     READ WRITE CKSUM
        backupPool_01          ONLINE       0     0     0
          c3t211378AC0271d0    ONLINE       0     0    26  2,11M repaired
          c3t211378AC026Ed0    ONLINE       0     0     0

errors: No known data errors

So, what is scrub still doing to the upper vdev? Is there anywhere I can get more information about what scrub is still doing? Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
www.solarisinternals.com has always been a community. It never was hosted by Sun, and it's not hosted by Oracle. True, many of the contributors were Sun employees, but not so many remain at Oracle. If it's out of date, I suspect that's because the original contributors are too busy doing other fun things. However, it is a wiki, so YOU can apply for a login and edit it if you have something useful to share :) On 6 Oct 2010, at 02:36, Michael DeMan wrote: > Hi upfront, and thanks for the valuable information. > > > On Oct 5, 2010, at 4:12 PM, Peter Jeremy wrote: > >>> Another annoying thing with the whole 4K sector size, is what happens >>> when you need to replace drives next year, or the year after? >> >> About the only mitigation needed is to ensure that any partitioning is >> based on multiples of 4KB. > > I agree, but to be quite honest, I have no clue how to do this with ZFS. It > seems that it should be something under the regular tuning documenation. > > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > > > Is it going to be the case that basic information like about how to deal with > common scenarios like this is no longer going to be publicly available, and > Oracle will simply keep it 'close to the vest', with the relevant information > simply available for those who choose to research it themselves, or only > available to those with certain levels of support contracts from Oracle? > > To put it another way - does the community that uses ZFS need to fork 'ZFS > Best Practices' and 'ZFZ Evil Tuning' to ensure that it is reasonably up to > date? > > Sorry for the somewhat hostile in the above, but the changes w/ the merger > have demoralized a lot of folks I think. > > - Mike > > > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
casper@sun.com wrote: On Tue, Oct 5, 2010 at 11:49 PM, wrote: I'm not sure that that is correct; the drive works on naive clients but I believe it can reveal its true colors. The drive reports 512 byte sectors to all hosts. AFAIK there's no way to make it report 4k sectors. Too bad because it makes it less useful (specifically because the label mentions sectors and if you can use bigger sectors, you can address a larger drive). Having now read a number of forums about these, there's a strong feeling WD screwed up by not providing a switch to disable pseudo 512b access so you can use the 4k native. The industry as a whole will transition to 4k sectorsize over next few years, but these first 4k sectorsize HDs are rather less useful with 4k sectorsize-aware OS's. Let's hope other manufacturers get this right in their first 4k products. -- Andrew Gabriel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Finding corrupted files
Hi, I recently discovered some - or at least one - corrupted file on one of my ZFS datasets, which caused an I/O error when trying to send a ZFS snapshot to another host:

zpool status -v obelixData
  pool: obelixData
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        obelixData           ONLINE       4     0     0
          c4t21D023038FA8d0  ONLINE       0     0     0
          c4t21D02305FF42d0  ONLINE       4     0     0

errors: Permanent errors have been detected in the following files:

        <0x949>:<0x12b9b9>
        obelixData/jvmprepr...@2010-10-02_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
        obelixData/jvmprepr...@backupsnapshot_2010-10-05-08:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
        obelixData/jvmprepr...@2010-09-24_2359:/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor 6_210/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps
        /obelixData/JvMpreprint/DTP/Jobs/Mercedes-Benz/C_Klasse/RZ in CI vor ET 10.6.2010/13404_41_07008 Estate HandelsMarketing/Dealer_Launch_Invitations Fremddokumente/Dealer_Launch_S204/Images/Vorhang_Innen.eps

Now, scrub would reveal corrupted blocks on the devices, but is there a way to identify damaged files as well? Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS crypto bug status change
On 05/10/2010 20:14, Miles Nordin wrote: I'm glad it wasn't my project, though. If I were in Darren's place I'd have signed on to work for an open-source company, spent seven years of my life working on something, delaying it and pushing hard to make it a generation beyond other filesystem crypto, and then when I'm finally done,. Please don't speculate, nobody but me and a very few others inside Oracle have all the facts of why this integrated when it did; and I'm not going to give all the details here because it is neither relevant nor appropriate. For the record I didn't sign on to an open-source company, I joined Sun many many years before OpenSolaris (in 1996 in fact). I didn't even join initially as a developer; I was in SunService doing backline support and a little sustaining engineering for Trusted Solaris 1.x (the SunOS 4.1.3 era version). On the other hand, before I joined Sun I was one of the first people to have a working "clone" of the then Trusted Solaris privilege system into Linux - for what later became the capabilities system in Linux. While I appreciate open source I'm not against closed source - if I was I wouldn't have joined Sun in 1996 and I wouldn't have had my jobs prior to that either (in fact I doubt I'd be in this industry at all). Just because I have and continue to participate in the open where I find it appropriate and useful (to me and others) doesn't mean I'm an open source or nothing person. Quite the opposite in fact: open source is a "tool" or "means to an end" and one always has to pick the right tool for the job at the right time. I care deeply about software quality and I don't believe the "ra ra" that just by being open source makes software better quality or more secure. Many eyes can help find bugs but only if there are actually people actively looking. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
>On Tue, Oct 5, 2010 at 11:49 PM, wrote: >> I'm not sure that that is correct; the drive works on naive clients but I >> believe it can reveal its true colors. > >The drive reports 512 byte sectors to all hosts. AFAIK there's no way >to make it report 4k sectors. Too bad because it makes it less useful (specifically because the label mentions sectors and if you can use bigger sectors, you can address a larger drive). They still have all sizes w/o "Advanced Format" (non EARS/AARS models) Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
On Tue, Oct 5, 2010 at 11:49 PM, wrote: > I'm not sure that that is correct; the drive works on naive clients but I > believe it can reveal its true colors. The drive reports 512 byte sectors to all hosts. AFAIK there's no way to make it report 4k sectors. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
If you're spending upwards of $30,000 on a storage system, you probably shouldn't skimp on the most important component. You might as well be complaining that ECC ram costs more. Don't be ridiculous. For one, this is a disk backup system, not a fileserver, and TLER is far from as critical as ECC. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] TLER and ZFS
Can you give us release numbers that confirm that this is 'automatic'. It is my understanding that the last available public release of OpenSolaris does not do this. On Oct 5, 2010, at 8:52 PM, Richard Elling wrote: > ZFS already aligns the beginning of data areas to 4KB offsets from the label. > For modern OpenSolaris and Solaris implementations, the default starting > block for partitions is also aligned to 4KB. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss