Re: [zfs-discuss] Bursty writes - why?
The NFS client that we're using always uses O_SYNC, which is why it was critical for us to use the DDRdrive X1 as the ZIL. I was unclear on the entire system we're using, my apologies. It is:

  OpenSolaris snv_134
  Motherboard: SuperMicro X8DAH
  RAM: 72GB
  CPU: dual Intel 5503 @ 2.0GHz
  ZIL: DDRdrive X1 (two of these, independent and not mirrored)
  Drives: 24 x Seagate 1TB SAS, 7200 RPM
  Network: 3 x gigabit links as LACP + 1 gigabit backup, IPMP on top of those

The output I posted is from zpool iostat, and I used that because it corresponds to what users are seeing. Whenever zpool iostat shows write activity, the file copies to the system are working as expected. As soon as zpool iostat shows no activity, the writes all pause. The simple test case is to copy a CD-ROM ISO image to the server while watching zpool iostat. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
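For anyone who wants to reproduce the observation, this is roughly the two-terminal test I'm describing (the pool name "tank" and the paths are just placeholders for our real ones):

  # terminal 1 - watch pool write activity once per second
  zpool iostat tank 1

  # terminal 2 - copy an ISO onto the NFS-mounted share
  cp /tmp/some-image.iso /net/server/tank/share/

When the copy is flowing normally, the write column shows a fairly steady stream; during the stalls the users complain about, zpool iostat drops to zero for the same interval.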
Re: [zfs-discuss] ZPool creation brings down the host
On 7/10/10 03:46 PM, Ramesh Babu wrote: I am trying to create a ZPool using a single Veritas volume. The host is going down as soon as I issue the zpool create command. It looks like the command is crashing and bringing the host down. Please let me know what the issue might be. Below is the command used; textvol is the Veritas volume and testpool is the name of the pool which I am trying to create. zpool create testpool /dev/vx/dsk/dom/textvol That's not a configuration that I'd recommend - you're layering one volume management system on top of another, and it seems that it's getting rather messy inside the kernel. Do you have the panic stack trace we can look at, and/or a crash dump? James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Hi Edward, well, that was exactly my point when I raised this question. If zfs send is able to identify corrupted files while it transfers a snapshot, why shouldn't scrub be able to do the same? zfs send quit with an I/O error and zpool status -v showed me the file that indeed had problems. Since zfs send also operates on the block level, I wondered whether scrub would basically do the same thing. On the other hand, scrub really doesn't care about what to read from the device - it simply reads all blocks, which is not the case when running zfs send. Maybe zfs send could just carry on instead of halting on an I/O error, and simply print out the errors… Cheers, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
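For anyone following the thread, the scrub-based way to get the same file list is short; a minimal sketch, assuming a pool named tank:

  # zpool scrub tank        (reads and verifies every allocated block in the pool)
  # zpool status -v tank    (with -v, files with permanent errors are listed by path)

zpool status -v reports the paths of files affected by unrecoverable errors, which is essentially the same information zfs send trips over, just gathered for the whole pool rather than a single snapshot.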
Re: [zfs-discuss] Finding corrupted files
On 10/ 7/10 06:22 PM, Stephan Budach wrote: Hi Edward, these are interesting points. I have considered a couple of them when I started playing around with ZFS. I am not sure whether I disagree with all of your points, but I conducted a couple of tests where I configured my raids as jbods and mapped each drive out as a separate LUN, and I couldn't notice a difference in performance in any way. The time you will notice it is when a cable falls out or becomes loose, you get corrupted data, and you lose the pool due to lack of redundancy. Even though your LUNs are RAID, there are still numerous single points of failure between them and the target system. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Swapping disks in pool to facilitate pool growth
Hi Guys, We are running a Solaris 10 production server being used for backup services within our DC. We have 8 500GB drives in a zpool and we wish to swap them out one by one for 1TB drives. I would like to know if it is viable to add larger disks to the zfs pool to grow the pool size and then remove the smaller disks? I would assume this would degrade the pool and require it to resilver? Any advice would be gratefully received. Kind regards Kevin ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Swapping disks in pool to facilitate pool growth
On 07/10/2010 11:22, Kevin Walker wrote: We are running a Solaris 10 production server being used for backup services within our DC. We have 8 500GB drives in a zpool and we wish to swap them out one by one for 1TB drives. I would like to know if it is viable to add larger disks to the zfs pool to grow the pool size and then remove the smaller disks? This is covered in the documentation: docs.sun.com > Home > Oracle Solaris 10 System Administrator Collection > Oracle Solaris ZFS Administration Guide > 1. Oracle Solaris ZFS File System (Introduction) > What's New in ZFS? > ZFS Device Replacement Enhancements http://docs.sun.com/app/docs/doc/819-5461/githb?l=ena=view I would assume this would degrade the pool and require it to resilver? A resilver is required - without it there is no way to get your data onto the new drives. Whether the pool is degraded or not depends on whether you detach an existing side of a two-way mirror to do the replacement with the larger drive, or whether you first create a three-way mirror and then detach one of the sides of the original two-way mirror. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
Ian, I know - and I will address this by upgrading the vdevs to mirrors, but there are a lot of other SPOFs around. So I started out by reducing the most common failures, and I have found those to be the disc drives, not the chassis. The beauty is: one can work their way up until the desired point of security is reached, or until there is no more money to spend. Cheers, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Stephan Budach I conducted a couple of tests, where I configured my raids as jbods and mapped each drive out as a seperate LUN and I couldn't notice a difference in performance in any way. Not sure if my original points were communicated clearly. Giving JBOD's to ZFS is not for the sake of performance. The reason for JBOD is reliability. Because hardware raid cannot detect or correct checksum errors. ZFS can. So it's better to skip the hardware raid and use JBOD, to enable ZFS access to each separate side of the redundant data. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
From: edmud...@mail.bounceswoosh.org [mailto:edmud...@mail.bounceswoosh.org] On Behalf Of Eric D. Mudama On Wed, Oct 6 at 22:04, Edward Ned Harvey wrote: * Because ZFS automatically buffers writes in ram in order to aggregate as previously mentioned, the hardware WB cache is not beneficial. There is one exception. If you are doing sync writes to spindle disks, and you don't have a dedicated log device, then the WB cache will benefit you, approx half as much as you would benefit by adding a dedicated log device. The sync write sort-of bypasses the ram buffer, and that's the reason why the WB is able to do some good in the case of sync writes. All of your comments made sense except for this one. (etc) Your points about long-term fragmentation and significant drive emptiness are well received. I never let a pool get over 90% full, for several reasons including this one. My target is 70%, which seems to be sufficiently empty. Also, as you indicated, blocks of 128K are not sufficiently large for reordering to benefit. There's another thread here where I calculated that you need blocks approx 40MB in size in order to reduce random seek time below 1% of total operation time. So all that I said will only be relevant or accurate if, within 30sec (or 5 sec in the future), there exist at least 40MB of aggregatable sequential writes. It's really easy to measure and quantify what I was saying. Just create a pool and benchmark it in each configuration. Results that I measured (stripe of 2 mirrors) were:

  721 IOPS without WB or slog
  2114 IOPS with WB
  2722 IOPS with WB and slog
  2927 IOPS with slog, and no WB

There's a whole spreadsheet full of results that I can't publish, but the trend of WB versus slog was clear and consistent. I will admit the above were performed on relatively new, relatively empty pools. It would be interesting to see if any of that changes if the test is run on a system that has been in production for a long time, with real user data in it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Swapping disks in pool to facilitate pool growth
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Kevin Walker We are a running a Solaris 10 production server being used for backup services within our DC. We have 8 500GB drives in a zpool and we wish to swap them out 1 by 1 for 1TB drives. I would like to know if it is viable to add larger disks to zfs pool to grow the pool size and then remove the smaller disks? I would assume this would degrade the pool and require it to resilver? Because it's a raidz, yes it will be degraded each time you remove one disk. You will not be using attach and detach. You will be using replace Because it's a raidz, each resilver time will be unnaturally long. Raidz resilver code is inefficient. Just be patient and let it finish each time before you replace the next disk. Performance during resilver will be exceptionally poor. Exceptionally. Because of the inefficient raidz resilver code, do everything within your power to reduce IO on the system during the resilver. Of particular importance: Don't create snapshots while the system is resilvering. This will exponentially increase the resilver time. (I'm exaggerating by saying exponentially, don't take it literally. But in reality, it *is* significant.) Because you're going to be degrading your redundancy, you *really* want to ensure all the disks are good before you do any degrading. This means, don't begin your replace until after you've completed a scrub. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
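Putting the above together, the one-disk-at-a-time swap looks roughly like this (the pool name and device names are placeholders; adjust for your controller, and check whether your Solaris 10 update supports the autoexpand property):

  # zpool scrub tank                      (verify every disk before giving up any redundancy)
  # zpool replace tank c1t0d0 c2t0d0      (swap the first 500GB disk for a 1TB disk)
  # zpool status tank                     (wait here until the resilver has completed)
  ... repeat the replace/wait cycle for each of the remaining disks ...
  # zpool set autoexpand=on tank          (on newer updates; otherwise the extra space shows up
                                           once the last disk is replaced, possibly after an
                                           export/import)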
Re: [zfs-discuss] BugID 6961707
In message 201008112022.o7bkmc2j028...@elvis.arl.psu.edu, John D Groenveld writes: I'm stumbling over BugID 6961707 on build 134. I see the bug has been stomped in build 150. Awesome! URL: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6961707 In which build did it first arrive? Thanks, John groenv...@acm.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
I would not discount the performance issue... Depending on your workload, you might find that performance increases with ZFS on your hardware RAID in JBOD mode. Cindy On 10/07/10 06:26, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Stephan Budach I conducted a couple of tests, where I configured my raids as jbods and mapped each drive out as a seperate LUN and I couldn't notice a difference in performance in any way. Not sure if my original points were communicated clearly. Giving JBOD's to ZFS is not for the sake of performance. The reason for JBOD is reliability. Because hardware raid cannot detect or correct checksum errors. ZFS can. So it's better to skip the hardware raid and use JBOD, to enable ZFS access to each separate side of the redundant data. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Finding corrupted files
On 7-Oct-10, at 1:22 AM, Stephan Budach wrote: Hi Edward, these are interesting points. I have considered a couple of them, when I started playing around with ZFS. I am not sure whether I disagree with all of your points, but I conducted a couple of tests, where I configured my raids as jbods and mapped each drive out as a separate LUN and I couldn't notice a difference in performance in any way. The integrity issue is, however, clear cut. ZFS must manage the redundancy. ZFS just alerted you that your 'FC RAID' doesn't actually provide data integrity, you just lost the 'calculated' bet. :) --Toby I'd love to discuss this in a separate thread, but first I will have to check the archives and Google. ;) Thanks, budy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Help - Deleting files from a large pool results in less free space!
I have a 20Tb pool on a mount point that is made up of 42 disks from an EMC SAN. We were running out of space and down to 40Gb left (loading 8Gb/day) and have not received disk for our SAN. Using df -h results in:

  Filesystem   size   used   avail   capacity   Mounted on
  pool1         20T    20T     55G       100%   /pool1
  pool2        9.1T   8.0T    497G        95%   /pool2

The idea was to temporarily move a group of big directories to another zfs pool that had space available and link from the old location to the new:

  cp -r /pool1/000 /pool2/
  mv /pool1/000 /pool1/000d
  ln -s /pool2/000 /pool1/000
  rm -rf /pool1/000

Using df -h after the relocation results in:

  Filesystem   size   used   avail   capacity   Mounted on
  pool1         20T    19T     15G       100%   /pool1
  pool2        9.1T   8.3T    221G        98%   /pool2

Using zpool list says:

  NAME    SIZE    USED   AVAIL   CAP
  pool1  19.9T   19.6T    333G   98%
  pool2  9.25T   8.89T    369G   96%

Using zfs get all pool1 produces:

  NAME   PROPERTY            VALUE                  SOURCE
  pool1  type                filesystem             -
  pool1  creation            Tue Dec 18 11:37 2007  -
  pool1  used                19.6T                  -
  pool1  available           15.3G                  -
  pool1  referenced          19.5T                  -
  pool1  compressratio       1.00x                  -
  pool1  mounted             yes                    -
  pool1  quota               none                   default
  pool1  reservation         none                   default
  pool1  recordsize          128K                   default
  pool1  mountpoint          /pool1                 default
  pool1  sharenfs            on                     local
  pool1  checksum            on                     default
  pool1  compression         off                    default
  pool1  atime               on                     default
  pool1  devices             on                     default
  pool1  exec                on                     default
  pool1  setuid              on                     default
  pool1  readonly            off                    default
  pool1  zoned               off                    default
  pool1  snapdir             hidden                 default
  pool1  aclmode             groupmask              default
  pool1  aclinherit          secure                 default
  pool1  canmount            on                     default
  pool1  shareiscsi          off                    default
  pool1  xattr               on                     default
  pool1  replication:locked  true                   local

Has anyone experienced this or know where to look for a solution to recovering space? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help - Deleting files from a large pool results in less free space!
any snapshots? *zfs list -t snapshot* ..Remco On 10/7/10 7:24 PM, Jim Sloey wrote: I have a 20Tb pool on a mount point that is made up of 42 disks from an EMC SAN. We were running out of space and down to 40Gb left (loading 8Gb/day) and have not received disk for our SAN. Using df -h results in: Filesystem size used avail capacity Mounted on pool120T20T55G 100%/pool1 pool2 9.1T 8.0T 497G95%/pool2 The idea was to temporarily move a group of big directories to another zfs pool that had space available and link from the old location to the new. cp –r /pool1/000/pool2/ mv /pool1/000 /pool1/000d ln –s /pool2/000/pool1/000 rm –rf /pool1/000 Using df -h after the relocation results in: Filesystem size used avail capacity Mounted on pool120T19T15G 100%/pool1 pool2 9.1T 8.3T 221G98%/pool2 Using zpool list says: NAMESIZE USEDAVAIL CAP pool1 19.9T19.6T 333G 98% pool2 9.25T8.89T 369G 96% Using zfs get all pool1 produces: NAME PROPERTYVALUE SOURCE pool1 typefilesystem - pool1 creationTue Dec 18 11:37 2007 - pool1 used19.6T - pool1 available 15.3G - pool1 referenced 19.5T - pool1 compressratio 1.00x - pool1 mounted yes- pool1 quota none default pool1 reservation none default pool1 recordsize 128K default pool1 mountpoint /pool1 default pool1 sharenfson local pool1 checksumon default pool1 compression offdefault pool1 atime on default pool1 devices on default pool1 execon default pool1 setuid on default pool1 readonlyoffdefault pool1 zoned offdefault pool1 snapdir hidden default pool1 aclmode groupmask default pool1 aclinherit secure default pool1 canmounton default pool1 shareiscsi offdefault pool1 xattr on default pool1 replication:locked true local Has anyone experienced this or know where to look for a solution to recovering space? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help - Deleting files from a large pool results in less free space!
Forgive me, but isn't this incorrect: --- mv /pool1/000 /pool1/000d --- rm –rf /pool1/000 Shouldn't that last line be rm –rf /pool1/000d ?? On 8 October 2010 04:32, Remco Lengers re...@lengers.com wrote: any snapshots? *zfs list -t snapshot* ..Remco On 10/7/10 7:24 PM, Jim Sloey wrote: I have a 20Tb pool on a mount point that is made up of 42 disks from an EMC SAN. We were running out of space and down to 40Gb left (loading 8Gb/day) and have not received disk for our SAN. Using df -h results in: Filesystem size used avail capacity Mounted on pool120T20T55G 100%/pool1 pool2 9.1T 8.0T 497G95%/pool2 The idea was to temporarily move a group of big directories to another zfs pool that had space available and link from the old location to the new. cp –r /pool1/000/pool2/ mv /pool1/000 /pool1/000d ln –s /pool2/000/pool1/000 rm –rf /pool1/000 Using df -h after the relocation results in: Filesystem size used avail capacity Mounted on pool120T19T15G 100%/pool1 pool2 9.1T 8.3T 221G98%/pool2 Using zpool list says: NAMESIZE USEDAVAIL CAP pool1 19.9T19.6T 333G 98% pool2 9.25T8.89T 369G 96% Using zfs get all pool1 produces: NAME PROPERTYVALUE SOURCE pool1 typefilesystem - pool1 creationTue Dec 18 11:37 2007 - pool1 used19.6T - pool1 available 15.3G - pool1 referenced 19.5T - pool1 compressratio 1.00x - pool1 mounted yes- pool1 quota none default pool1 reservation none default pool1 recordsize 128K default pool1 mountpoint /pool1 default pool1 sharenfson local pool1 checksumon default pool1 compression offdefault pool1 atime on default pool1 devices on default pool1 execon default pool1 setuid on default pool1 readonlyoffdefault pool1 zoned offdefault pool1 snapdir hidden default pool1 aclmode groupmask default pool1 aclinherit secure default pool1 canmounton default pool1 shareiscsi offdefault pool1 xattr on default pool1 replication:locked true local Has anyone experienced this or know where to look for a solution to recovering space? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help - Deleting files from a large pool results in less free space!
Yes, you're correct. There was a typo when I copied to the forum. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help - Deleting files from a large pool results in less free space!
Yes. We run a snap in cron to a disaster recovery site.

  NAME                      USED  AVAIL  REFER  MOUNTPOINT
  po...@20100930-22:20:00  13.2M      -  19.5T  -
  po...@20101001-01:20:00  4.35M      -  19.5T  -
  po...@20101001-04:20:00      0      -  19.5T  -
  po...@20101001-07:20:00      0      -  19.5T  -
  po...@20101001-10:20:00  1.87M      -  19.5T  -
  po...@20101001-13:20:00  2.93M      -  19.5T  -
  po...@20101001-16:20:00  4.68M      -  19.5T  -
  po...@20101001-19:20:00  5.47M      -  19.5T  -
  po...@20101001-22:20:00  3.33M      -  19.5T  -
  po...@20101002-01:20:00  4.98M      -  19.5T  -
  po...@20101002-04:20:00   298K      -  19.5T  -
  po...@20101002-07:20:00   138K      -  19.5T  -
  po...@20101002-10:20:00  1.14M      -  19.5T  -
  po...@20101002-13:20:00   228K      -  19.5T  -
  po...@20101002-16:20:00      0      -  19.5T  -
  po...@20101002-19:20:00      0      -  19.5T  -
  po...@20101002-22:20:01   110K      -  19.5T  -
  po...@20101003-01:20:00  1.39M      -  19.5T  -
  po...@20101003-04:20:00  3.67M      -  19.5T  -
  po...@20101003-07:20:00   540K      -  19.5T  -
  po...@20101003-10:20:00   551K      -  19.5T  -
  po...@20101003-13:20:00   640K      -  19.5T  -
  po...@20101003-16:20:00  1.72M      -  19.5T  -
  po...@20101003-19:20:00   542K      -  19.5T  -
  po...@20101003-22:20:00      0      -  19.5T  -
  po...@20101004-01:20:00      0      -  19.5T  -
  po...@20101004-04:20:01   102K      -  19.5T  -
  po...@20101004-07:20:00   501K      -  19.5T  -
  po...@20101004-10:20:00  2.54M      -  19.5T  -
  po...@20101004-13:20:00  5.24M      -  19.5T  -
  po...@20101004-16:20:00  4.78M      -  19.5T  -
  po...@20101004-19:20:00  3.86M      -  19.5T  -
  po...@20101004-22:20:00  4.37M      -  19.5T  -
  po...@20101005-01:20:00  7.18M      -  19.5T  -
  po...@20101005-04:20:00      0      -  19.5T  -
  po...@20101005-07:20:00      0      -  19.5T  -
  po...@20101005-10:20:00  2.89M      -  19.5T  -
  po...@20101005-13:20:00  8.42M      -  19.5T  -
  po...@20101005-16:20:00  12.0M      -  19.5T  -
  po...@20101005-19:20:00  4.75M      -  19.5T  -
  po...@20101005-22:20:00  2.49M      -  19.5T  -
  po...@20101006-01:20:00  3.06M      -  19.5T  -
  po...@20101006-04:20:00   244K      -  19.5T  -
  po...@20101006-07:20:00   182K      -  19.5T  -
  po...@20101006-10:20:00  3.16M      -  19.5T  -
  po...@20101006-13:20:00   177M      -  19.5T  -
  po...@20101006-16:20:00   396M      -  19.5T  -
  po...@20101006-22:20:00   282M      -  19.5T  -
  po...@20101007-10:20:00   187M      -  19.5T  -

-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help - Deleting files from a large pool results in less free space!
One of us found the following: The presence of snapshots can cause some unexpected behavior when you attempt to free space. Typically, given appropriate permissions, you can remove a file from a full file system, and this action results in more space becoming available in the file system. However, if the file to be removed exists in a snapshot of the file system, then no space is gained from the file deletion. The blocks used by the file continue to be referenced from the snapshot. As a result, the file deletion can consume more disk space, because a new version of the directory needs to be created to reflect the new state of the namespace. This behavior means that you can get an unexpected ENOSPC or EDQUOT when attempting to remove a file. Since we are using snapshots to a remote system, what will be the impact of destroying the snapshots? Since the files we moved are some of the oldest, will we have to start replication to the remote site over again from the beginning? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
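A rough sketch of how to confirm this and reclaim the space (the snapshot name below is made up for illustration - substitute real names from the listing above):

  # zfs list -t snapshot -r pool1 -o name,used,referenced
  # zfs destroy pool1@20100930-22:20:00      (repeat for each snapshot you no longer need)

On the replication question: an incremental zfs send only needs one snapshot that still exists on both sides, so as long as the source and the DR site keep at least one common snapshot you should not have to start over with a full send; destroying every common snapshot would force one.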
Re: [zfs-discuss] Bursty writes - why?
Figured it out - it was the NFS client. I used snoop and then some dtrace magic to prove that the client (which was using O_SYNC) was sending very bursty requests to the system. I tried a number of other NFS clients with O_SYNC as well and got excellent performance when they were configured correctly. Just for fun I disabled the DDRdrive X1 (pair of them) that I use for the ZIL, and performance tanked across the board when using O_SYNC. I can't recommend the DDRdrive X1 enough as a ZIL! There is a great article on this behavior here: http://blogs.sun.com/brendan/entry/slog_screenshots Thanks for the help all! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving newly created pool to alternate host
Hi Cindys, Thanks for your mail. I have some further queries based on your answer. Once zpool split creates the new pool (mypool_snap in the example below), can I access mypool_snap just by importing it on the same host, host1? What is the current access mode of the newly created mypool_snap - is it read-write or read-only? If it is read-write, is there a way I can make it read-only so that the backup application cannot misuse it? Also, I want to use mypool_snap read-only on an alternate host, i.e. host2. Could you please let me know what steps I need to take on host1 and then on host2 once the zpool split is done? I would guess that after zpool split, mypool_snap is not visible on host1 and one needs to import it explicitly. Instead of importing on the same host (host1), can I go to host2, where the devices of the split pool are visible, and directly run zpool import mypool_snap there, to be used read-only for backup? Could you please provide more details. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] calling zfs snapshot on multiple file systems under the same pool or different pools
Hi, I am able to call zfs snapshot on an individual file system/volume using zfs snapshot filesystem|volume, or I can call zfs snapshot -r filesys...@snapshot-name to take snapshots of a file system and all of its descendants. Is there a way I can specify more than one file system/volume of the same pool or different pools in a single zfs snapshot call? Ex: pool/fs1, pool2/fs2 - is there any mechanism so I can call zfs snapshot pool/f...@snap1 pool2/f...@snap1 ? Thanks Regards, sridhar. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
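For what it's worth, in builds of this vintage zfs snapshot takes a single dataset per invocation (plus -r for a dataset and all of its descendants), so the usual workaround for unrelated datasets is a small shell loop; a sketch with hypothetical dataset names:

  # recursive: one atomic snapshot of pool/fs and everything beneath it
  zfs snapshot -r pool/fs@snap1

  # same snapshot name across unrelated datasets, one zfs call each
  for fs in pool/fs1 pool2/fs2; do
      zfs snapshot "$fs@snap1"
  done

Note that the loop does not give a single atomic point in time across the datasets the way -r does within one hierarchy.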
[zfs-discuss] [RFC] Backup solution
Hi all I'm setting up a couple of 110TB servers and I just want some feedback in case I have forgotten something. The servers (two of them) will, as of current plans, be using 11 VDEVs with 7 2TB WD Blacks each, with a couple of Crucial RealSSD 256GB SSDs for the L2ARC and another couple of 100GB OCZ Vertex 2 Pro for the SLOG (I know, it's way too much, but they will wear out more slowly, and there aren't fast SSDs around that are small). There will be 48 gigs of RAM in each box on recent Xeon CPUs. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving newly created pool to alternate host
Hi Sridhar, Most of the answers to your questions are yes. If I have a mirrored pool mypool, like this:

  # zpool status mypool
    pool: mypool
   state: ONLINE
    scan: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          mypool      ONLINE       0     0     0
            mirror-0  ONLINE       0     0     0
              c3t1d0  ONLINE       0     0     0
              c3t2d0  ONLINE       0     0     0

  errors: No known data errors

and I split it like this:

  # zpool split mypool mypool_snap
  # zpool status
    pool: mypool
   state: ONLINE
    scan: none requested
  config:

          NAME      STATE     READ WRITE CKSUM
          mypool    ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0

Only mypool is currently imported. The new pool mypool_snap is exported by default. So, you can either import it on host1 or host2 if the LUNs are available on both systems.

  # zpool import mypool_snap
  # zpool status mypool mypool_snap
    pool: mypool
   state: ONLINE
    scan: none requested
  config:

          NAME      STATE     READ WRITE CKSUM
          mypool    ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0

  errors: No known data errors

    pool: mypool_snap
   state: ONLINE
    scan: none requested
  config:

          NAME         STATE     READ WRITE CKSUM
          mypool_snap  ONLINE       0     0     0
            c3t2d0     ONLINE       0     0     0

  errors: No known data errors

In current Solaris releases, only ZFS file systems are mounted read-only. The ability to import a pool read-only is available with CR 6720531. If you import mypool_snap on host2, it will not be available or visible on host1. Thanks, Cindy On 10/07/10 15:36, sridhar surampudi wrote: Hi Cindys, Thanks for your mail. I have some further queries here based on your answer. Once zfs split creates new pool (as per below example it is mypool_snap) can I access mypool_snap just by importing on the same host Host1 ?? what is the current access method of newly created mypool_snap ? is it read-write or read only ? If it is read-write is there a way I can make it read only so that backup application cannot misuse. Also I want (or going to use ) mypool_snap as read only on alternate host i.e. host2. Could you please let me know what all steps I need to take on host1 and then on host2 once zpool split is done. I can guess as after zpool split, mypool_snap is not visible to host1. Once needs to import explicitly. Instead of importing on same host i.e. host1 can i go to host2 where split node devices are visible and directly run zpool import mypool_snap which will be further used as read only for backup ?? Could you please provide more details. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Backup solution
On 10/ 8/10 10:54 AM, Roy Sigurd Karlsbakk wrote: Hi all I'm setting up a couple of 110TB servers and I just want some feedback in case I have forgotten something. The servers (two of them) will, as of current plans, be using 11 VDEVs with 7 2TB WD Blacks each, with a couple of Crucial RealSSD 256GB SSDs for the L2ARC and another couple of 100GB OCZ Vertex 2 Pro for the SLOG (I know, it's way too much, but they will wear out slowlier and there aren't fast SSDs around that are small). There will be 48 gigs of RAM for each box on recent Xeon CPUs. What configuration are you proposing for the vdevs? Don't forget you will have very long resilver times with those drives. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Backup solution
- Original Message - On 10/ 8/10 10:54 AM, Roy Sigurd Karlsbakk wrote: Hi all I'm setting up a couple of 110TB servers and I just want some feedback in case I have forgotten something. The servers (two of them) will, as of current plans, be using 11 VDEVs with 7 2TB WD Blacks each, with a couple of Crucial RealSSD 256GB SSDs for the L2ARC and another couple of 100GB OCZ Vertex 2 Pro for the SLOG (I know, it's way too much, but they will wear out more slowly and there aren't fast SSDs around that are small). There will be 48 gigs of RAM for each box on recent Xeon CPUs. What configuration are you proposing for the vdevs? Don't forget you will have very long resilver times with those drives. RAIDz2 on each VDEV. I'm aware that the resilver time will be worse than with 10k or 15k drives, but then, those 2TB drives aren't available at anything faster than 7k2. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
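As a sanity check on the sizing (my arithmetic, not from the original post): 11 raidz2 vdevs x (7 - 2 parity) data drives x 2 TB = 110 TB of usable capacity, which matches the stated 110TB per server, before filesystem overhead and the usual TB-vs-TiB difference.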
Re: [zfs-discuss] [RFC] Backup solution
On 10/ 8/10 11:06 AM, Roy Sigurd Karlsbakk wrote: - Original Message - On 10/ 8/10 10:54 AM, Roy Sigurd Karlsbakk wrote: Hi all I'm setting up a couple of 110TB servers and I just want some feedback in case I have forgotten something. The servers (two of them) will, as of current plans, be using 11 VDEVs with 7 2TB WD Blacks each, with a couple of Crucial RealSSD 256GB SSDs for the L2ARC and another couple of 100GB OCZ Vertex 2 Pro for the SLOG (I know, it's way too much, but they will wear out slowlier and there aren't fast SSDs around that are small). There will be 48 gigs of RAM for each box on recent Xeon CPUs. What configuration are you proposing for the vdevs? Don't forget you will have very long resilver times with those drives. RAIDz2 on each VDEV. I'm aware of that the resilver time will be worse than using 10k or 15k drives, but then, those 2TB drives aren't available for anything but 7k2 or less. I would seriously consider raidz3, given I typically see 80-100 hour resilver times for 500G drives in raidz2 vdevs. If you haven't already, read Adam Leventhal's paper: http://queue.acm.org/detail.cfm?id=1670144 -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Backup solution
Those must be pretty busy drives. I had a recent failure of a 1.5T disk in a 7 disk raidz2 vdev that took about 16 hours to resilver. There was very little IO on the array, and it had maybe 3.5T of data to resilver. On Oct 7, 2010, at 3:17 PM, Ian Collins wrote: I would seriously consider raidz3, given I typically see 80-100 hour resilver times for 500G drives in raidz2 vdevs. If you haven't already, read Adam Leventhal's paper: http://queue.acm.org/detail.cfm?id=1670144 -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Scott Meilicke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Backup solution
On 10/ 8/10 11:22 AM, Scott Meilicke wrote: Those must be pretty busy drives. I had a recent failure of a 1.5T disk in a 7 disk raidz2 vdev that took about 16 hours to resilver. There was very little IO on the array, and it had maybe 3.5T of data to resilver. On Oct 7, 2010, at 3:17 PM, Ian Collins wrote: I would seriously consider raidz3, given I typically see 80-100 hour resilver times for 500G drives in raidz2 vdevs. It is a backup staging server (a Thumper), so it's receiving a steady stream of snapshots and rsyncs (from Windows). That's why it typically gets to 100% complete half way through the actual resilver! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] raidz faulted with only one unavailable disk
Hi, I've been playing around with zfs for a few days now, and have ended up with a faulted raidz (4 disks) with 3 disks still marked as online. Let's start with the output of zpool import:

    pool: tank-1
      id: 15108774693087697468
   state: FAULTED
  status: One or more devices contains corrupted data.
  action: The pool cannot be imported due to damaged devices or data. The pool may be active on
          another system, but can be imported using the '-f' flag.
     see: http://www.sun.com/msg/ZFS-8000-5E
  config:

          tank-1                           FAULTED  corrupted data
            raidz1-0                       ONLINE
              disk/by-id/dm-name-tank-1-1  UNAVAIL  corrupted data
              disk/by-id/dm-name-tank-1-2  ONLINE
              disk/by-id/dm-name-tank-1-3  ONLINE
              disk/by-id/dm-name-tank-1-4  ONLINE

After some google searches and reading http://www.sun.com/msg/ZFS-8000-5E, it seems to me as if some metadata is lost, and thus the pool cannot be restored anymore. I've tried zpool import -F tank-1 as well as zpool import -f tank-1, both resulting in the following message:

  cannot import 'tank-1': I/O error
  Destroy and re-create the pool from a backup source.

What I'm wondering about right now are the following things: Is there some way to recover the data? I thought raidz would require losing two disks before the data is gone? And since I think the data is lost - why did this happen in the first place? Which situation can cause a faulted raidz that has only one broken drive? Greetings, Christian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
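One low-level check that can help narrow this down (a diagnostic sketch, not a recovery recipe, and it assumes zdb is available in your ZFS port):

  # dump the four ZFS labels on each member device and compare them
  zdb -l /dev/disk/by-id/dm-name-tank-1-1
  zdb -l /dev/disk/by-id/dm-name-tank-1-2
  (and so on for the other two devices)

If the labels on the three ONLINE disks disagree about the pool's guid or most recent txg, or one set of labels is stale or missing, that may explain a pool-wide 'corrupted data' verdict even though only one member is reported UNAVAIL.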
Re: [zfs-discuss] Finding corrupted files
From: Cindy Swearingen [mailto:cindy.swearin...@oracle.com] I would not discount the performance issue... Depending on your workload, you might find that performance increases with ZFS on your hardware RAID in JBOD mode. Depends on the raid card you're comparing to. I've certainly seen some raid cards that were too dumb to read from 2 disks in a mirror simultaneously for the sake of read performance enhancement. And many other similar situations. But I would not say that's generally true anymore. In the last several years, all the hardware raid cards that I've bothered to test were able to utilize all the hardware available. Just like ZFS. There are performance differences... like ... the hardware raid might be able to read 15% faster in raid5, while ZFS is able to write 15% faster in raidz, and so forth. Differences that roughly balance each other out. For example, here's one data point I can share (2 mirrors striped, results normalized):

           8 initial writers   8 rewriters   8 readers
  ZFS            1.43              2.99         5.05
  HW             2.00              2.54         2.96

           8 re-readers   8 reverse readers   8 stride readers
  ZFS          4.19              3.59                3.93
  HW           3.02              2.80                2.90

           8 random readers   8 random mix   8 random writers
  ZFS            2.57              2.40            1.69
  HW             1.99              1.70            1.73

  average
  ZFS  3.09
  HW   2.40

There were some categories where ZFS was faster. Some where HW was faster. On average, ZFS was faster, but they were all in the same ballpark, and the results were highly dependent on specific details and tunables. AKA, not a place you should explore unless you have a highly specialized use case that you wish to optimize. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
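The category names above (initial writers, re-readers, reverse readers, stride readers, random mix) read like iozone's throughput mode, so purely as an illustration - assuming iozone was in fact the tool, which the post doesn't say - a comparable run would look something like:

  iozone -t 8 -s 1g -r 128k -i 0 -i 1 -i 2 -i 3 -i 5 -i 8

where -t 8 runs eight parallel workers per test, -s and -r set the per-thread file and record sizes, and the -i numbers select the write/rewrite, read/re-read, random, reverse, stride and mixed tests.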
Re: [zfs-discuss] Can I upgrade a striped pool of vdevs to mirrored vdevs?
Hi Cindy, very well - thanks. I noticed that both the pool you're using and the zpool that is described in the docs already show a mirror-0 configuration, which isn't the case for my zpool:

  zpool status obelixData
    pool: obelixData
   state: ONLINE
   scrub: none requested
  config:

          NAME                 STATE     READ WRITE CKSUM
          obelixData           ONLINE       0     0     0
            c4t21D023038FA8d0  ONLINE       0     0     0
            c4t21D02305FF42d0  ONLINE       0     0     0

  errors: No known data errors

Actually, this zpool consists of two FC raids and I think I created it simply by adding these two devs to the pool. Does this disqualify my zpool for upgrading? Thanks, budy Am 04.10.10 16:48, schrieb Cindy Swearingen: Hi-- Yes, you would use the zpool attach command to convert a non-redundant configuration into a mirrored pool configuration. http://docs.sun.com/app/docs/doc/819-5461/gcfhe?l=ena=view See: Example 4–6 Converting a Nonredundant ZFS Storage Pool to a Mirrored ZFS Storage Pool If you have more than one device in the pool, you would continue to attach a new disk to each existing device, like this: # zpool status test pool: test state: ONLINE scan: resilvered 85.5K in 0h0m with 0 errors on Mon Oct 4 08:44:35 2010 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 c3t2d0 ONLINE 0 0 0 c3t3d0 ONLINE 0 0 0 # zpool attach test c3t1d0 c4t1d0 # zpool attach test c3t2d0 c4t2d0 # zpool attach test c3t3d0 c4t3d0 This would create a mirrored pool with 3 two-way mirrors. I would suggest attaching one disk at a time, letting it resilver and then run a scrub to ensure that each new disk is functional. Thanks, Cindy On 10/04/10 08:24, Stephan Budach wrote: Hi, once I created a zpool of single vdevs not using mirroring of any kind. Now I wonder if it's possible to add vdevs and mirror the currently existing ones. Thanks, budy -- Stephan Budach Jung von Matt/it-services GmbH Glashüttenstraße 79 20357 Hamburg Tel: +49 40-4321-1353 Fax: +49 40-4321-1114 E-Mail: stephan.bud...@jvm.de Internet: http://www.jvm.com Geschäftsführer: Ulrich Pallas, Frank Wilhelm AG HH HRB 98380 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [mdb-discuss] onnv_142 - vfs_mountroot: cannot mount root
Hi, The issue here was using DKIOCGMEDIAINFOEXT by ZFS introduced in changeset 12208. Forcing DKIOCGMEDIAINFO solved that. On Tue, Sep 7, 2010 at 4:35 PM, Gavin Maltby gavin.mal...@oracle.com wrote: On 09/07/10 23:26, Piotr Jasiukajtis wrote: Hi, After upgrade from snv_138 to snv_142 or snv_145 I'm unable to boot the system. Here is what I get. Any idea why it's not able to import rpool? I saw this issue also on older builds on a different machines. This sounds (based on the presence of cpqary) not unlike: 6972328 Installation of snv_139+ on HP BL685c G5 fails due to panic during auto install process which was introduced into onnv_139 by the fix for this 6927876 For 4k sector support, ZFS needs to use DKIOCGMEDIAINFOEXT The fix is in onnv_148 after the external push switch-off, fixed via 6967658 sd_send_scsi_READ_CAPACITY_16() needs to handle SBC-2 and SBC-3 response formats I experienced this on data pools rather than the rpool, but I suspect on the rpool you'd get the vfs_mountroot panic you see when rpool import fails. My workaround was to compile a zfs with the fix for 6927876 changed to force the default physical block size of 512 and drop that into the BE before booting to it. There was no simpler workaround available. Gavin -- Piotr Jasiukajtis | estibi | SCA OS0072 http://estseg.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I upgrade a striped pool of vdevs to mirrored vdevs?
Hi Cindy, well, actually the two LUNs represent two different raid boxes that are connected through a FC switch, to which the host is also attached. I simply added these two FC LUNs to a pool, but from what you all are telling me, I should be good by adding two equal LUNs as described and awaiting the end of the resilver process, which will take a good amount of time… ;) Thanks, budy Am 04.10.10 17:48, schrieb Cindy Swearingen: To answer your other questions, I'm not sure I'm following your FC raid description: Are you saying you created two LUNs from a FC RAID array and added them to the pool? If so, then yes, you can still attach more LUNs from the array to create a mirrored pool. A best practice is to mirror across controllers for better reliability, but ZFS doesn't check if the disks to attach are from the same array, if that's what you mean. Thanks, Cindy On 10/04/10 09:05, Stephan Budach wrote: Hi Cindy, very well - thanks. I noticed that both the pool you're using and the zpool that is described in the docs already show a mirror-0 configuration, which isn't the case for my zpool: zpool status obelixData pool: obelixData state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM obelixData ONLINE 0 0 0 c4t21D023038FA8d0 ONLINE 0 0 0 c4t21D02305FF42d0 ONLINE 0 0 0 errors: No known data errors Actually, this zpool consists of two FC raids and I think I created it simply by adding these two devs to the pool. Does this disqualify my zpool for upgrading? Thanks, budy Am 04.10.10 16:48, schrieb Cindy Swearingen: Hi-- Yes, you would use the zpool attach command to convert a non-redundant configuration into a mirrored pool configuration. http://docs.sun.com/app/docs/doc/819-5461/gcfhe?l=ena=view See: Example 4–6 Converting a Nonredundant ZFS Storage Pool to a Mirrored ZFS Storage Pool If you have more than one device in the pool, you would continue to attach a new disk to each existing device, like this: # zpool status test pool: test state: ONLINE scan: resilvered 85.5K in 0h0m with 0 errors on Mon Oct 4 08:44:35 2010 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 c3t2d0 ONLINE 0 0 0 c3t3d0 ONLINE 0 0 0 # zpool attach test c3t1d0 c4t1d0 # zpool attach test c3t2d0 c4t2d0 # zpool attach test c3t3d0 c4t3d0 This would create a mirrored pool with 3 two-way mirrors. I would suggest attaching one disk at a time, letting it resilver and then run a scrub to ensure that each new disk is functional. Thanks, Cindy On 10/04/10 08:24, Stephan Budach wrote: Hi, once I created a zpool of single vdevs not using mirroring of any kind. Now I wonder if it's possible to add vdevs and mirror the currently existing ones. Thanks, budy -- Stephan Budach Jung von Matt/it-services GmbH Glashüttenstraße 79 20357 Hamburg Tel: +49 40-4321-1353 Fax: +49 40-4321-1114 E-Mail: stephan.bud...@jvm.de Internet: http://www.jvm.com Geschäftsführer: Ulrich Pallas, Frank Wilhelm AG HH HRB 98380 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
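To make the quoted recipe concrete for this particular pool (the new-LUN names are placeholders - substitute whatever device names the two new arrays present):

  # zpool attach obelixData c4t21D023038FA8d0 <new LUN from array 1>
  # zpool attach obelixData c4t21D02305FF42d0 <new LUN from array 2>
  # zpool status obelixData      (wait for each resilver to finish, then run a scrub)

Attaching one LUN at a time, as suggested, keeps the resilver load down and lets each new device prove itself before the next attach.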
Re: [zfs-discuss] [RFC] Backup solution
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Ian Collins I would seriously consider raidz3, given I typically see 80-100 hour resilver times for 500G drives in raidz2 vdevs. If you haven't already, If you're going raidz3, with 7 disks, then you might as well just make mirrors instead, and eliminate the slow resilver. Mirrors resilver enormously faster than raidzN. At least for now, until maybe one day the raidz resilver code might be rewritten. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Backup solution
On 2010-Oct-08 09:07:34 +0800, Edward Ned Harvey sh...@nedharvey.com wrote: If you're going raidz3, with 7 disks, then you might as well just make mirrors instead, and eliminate the slow resilver. There is a difference in reliability: raidzN means _any_ N disks can fail, whereas mirror means one disk in each mirror pair can fail. With a mirror, Murphy's Law says that the second disk to fail will be the pair of the first disk :-). -- Peter Jeremy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating to an aclmode-less world
On Tue, 5 Oct 2010, Nicolas Williams wrote: Right. That only happens from NFSv3 clients [that don't instead edit the POSIX Draft ACL translated from the ZFS ACL], from non-Windows NFSv4 clients [that don't instead edit the ACL], and from local applications [that don't instead edit the ZFS ACL]. You mean the vast majority of applications in existence ;)? Other than chmod(1) in Solaris, and nfs4_(get|set)_facl on Linux, can you name off the top of your head *any* other applications that grok ZFS/NFSv4 ACLs (as opposed to blindly chmod'ing stuff and breaking your access control sigh)? (and GUI front ends to chmod/(get|set)_facl don't count :) ). I'm still waiting for the bug in Solaris chgrp that breaks ACLs to get fixed; I reported that last year sometime. And *that's* a core component of the Solaris OS itself; what's the chance of a timely response from a 3rd party vendor whose application doesn't play nicely with ACLs? broken record If only there was some way to keep applications from screwing up your ACLs with inappropriate uses of chmod... /broken record -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | hen...@csupomona.edu California State Polytechnic University | Pomona CA 91768 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss