Re: [zfs-discuss] Logical Units and ZFS send / receive
> From: Richard Elling
> Date: Wed, 4 Aug 2010 18:40:49 -0700
> To: Terry Hull
> Cc: "zfs-discuss@opensolaris.org"
> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>
> On Aug 4, 2010, at 1:27 PM, Terry Hull wrote:
>>> From: Richard Elling
>>> Date: Wed, 4 Aug 2010 11:05:21 -0700
>>> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>>>
>>> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
I have a logical unit created with sbdadm create-lu that I am replicating with zfs send / receive between two build 134 hosts. These LUs are iSCSI targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003 machine. The zfs pool names are the same on both machines. The replication seems to be going correctly. However, when I try to use the LUs on the server I am replicating the data to, I have issues. Here is the scenario: The LUs are created as sparse. Here is the process I'm going through after the snapshots are replicated to a secondary machine:
>>>
>>> How did you replicate? In b134, the COMSTAR metadata is placed in
>>> hidden parameters in the dataset. These are not transferred via zfs send,
>>> by default. This metadata includes the LU.
>>> -- richard
>>
>> Does the -p option on the zfs send solve that problem?
>
> I am unaware of a "zfs send -p" option. Did you mean the -R option?
>
> The LU metadata is stored in the stmf_sbd_lu property. You should be able
> to get/set it.
>

On the source machine I did a zfs get -H stmf_sbd_lu pool-name. In my case that gave me:

tank/iscsi/bg-man5-vmfs	stmf_sbd_lu	554c4442534e555307020702010001843000b7010100ff86200500c01200180009fff1030010600144f0fa3540004c4f9edb000374616e6b2f69736373692f62672d6d616e352d766d6673002f6465762f7a766f6c2f7264736b2f74616e6b2f69736373692f62672d6d616e352d766d667300e70100002200ff080	local

(But it was all one line.) I cut the numeric section out above and then did a

zfs set stmf_sbd_lu=(above cut section) pool_name

and that seemed to work. However, when I did a

stmfadm import-lu /dev/zvol/rdsk/pool

I still get a meta file error. However, when I do a zfs get -H stmf_sbd_lu pool_name on the secondary system, it now matches the results on the first system.

BTW: The zfs send -p option is described as "Send Properties". It seems like this should not be so hard to transfer an LU with zfs send/receive.

>> What else is not sent
>> by default? In other words, am I better off sending the metadata with the
>> zfs send, or am I better off just creating the GUID once I get the data
>> transferred?
>
> I don't think this is a GUID issue.
> -- richard
>
> --
> Richard Elling
> rich...@nexenta.com +1-760-896-4422
> Enterprise class storage for everyone
> www.nexenta.com
>

--
Terry Hull

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
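For readers following this thread, here is the property-copy sequence being attempted above, written out end to end. Dataset and pool names follow Terry's example, and whether import-lu accepts a manually copied stmf_sbd_lu value is exactly what is still in question here, so treat this as a sketch rather than a confirmed recipe:

# On the source host: capture the COMSTAR metadata (one long hex string)
zfs get -H -o value stmf_sbd_lu tank/iscsi/bg-man5-vmfs

# On the destination host: apply that value to the received dataset,
# then ask COMSTAR to register the logical unit
zfs set stmf_sbd_lu=<hex-value-from-source> tank/iscsi/bg-man5-vmfs
stmfadm import-lu /dev/zvol/rdsk/tank/iscsi/bg-man5-vmfs
stmfadm list-lu -v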
Re: [zfs-discuss] Splitting root mirror to prep for re-install
> You can also use the "zpool split" command and save > yourself having to do the zfs send|zfs recv step - > all the data will be preserved. > > "zpool split rpool preserve" does essentially > everything up to and including the "zpool export > preserve" commands you listed in your original email. > Just don't try to boot off it. Gotta love OpenSolaris. Just did a test run with "zpool split -n rpool preserve", and it looks like it'd be the easiest way to go about the process. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Splitting root mirror to prep for re-install
> > > So, after rebuilding, you don't want to restore the > same OS that you're > currently running. But there are some files you'd > like to save for after > you reinstall. Why not just copy them off somewhere, > in a tarball or > something like that? It's about 200+ gigs of files. If I had a third drive, empty for all this, I'd do that in a heartbeat. > > > > Given a rpool with disks c7d0s0 and c6d0s0, I think > the following > > process will do what I need: > > > > 1. Run these commands > > > > # zpool detach rpool c6d0s0 > > # zpool create preserve c6d0s0 > > The only reason you currently have the rpool in a > slice (s0) is because > that's a requirement for booting. If you aren't > planning to boot from the > device after breaking it off the mirror ... Maybe > just use the whole device > instead of the slice. > > zpool create preserve c6d0 > > > > # zfs create export/home > > # zfs send rpool/export/home | zfs receive > preserve/home > > # zfs send (other filesystems) > > # zpool export preserve > > These are not right. It should be something more > like this: > zfs create -o readonly=on preserve/rpool_export_home > zfs snapshot rpool/export/h...@fubarsnap > zfs send rpool/export/h...@fubarsnap | zfs receive -F > preserve/rpool_export_home > > And finally > zpool export preserve > Good catch on the readonly. The snapshot wouldn't hurt either. The zfs manpage on svn_133 suggests that I could do the whole send/receive directly against the filesystems without a snapshot, but one extra step isn't going to hurt. > > > 2. Build out new host with svn_134, placing new > root pool on c6d0s0 (or > > whatever it's called on the new SATA controller) > > Um ... I assume that's just a type-o ... > Yes, install fresh. No, don't overwrite the existing > "preserve" disk. > Yeah, typo. > For that matter, why break the mirror at all? Just > install the OS again, > onto a single disk, which implicitly breaks the > mirror. Then when it's all > done, use "zpool import" to import the other half of > the mirror, which you > didn't overwrite. > I was worried about how "zpool import" would identify it. If I just detach the disk from the mirror, would it still consider itself a part of "rpool"? If so, how would ZFS handle two disks that belong to two distinct pools with the same name? > > > 3. Run zpool import against "preserve", copy over > data that should be > > migrated. > > > > 4. Rebuild the mirror by destroying the "preserve" > pool and attaching > > c7d0s0 to the rpool mirror. > > > > Am I missing anything? > > If you blow away the partition table of the 2nd disk > (as I suggested above, > but now retract) then you'll have to recreate the > partition table of the > second disk. So you only attach s0 to s0. > > After attaching, and resilvering, you'll want to > installgrub on the 2nd > disk, or else it won't be bootable after the first > disk fails. See the ZFS > Troubleshooting Guide for details. Yep. I keep forgetting about the installgrub part. And the future plan would be to use the whole disk instead of just a slice. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discu > ss > -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Logical Units and ZFS send / receive
On Aug 4, 2010, at 1:27 PM, Terry Hull wrote:
>> From: Richard Elling
>> Date: Wed, 4 Aug 2010 11:05:21 -0700
>> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>>
>> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
>>> I have a logical unit created with sbdadm create-lu that I am replicating
>>> with zfs send / receive between two build 134 hosts. These LUs are iSCSI
>>> targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003
>>> machine. The zfs pool names are the same on both machines. The replication
>>> seems to be going correctly. However, when I try to use the LUs on the
>>> server I am replicating the data to, I have issues. Here is the scenario:
>>>
>>> The LUs are created as sparse. Here is the process I'm going through after
>>> the snapshots are replicated to a secondary machine:
>>
>> How did you replicate? In b134, the COMSTAR metadata is placed in
>> hidden parameters in the dataset. These are not transferred via zfs send,
>> by default. This metadata includes the LU.
>> -- richard
>
> Does the -p option on the zfs send solve that problem?

I am unaware of a "zfs send -p" option. Did you mean the -R option?

The LU metadata is stored in the stmf_sbd_lu property. You should be able
to get/set it.

> What else is not sent
> by default? In other words, am I better off sending the metadata with the
> zfs send, or am I better off just creating the GUID once I get the data
> transferred?

I don't think this is a GUID issue.
-- richard

--
Richard Elling
rich...@nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Splitting root mirror to prep for re-install
You can also use the "zpool split" command and save yourself having to do the zfs send|zfs recv step - all the data will be preserved. "zpool split rpool preserve" does essentially everything up to and including the "zpool export preserve" commands you listed in your original email. Just don't try to boot off it. On 4 Aug 2010, at 20:58, Edward Ned Harvey wrote: >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- >> boun...@opensolaris.org] On Behalf Of Chris Josephes >> >> I have a host running svn_133 with a root mirror pool that I'd like to >> rebuild with a fresh install on new hardware; but I still have data on >> the pool that I would like to preserve. > > So, after rebuilding, you don't want to restore the same OS that you're > currently running. But there are some files you'd like to save for after > you reinstall. Why not just copy them off somewhere, in a tarball or > something like that? > > >> Given a rpool with disks c7d0s0 and c6d0s0, I think the following >> process will do what I need: >> >> 1. Run these commands >> >> # zpool detach rpool c6d0s0 >> # zpool create preserve c6d0s0 > > The only reason you currently have the rpool in a slice (s0) is because > that's a requirement for booting. If you aren't planning to boot from the > device after breaking it off the mirror ... Maybe just use the whole device > instead of the slice. > > zpool create preserve c6d0 > > >> # zfs create export/home >> # zfs send rpool/export/home | zfs receive preserve/home >> # zfs send (other filesystems) >> # zpool export preserve > > These are not right. It should be something more like this: > zfs create -o readonly=on preserve/rpool_export_home > zfs snapshot rpool/export/h...@fubarsnap > zfs send rpool/export/h...@fubarsnap | zfs receive -F > preserve/rpool_export_home > > And finally > zpool export preserve > > >> 2. Build out new host with svn_134, placing new root pool on c6d0s0 (or >> whatever it's called on the new SATA controller) > > Um ... I assume that's just a type-o ... > Yes, install fresh. No, don't overwrite the existing "preserve" disk. > > For that matter, why break the mirror at all? Just install the OS again, > onto a single disk, which implicitly breaks the mirror. Then when it's all > done, use "zpool import" to import the other half of the mirror, which you > didn't overwrite. > > >> 3. Run zpool import against "preserve", copy over data that should be >> migrated. >> >> 4. Rebuild the mirror by destroying the "preserve" pool and attaching >> c7d0s0 to the rpool mirror. >> >> Am I missing anything? > > If you blow away the partition table of the 2nd disk (as I suggested above, > but now retract) then you'll have to recreate the partition table of the > second disk. So you only attach s0 to s0. > > After attaching, and resilvering, you'll want to installgrub on the 2nd > disk, or else it won't be bootable after the first disk fails. See the ZFS > Troubleshooting Guide for details. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Splitting root mirror to prep for re-install
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Chris Josephes
>
> I have a host running svn_133 with a root mirror pool that I'd like to
> rebuild with a fresh install on new hardware; but I still have data on
> the pool that I would like to preserve.

So, after rebuilding, you don't want to restore the same OS that you're
currently running. But there are some files you'd like to save for after
you reinstall. Why not just copy them off somewhere, in a tarball or
something like that?

> Given a rpool with disks c7d0s0 and c6d0s0, I think the following
> process will do what I need:
>
> 1. Run these commands
>
> # zpool detach rpool c6d0s0
> # zpool create preserve c6d0s0

The only reason you currently have the rpool in a slice (s0) is because
that's a requirement for booting. If you aren't planning to boot from the
device after breaking it off the mirror ... Maybe just use the whole device
instead of the slice.

zpool create preserve c6d0

> # zfs create export/home
> # zfs send rpool/export/home | zfs receive preserve/home
> # zfs send (other filesystems)
> # zpool export preserve

These are not right. It should be something more like this:
zfs create -o readonly=on preserve/rpool_export_home
zfs snapshot rpool/export/home@fubarsnap
zfs send rpool/export/home@fubarsnap | zfs receive -F preserve/rpool_export_home

And finally
zpool export preserve

> 2. Build out new host with svn_134, placing new root pool on c6d0s0 (or
> whatever it's called on the new SATA controller)

Um ... I assume that's just a type-o ...
Yes, install fresh. No, don't overwrite the existing "preserve" disk.

For that matter, why break the mirror at all? Just install the OS again,
onto a single disk, which implicitly breaks the mirror. Then when it's all
done, use "zpool import" to import the other half of the mirror, which you
didn't overwrite.

> 3. Run zpool import against "preserve", copy over data that should be
> migrated.
>
> 4. Rebuild the mirror by destroying the "preserve" pool and attaching
> c7d0s0 to the rpool mirror.
>
> Am I missing anything?

If you blow away the partition table of the 2nd disk (as I suggested above,
but now retract) then you'll have to recreate the partition table of the
second disk. So you only attach s0 to s0.

After attaching, and resilvering, you'll want to installgrub on the 2nd
disk, or else it won't be bootable after the first disk fails. See the ZFS
Troubleshooting Guide for details.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
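As a reference for step 4 plus Edward's installgrub reminder, a sketch of the final re-mirroring step. Disk names are the ones used in this thread and assume the rebuilt rpool ended up on c6d0s0; adjust to whatever the new controller calls the disks:

zpool destroy preserve                    # done copying data off the temporary pool
zpool attach rpool c6d0s0 c7d0s0          # attach the freed disk back to the root mirror
# wait for the resilver to finish (zpool status), then install the boot blocks:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7d0s0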
[zfs-discuss] Splitting root mirror to prep for re-install
I have a host running svn_133 with a root mirror pool that I'd like to rebuild with a fresh install on new hardware; but I still have data on the pool that I would like to preserve.

Given a rpool with disks c7d0s0 and c6d0s0, I think the following process will do what I need:

1. Run these commands

# zpool detach rpool c6d0s0
# zpool create preserve c6d0s0
# zfs create export/home
# zfs send rpool/export/home | zfs receive preserve/home
# zfs send (other filesystems)
# zpool export preserve

2. Build out new host with svn_134, placing new root pool on c6d0s0 (or whatever it's called on the new SATA controller)

3. Run zpool import against "preserve", copy over data that should be migrated.

4. Rebuild the mirror by destroying the "preserve" pool and attaching c7d0s0 to the rpool mirror.

Am I missing anything?

--
Chris
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Aug 4, 2010, at 9:03 AM, Eduardo Bragatto wrote:
> On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:
>
>> The tipping point for the change in the first fit/best fit allocation algorithm is
>> now 96%. Previously, it was 70%. Since you don't specify which OS, build,
>> or zpool version, I'll assume you are on something modern.
>
> I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

Then the first fit/best fit threshold is 96%.

>> NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100%
>> free metaslabs, then it is a clue that the allocator might be working extra
>> hard.
>
> On the first two VDEVs there are no allocations 100% free (most are nearly
> full)... The two newer ones, however, do have several allocations of 128GB
> each, 100% free.
>
> If I understand correctly in that scenario the allocator will work extra, is
> that correct?

Yes, and this can be measured, but...

>> OK, so how long are they waiting? Try "iostat -zxCn" and look at the
>> asvc_t column. This will show how the disk is performing, though it
>> won't show the performance delivered by the file system to the
>> application. To measure the latter, try "fsstat zfs" (assuming you are
>> on a Solaris distro)
>
> Checking with iostat, I noticed the average wait time to be between 40ms and
> 50ms for all disks. Which doesn't seem too bad.

... actually, that is pretty bad. Look for an average around 10 ms and peaks
around 20 ms. Solve this problem first -- the system can do a huge amount of
allocations for any algorithm in 1 ms.

> And this is the output of fsstat:
>
> # fsstat zfs
>  new  name   name   attr   attr  lookup rddir   read   read  write  write
>  file remov  chng    get    set     ops   ops    ops  bytes    ops  bytes
> 3.26M 1.34M  3.22M   161M  13.4M   1.36G  9.6M  10.5M   899G  22.0M   625G zfs

Unfortunately, the first line is useless, it is the summary since boot. Try
adding a sample interval to see how things are moving now.

> However I did have CPU spikes at 100% where the kernel was taking all cpu
> time.

Again, this can be analyzed using baseline performance analysis techniques.
The "prstat" command should show how CPU is being used. I'm not running
Solaris 10 10/09, but IIRC, it has the ZFS enhancement where CPU time is
attributed to the pool, as seen in prstat.
-- richard

> I have reduced my zfs_arc_max parameter as it seemed the applications were
> struggling for RAM and things are looking better now
>
> Thanks for your time,
> Eduardo Bragatto.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Richard Elling
rich...@nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
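The interval forms Richard is referring to, for anyone reproducing this (all standard Solaris commands; for iostat and fsstat the first sample is the since-boot summary and can be ignored):

iostat -zxCn 10     # per-disk service times (asvc_t) every 10 seconds, non-zero devices only
fsstat zfs 5        # file-system level operation and byte rates, 5-second intervals
prstat -mL 5        # per-thread microstate accounting, to see where CPU time is going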
Re: [zfs-discuss] ZFS Restripe
On Wed, 4 Aug 2010, Eduardo Bragatto wrote: I will also start using rsync v3 to reduce the memory foot print, so I might be able to give back some RAM to ARC, and I'm thinking maybe going to 16GB RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt. It is definitely a wise idea to use rsync v3. Previous versions had to recurse the whole tree on both sides (storing what was learned in memory) before doing anything. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Corrupt file without filename
Because this is a non-redundant root pool, you should still check fmdump -eV to make sure the corrupted files aren't due to some ongoing disk problems.

cs

On 08/04/10 13:45, valrh...@gmail.com wrote:

Oooh... Good call! I scrubbed the pool twice, then it showed a real filename from an old snapshot that I had attempted to delete before (like a month ago), and gave an error, which I subsequently forgot about. I deleted the snapshot and cleaned up a few other snapshots, cleared the error, rescrubbed. And now, no more corrupt file. Nice! Love this forum... thanks so much!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Logical Units and ZFS send / receive
> From: Richard Elling
> Date: Wed, 4 Aug 2010 11:05:21 -0700
> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>
> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
>> I have a logical unit created with sbdadm create-lu that I am replicating
>> with zfs send / receive between two build 134 hosts. These LUs are iSCSI
>> targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003
>> machine. The zfs pool names are the same on both machines. The replication
>> seems to be going correctly. However, when I try to use the LUs on the
>> server I am replicating the data to, I have issues. Here is the scenario:
>>
>> The LUs are created as sparse. Here is the process I'm going through after
>> the snapshots are replicated to a secondary machine:
>
> How did you replicate? In b134, the COMSTAR metadata is placed in
> hidden parameters in the dataset. These are not transferred via zfs send,
> by default. This metadata includes the LU.
> -- richard

Does the -p option on the zfs send solve that problem? What else is not sent
by default? In other words, am I better off sending the metadata with the
zfs send, or am I better off just creating the GUID once I get the data
transferred?

--
Terry Hull
Network Resource Group, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Corrupt file without filename
Oooh... Good call! I scrubbed the pool twice, then it showed a real filename from an old snapshot that I had attempted to delete before (like a month ago), and gave an error, which I subsequently forgot about. I deleted the snapshot and cleaned up a few other snapshots, cleared the error, rescrubbed. And now, no more corrupt file. Nice! Love this forum... thanks so much!
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to identify user-created zfs filesystems?
You can use 'zpool history -l syspool' to show the username of the person who created the dataset. The history is in a ring buffer, so if too many pool operations have happened since the dataset was created, the information is lost.

On Wed, 4 Aug 2010, Peter Taps wrote:

Folks,

In my application, I need to present user-created filesystems. For my test, I created a zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run "zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Regards,
markm

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
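A sketch of the approach Mark describes, using the pool name from the original post:

zpool history -l mypool                       # long format: command, date, user and host for each operation
zpool history -l mypool | grep 'zfs create'   # narrow it down to dataset creations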
Re: [zfs-discuss] How to identify user-created zfs filesystems?
Hi Peter,

I don't think we have any property that determines who created the file system. Would this work instead:

# zfs list -r mypool
NAME           USED  AVAIL  REFER  MOUNTPOINT
mypool         172K   134G    33K  /mypool
mypool/cifs1    31K   134G    31K  /mypool/cifs1
mypool/cifs2    31K   134G    31K  /mypool/cifs2

Or, take a look at user properties, which is text that you can apply to a file system for whatever purpose you choose.

Thanks,
Cindy

On 08/04/10 12:55, Peter Taps wrote:

Folks,

In my application, I need to present user-created filesystems. For my test, I created a zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run "zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
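One way to act on Cindy's user-property suggestion is to tag datasets at creation time and filter on the tag later. The property name below (com.example:origin) is made up for illustration; any name containing a colon works:

zfs set com.example:origin=user mypool/cifs1                   # tag the datasets your application creates
zfs set com.example:origin=user mypool/cifs2
zfs get -r -s local -o name,value com.example:origin mypool    # list only datasets carrying the tag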
Re: [zfs-discuss] ZFS Restripe
On Aug 4, 2010, at 11:18 AM, Bob Friesenhahn wrote: Assuming that your impressions are correct, are you sure that your new disk drives are similar to the older ones? Are they an identical model? Design trade-offs are now often resulting in larger capacity drives with reduced performance. Yes, the disks are the same, no problems there. On Aug 4, 2010, at 2:11 PM, Bob Friesenhahn wrote: On Wed, 4 Aug 2010, Eduardo Bragatto wrote: Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad. Actually, this is quite high. I would not expect such long wait times except for when under extreme load such as a benchmark. If the wait times are this long under normal use, then there is something wrong. That's a backup server, I usually have 10 rsync instances running simultaneously so there's a lot of random disk access going on -- I think that explains the high average time. Also, I recently enabled graphing of the IOPS per disk (reading it using net-snmp) and I see most disks are operating near their limit -- except for some disks from the older VDEVs which is what I'm trying to address here. However I did have CPU spikes at 100% where the kernel was taking all cpu time. I have reduced my zfs_arc_max parameter as it seemed the applications were struggling for RAM and things are looking better now Odd. What type of applications are you running on this system? Are applications running on the server competing with client accesses? I noticed some of those rsync processes were using almost 1GB of RAM each and the server has only 8GB. I started seeing the server swapping a bit during the cpu spikes at 100%, so I figured it would be better to cap ARC and leave some room for the rsync processes. I will also start using rsync v3 to reduce the memory foot print, so I might be able to give back some RAM to ARC, and I'm thinking maybe going to 16GB RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt. Thanks, Eduardo Bragatto. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
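For reference, capping the ARC on Solaris 10, as the poster mentions doing, is typically done with an /etc/system entry along these lines (the 4 GB value is illustrative; it takes effect at the next boot):

* /etc/system: limit the ZFS ARC to 4 GB so rsync and other processes keep some headroom
set zfs:zfs_arc_max = 0x100000000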
[zfs-discuss] How to identify user-created zfs filesystems?
Folks,

In my application, I need to present user-created filesystems. For my test, I created a zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run "zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] vdev using more space
Hi,

We have a server running b134. The server runs xen and uses a vdev as the storage. The xen image is running nevada 134. I took a snapshot last night to move the xen image to another server.

NAME                            USED  AVAIL  REFER  MOUNTPOINT
vpool/host/snv_130             32.8G  11.3G  37.7G  -
vpool/host/snv_130@2010-03-31  3.27G      -  13.8G  -
vpool/host/snv_130@2010-08-03   436M      -  37.7G  -

It's also worth noting that vpool/host/snv_130 is a clone of at least two other snapshots.

I then did a zfs send of vpool/host/snv_130@2010-08-03 and got a 39GB file. A zfs send of vpool/host/snv_130@2010-03-31 gave a file of 15GB.

I don't understand why the file is 39GB, since df -h inside of the xen image on vpool/host/snv_130 shows:

Filesystem           size  used  avail  capacity  Mounted on
rpool/ROOT/snv_130    39G   12G    22G       35%  /

It would be nice if the zfs send file would be roughly the same size as the space used inside of the xen machine.

Karl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
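The 39 GB stream roughly matches the snapshot's 37.7 G REFER: a full zfs send carries every block the snapshot references, including blocks the guest filesystem has since freed, so it will not shrink to the 12 G that df reports inside the guest. Two things that may help, sketched with the names from this post (the output path is illustrative):

# See where the space in the zvol and its snapshots is going
zfs get -r used,referenced,usedbydataset,usedbysnapshots vpool/host/snv_130

# Compress the stream on the way out; freed-but-not-overwritten blocks often compress well
zfs send vpool/host/snv_130@2010-08-03 | gzip -c > /backup/snv_130-20100803.zfs.gz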
Re: [zfs-discuss] ZFS performance Tuning
On Aug 4, 2010, at 3:22 AM, TAYYAB REHMAN wrote:
> Hi,
> I am working with ZFS these days and am facing some performance issues
> reported by the application team: they say writes are very slow on ZFS
> compared to UFS. Kindly send me some good references or book links. I
> will be very thankful to you.

Hi Tayyab,
Please start with the ZFS Best Practices Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

--
Richard Elling
rich...@nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Wed, 4 Aug 2010, Eduardo Bragatto wrote: Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad. Actually, this is quite high. I would not expect such long wait times except for when under extreme load such as a benchmark. If the wait times are this long under normal use, then there is something wrong. However I did have CPU spikes at 100% where the kernel was taking all cpu time. I have reduced my zfs_arc_max parameter as it seemed the applications were struggling for RAM and things are looking better now Odd. What type of applications are you running on this system? Are applications running on the server competing with client accesses? Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Logical Units and ZFS send / receive
On Aug 3, 2010, at 11:58 PM, Terry Hull wrote: > I have a logical unit created with sbdadm create-lu that it I replicating > with zfs send / receive between 2 build 134 hosts. The these LUs are iSCSI > targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003 > machine. The zfs pool names are the same on both machines. The replication > seems to be going correctly. However, when I try to use the LUs on the > server I am replicating the data to, I have issues. Here is the scenario: > > The LUs are created as sparse. Here is the process I’m going through after > the snapshots are replicated to a secondary machine: How did you replicate? In b134, the COMSTAR metadata is placed in hidden parameters in the dataset. These are not transferred via zfs send, by default. This metadata includes the LU. -- richard > • Original machine: svccfg export -a stmf > /tmp/stmf.cfg > • Copy stmf.cfg to second machine: > • Secondary machine: svcadm disable stmf > • svccfg delete xtmf > • cd /var/svc/manifest > • svccfg import system/stmf.xml > • svcadm disable stmf > • svcadm import /tmp/stmf.cfg > > At this point stmfadm list-lu –v shows the SCSI LUs all as “unregistered” > > When I try to import the LUs I get: stmfadm: meta data error > > I am using the command: > stmfadm import-lu /dev/zvol/rdsk/pool-name > > to import the LU > > It is as if the pool does not exist. However, I can verify that the pool > does actually exist with zfs list and with zfs list –t snapshot to show the > snapshot that I replicated. > > > Any suggestions? > -- > Terry Hull > Network Resource Group, Inc. > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Richard Elling rich...@nexenta.com +1-760-896-4422 Enterprise class storage for everyone www.nexenta.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] LTFS and LTO-5 Tape Drives
On Wed, August 4, 2010 12:25, valrh...@gmail.com wrote: > Actually, no. I could care less about incrementals, and multivolume > handling. My purpose is to have occasional, long-term archival backup of > big experimental data sets. The challenge is keeping everything organized, > and readable several years later, where I only need to recall a small > subset of what's on the tape. The idea that the tape has a browseable > filesystem is therefore extremely useful in principle. > > Has anyone actually tried this with OpenSolaris? The LTFS websites I've > seen only talk about Mac and Linux support, but if it's supported on > Linux, in principle the (open-source?) drivers should be portable, no? I can understand the desire and convenience of a browsable file system, but I'd trust the long-term accessibility of the (POSIX) tar format more than most other things. Perhaps have one tape with tar, and other with this LTFS thing, so you have your bases covered (e.g., in case one tape is damaged, or if LTFS is just a buzzword/fad). I'm assuming you're referring to: http://en.wikipedia.org/wiki/Linear_Tape_File_System If Linux and Mac (which can be considered a variant of FreeBSD) are covered, then it should technically be possible to modify it to support Solaris. I'm sure the authors of the software would be interested in patches (assuming it's open-source). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
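For the tar half of that suggestion, a minimal sketch (the tape device path and data path are illustrative; the non-rewinding device keeps the tape position so several archives can share one tape):

tar cvf /dev/rmt/0n /export/data/run42    # write one archive to the non-rewinding tape device
mt -f /dev/rmt/0n rewind                  # go back to the beginning of the tape
tar tvf /dev/rmt/0n                       # list the archive to verify it is readable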
Re: [zfs-discuss] Corrupt file without filename
Maybe it is a temporary file. You might try running a scrub to see if it goes away. I would also use fmdump -eV to see if this disk is having problems.

Thanks,
Cindy

On 08/04/10 01:05, valrh...@gmail.com wrote:

I have one corrupt file in my rpool, but when I run "zpool status -v", I don't get a filename, just an address. Any idea how to fix this? Here's the output:

p...@dellt7500:~# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c4t0d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/export/home/plu:<0x12491>

p...@dellt7500:~#

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
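A sketch of that sequence, using the pool name from the post above:

zpool scrub rpool          # walk every block and verify checksums
zpool status -v rpool      # watch scrub progress and see whether a real filename appears
fmdump -eV | more          # check the FMA error log for underlying device problems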
Re: [zfs-discuss] iScsi slow
On Aug 4, 2010, at 12:04 PM, Roch wrote: > > Ross Walker writes: >> On Aug 4, 2010, at 9:20 AM, Roch wrote: >> >>> >>> >>> Ross Asks: >>> So on that note, ZFS should disable the disks' write cache, >>> not enable them despite ZFS's COW properties because it >>> should be resilient. >>> >>> No, because ZFS builds resiliency on top of unreliable parts. it's able to >>> deal >>> with contained failures (lost state) of the disk write cache. >>> >>> It can then export LUNS that have WC enabled or >>> disabled. But if we enable the WC on the exported LUNS, then >>> the consumer of these LUNS must be able to say the same. >>> The discussion at that level then needs to focus on failure groups. >>> >>> >>> Ross also Said : >>> I asked this question earlier, but got no answer: while an >>> iSCSI target is presented WCE does it respect the flush >>> command? >>> >>> Yes. I would like to say "obviously" but it's been anything >>> but. >> >> Sorry to probe further, but can you expand on but... >> >> Just if we had a bunch of zvols exported via iSCSI to another Solaris >> box which used them to form another zpool and had WCE turned on would >> it be reliable? >> > > Nope. That's because all the iSCSI are in the same fault > domain as they share a unified back-end cache. What works, > in principle, is mirroring SCSI channels hosted on > different storage controllers (or N SCSI channels on N > controller in a raid group). > > Which is why keeping the WC set to the default, is really > better in general. Well I was actually talking about two backend Solaris storage servers serving up storage over iSCSI to a front-end Solaris server serving ZFS over NFS, so I have redundancy there, but want the storage to be performant, so I want the iSCSI to have WCE, yet I want it to be reliable and have it honor cache flush requests from the front-end NFS server. Does that make sense? Is it possible? -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
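For checking or changing the behaviour Ross is asking about on a COMSTAR target, the per-LU write-cache setting is exposed as an stmfadm property; the GUID below is a placeholder, and wcd stands for "write cache disabled", so wcd=false enables the cache. Whether the initiator then issues cache flushes is up to the initiator:

stmfadm list-lu -v                            # per-LU view, including the write-cache state
stmfadm modify-lu -p wcd=false <LU-GUID>      # enable the write cache for one LU
stmfadm modify-lu -p wcd=true <LU-GUID>       # disable it again (the safer default)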
Re: [zfs-discuss] LTFS and LTO-5 Tape Drives
Actually, no. I could care less about incrementals, and multivolume handling. My purpose is to have occasional, long-term archival backup of big experimental data sets. The challenge is keeping everything organized, and readable several years later, where I only need to recall a small subset of what's on the tape. The idea that the tape has a browseable filesystem is therefore extremely useful in principle. Has anyone actually tried this with OpenSolaris? The LTFS websites I've seen only talk about Mac and Linux support, but if it's supported on Linux, in principle the (open-source?) drivers should be portable, no? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snapshot space - miscalculation?
Are there other file systems underneath daten/backups that have snapshots? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
Ross Walker writes: > On Aug 4, 2010, at 9:20 AM, Roch wrote: > > > > > > > Ross Asks: > > So on that note, ZFS should disable the disks' write cache, > > not enable them despite ZFS's COW properties because it > > should be resilient. > > > > No, because ZFS builds resiliency on top of unreliable parts. it's able to > > deal > > with contained failures (lost state) of the disk write cache. > > > > It can then export LUNS that have WC enabled or > > disabled. But if we enable the WC on the exported LUNS, then > > the consumer of these LUNS must be able to say the same. > > The discussion at that level then needs to focus on failure groups. > > > > > > Ross also Said : > > I asked this question earlier, but got no answer: while an > > iSCSI target is presented WCE does it respect the flush > > command? > > > > Yes. I would like to say "obviously" but it's been anything > > but. > > Sorry to probe further, but can you expand on but... > > Just if we had a bunch of zvols exported via iSCSI to another Solaris > box which used them to form another zpool and had WCE turned on would > it be reliable? > Nope. That's because all the iSCSI are in the same fault domain as they share a unified back-end cache. What works, in principle, is mirroring SCSI channels hosted on different storage controllers (or N SCSI channels on N controller in a raid group). Which is why keeping the WC set to the default, is really better in general. -r > -Ross > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Aug 4, 2010, at 12:20 AM, Khyron wrote: I notice you use the word "volume" which really isn't accurate or appropriate here. Yeah, it didn't seem right to me, but I wasn't sure about the nomenclature, thanks for clarifying. You may want to get a bit more specific and choose from the oldest datasets THEN find the smallest of those oldest datasets and send/receive it first. That way, the send/receive completes in less time, and when you delete the source dataset, you've now created more free space on the entire pool but without the risk of a single dataset exceeding your 10 TiB of workspace. That makes sense, I'll try send/receiving a few of those datasets and see how it goes. I believe I can find the ones that were created before the two new VDEVs were added, by comparing the creation time from "zfs get creation" ZFS' copy-on-write nature really wants no less than 20% free because you never update data in place; a new copy is always written to disk. Right, and my problem is that I have two VDEVs with less than 10% free at this point -- although the other two have around 50% free each. You might want to consider turning on compression on your new datasets too, especially if you have free CPU cycles to spare. I don't know how compressible your data is, but if it's fairly compressible, say lots of text, then you might get some added benefit when you copy the old data into the new datasets. Saving more space, then deleting the source dataset, should help your pool have more free space, and thus influence your writes for better I/O balancing when you do the next (and the next) dataset copies. Unfortunately the data taking most of the space it already compressed, so while I would gain some space from many text files that I also have, those are not the majority of my content, and the effort would probably not justify the small gain. Thanks Eduardo Bragatto ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Aug 4, 2010, at 12:26 AM, Richard Elling wrote: The tipping point for the change in the first fit/best fit allocation algorithm is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or zpool version, I'll assume you are on something modern. I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15. NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100% free metaslabs, then it is a clue that the allocator might be working extra hard. On the first two VDEVs there are no allocations 100% free (most are nearly full)... The two newer ones, however, do have several allocations of 128GB each, 100% free. If I understand correctly in that scenario the allocator will work extra, is that correct? OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t column. This will show how the disk is performing, though it won't show the performance delivered by the file system to the application. To measure the latter, try "fsstat zfs" (assuming you are on a Solaris distro) Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad. And this is the output of fsstat: # fsstat zfs new name name attr attr lookup rddir read read write write file remov chng get setops ops ops bytes ops bytes 3.26M 1.34M 3.22M 161M 13.4M 1.36G 9.6M 10.5M 899G 22.0M 625G zfs However I did have CPU spikes at 100% where the kernel was taking all cpu time. I have reduced my zfs_arc_max parameter as it seemed the applications were struggling for RAM and things are looking better now Thanks for your time, Eduardo Bragatto. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Tue, 3 Aug 2010, Eduardo Bragatto wrote: You're a funny guy. :) Let me re-phrase it: I'm sure I'm getting degradation in performance as my applications are waiting more on I/O now than they used to do (based on CPU utilization graphs I have). The impression part, is that the reason is the limited space in those two volumes -- as I said, I already experienced bad performance on zfs systems running nearly out of space before. Assuming that your impressions are correct, are you sure that your new disk drives are similar to the older ones? Are they an identical model? Design trade-offs are now often resulting in larger capacity drives with reduced performance. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
On Aug 4, 2010, at 9:20 AM, Roch wrote: > > > Ross Asks: > So on that note, ZFS should disable the disks' write cache, > not enable them despite ZFS's COW properties because it > should be resilient. > > No, because ZFS builds resiliency on top of unreliable parts. it's able to > deal > with contained failures (lost state) of the disk write cache. > > It can then export LUNS that have WC enabled or > disabled. But if we enable the WC on the exported LUNS, then > the consumer of these LUNS must be able to say the same. > The discussion at that level then needs to focus on failure groups. > > > Ross also Said : > I asked this question earlier, but got no answer: while an > iSCSI target is presented WCE does it respect the flush > command? > > Yes. I would like to say "obviously" but it's been anything > but. Sorry to probe further, but can you expand on but... Just if we had a bunch of zvols exported via iSCSI to another Solaris box which used them to form another zpool and had WCE turned on would it be reliable? -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS performance Tuning
Hi,

I am working with ZFS these days and am facing some performance issues reported by the application team: they say writes are very slow on ZFS compared to UFS. Kindly send me some good references or book links. I will be very thankful to you.

BR,
Tayyab

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
On 04/08/2010, at 2:13, Roch Bourbonnais wrote: > > Le 27 mai 2010 à 07:03, Brent Jones a écrit : > >> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly >> wrote: >>> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands: >>> >>> sh-4.0# zfs create rpool/iscsi >>> sh-4.0# zfs set shareiscsi=on rpool/iscsi >>> sh-4.0# zfs create -s -V 10g rpool/iscsi/test >>> >>> The underlying zpool is a mirror of two SATA drives. I'm connecting from a >>> Mac client with global SAN initiator software, connected via Gigabit LAN. >>> It connects fine, and I've initialiased a mac format volume on that iScsi >>> volume. >>> >>> Performance, however, is terribly slow, about 10 times slower than an SMB >>> share on the same pool. I expected it would be very similar, if not faster >>> than SMB. >>> >>> Here's my test results copying 3GB data: >>> >>> iScsi: 44m01s 1.185MB/s >>> SMB share: 4m2711.73MB/s >>> >>> Reading (the same 3GB) is also worse than SMB, but only by a factor of >>> about 3: >>> >>> iScsi: 4m3611.34MB/s >>> SMB share: 1m4529.81MB/s >>> > > > > Not unexpected. Filesystems have readahead code to prefetch enough to cover > the latency of the read request. iSCSI only responds to the request. > Put a filesystem on top of iscsi and try again. As I indicated above, there is a mac filesystem on the iscsi volume. Matt. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
Ross Asks: So on that note, ZFS should disable the disks' write cache, not enable them despite ZFS's COW properties because it should be resilient. No, because ZFS builds resiliency on top of unreliable parts. it's able to deal with contained failures (lost state) of the disk write cache. It can then export LUNS that have WC enabled or disabled. But if we enable the WC on the exported LUNS, then the consumer of these LUNS must be able to say the same. The discussion at that level then needs to focus on failure groups. Ross also Said : I asked this question earlier, but got no answer: while an iSCSI target is presented WCE does it respect the flush command? Yes. I would like to say "obviously" but it's been anything but. -r Ross Walker writes: > On Aug 4, 2010, at 3:52 AM, Roch wrote: > > > > > Ross Walker writes: > > > >> On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais > >> wrote: > >> > >>> > >>> Le 27 mai 2010 à 07:03, Brent Jones a écrit : > >>> > On Wed, May 26, 2010 at 5:08 AM, Matt Connolly > wrote: > > I've set up an iScsi volume on OpenSolaris (snv_134) with these > > commands: > > > > sh-4.0# zfs create rpool/iscsi > > sh-4.0# zfs set shareiscsi=on rpool/iscsi > > sh-4.0# zfs create -s -V 10g rpool/iscsi/test > > > > The underlying zpool is a mirror of two SATA drives. I'm connecting > > from a Mac client with global SAN initiator software, connected via > > Gigabit LAN. It connects fine, and I've initialiased a mac format > > volume on that iScsi volume. > > > > Performance, however, is terribly slow, about 10 times slower than an > > SMB share on the same pool. I expected it would be very similar, if > > not faster than SMB. > > > > Here's my test results copying 3GB data: > > > > iScsi: 44m01s 1.185MB/s > > SMB share: 4m2711.73MB/s > > > > Reading (the same 3GB) is also worse than SMB, but only by a factor of > > about 3: > > > > iScsi: 4m3611.34MB/s > > SMB share: 1m4529.81MB/s > > > >>> > >>> > >>> > >>> Not unexpected. Filesystems have readahead code to prefetch enough to > >>> cover the latency of the read request. iSCSI only responds to the > >>> request. > >>> Put a filesystem on top of iscsi and try again. > >>> > >>> For writes, iSCSI is synchronous and SMB is not. > >> > >> It may be with ZFS, but iSCSI is neither synchronous nor asynchronous is > >> is simply SCSI over IP. > >> > > > > Hey Ross, > > > > Nothing to do with ZFS here, but you're right to point out > > that iSCSI is neither. It was just that in the context of > > this test (and 99+% of iSCSI usage) it will be. SMB is > > not. Thus a large discrepancy on the write test. > > > > Resilient storage, by default, should expose iSCSI channels > > with write caches disabled. > > > So on that note, ZFS should disable the disks' write cache, not enable them > despite ZFS's COW properties because it should be resilient. > > > >> It is the application using the iSCSI protocol that > > determines whether it is synchronous, issue a flush after > > write, or asynchronous, wait until target flushes. > >> > > > > True. > > > >> I think the ZFS developers didn't quite understand that > > and wanted strict guidelines like NFS has, but iSCSI doesn't > > have those, it is a lower level protocol than NFS is, so > > they forced guidelines on it and violated the standard. > >> > >> -Ross > >> > > > > Not True. > > > > > > ZFS exposes LUNS (or ZVOL) and while at first we didn't support > > DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you > > need it to be. 
> > I asked this question earlier, but got no answer: while an iSCSI target is > presented WCE does it respect the flush command? > > > Now in the context of iSCSI luns hosted by a resilient > > storage system, enabling write caches is to be used only in > > very specific circumstances. The situation is not symmetrical > > with WCE in disks of a JBOD since that can be setup with > > enough redundancy to deal with potential data loss. When > > using a resilient storage, you need to trust the storage for > > persistence of SCSI commands and building a resilient system > > on top of write cache enabled SCSI channels is not trivial. > > Not true, advertise WCE, support flush and tagged command queuing and the > initiator will be able to use the resilient storage appropriate for it's > needs. > > -Ross > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
On Aug 4, 2010, at 3:52 AM, Roch wrote: > > Ross Walker writes: > >> On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais >> wrote: >> >>> >>> Le 27 mai 2010 à 07:03, Brent Jones a écrit : >>> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly wrote: > I've set up an iScsi volume on OpenSolaris (snv_134) with these commands: > > sh-4.0# zfs create rpool/iscsi > sh-4.0# zfs set shareiscsi=on rpool/iscsi > sh-4.0# zfs create -s -V 10g rpool/iscsi/test > > The underlying zpool is a mirror of two SATA drives. I'm connecting from > a Mac client with global SAN initiator software, connected via Gigabit > LAN. It connects fine, and I've initialiased a mac format volume on that > iScsi volume. > > Performance, however, is terribly slow, about 10 times slower than an SMB > share on the same pool. I expected it would be very similar, if not > faster than SMB. > > Here's my test results copying 3GB data: > > iScsi: 44m01s 1.185MB/s > SMB share: 4m2711.73MB/s > > Reading (the same 3GB) is also worse than SMB, but only by a factor of > about 3: > > iScsi: 4m3611.34MB/s > SMB share: 1m4529.81MB/s > >>> >>> >>> >>> Not unexpected. Filesystems have readahead code to prefetch enough to cover >>> the latency of the read request. iSCSI only responds to the request. >>> Put a filesystem on top of iscsi and try again. >>> >>> For writes, iSCSI is synchronous and SMB is not. >> >> It may be with ZFS, but iSCSI is neither synchronous nor asynchronous is is >> simply SCSI over IP. >> > > Hey Ross, > > Nothing to do with ZFS here, but you're right to point out > that iSCSI is neither. It was just that in the context of > this test (and 99+% of iSCSI usage) it will be. SMB is > not. Thus a large discrepancy on the write test. > > Resilient storage, by default, should expose iSCSI channels > with write caches disabled. So on that note, ZFS should disable the disks' write cache, not enable them despite ZFS's COW properties because it should be resilient. >> It is the application using the iSCSI protocol that > determines whether it is synchronous, issue a flush after > write, or asynchronous, wait until target flushes. >> > > True. > >> I think the ZFS developers didn't quite understand that > and wanted strict guidelines like NFS has, but iSCSI doesn't > have those, it is a lower level protocol than NFS is, so > they forced guidelines on it and violated the standard. >> >> -Ross >> > > Not True. > > > ZFS exposes LUNS (or ZVOL) and while at first we didn't support > DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you > need it to be. I asked this question earlier, but got no answer: while an iSCSI target is presented WCE does it respect the flush command? > Now in the context of iSCSI luns hosted by a resilient > storage system, enabling write caches is to be used only in > very specific circumstances. The situation is not symmetrical > with WCE in disks of a JBOD since that can be setup with > enough redundancy to deal with potential data loss. When > using a resilient storage, you need to trust the storage for > persistence of SCSI commands and building a resilient system > on top of write cache enabled SCSI channels is not trivial. Not true, advertise WCE, support flush and tagged command queuing and the initiator will be able to use the resilient storage appropriate for it's needs. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] LTFS and LTO-5 Tape Drives
"valrh...@gmail.com" wrote: > Has anyone looked into the new LTFS on LTO-5 for tape backups? Any idea how > this would work with ZFS? I'm presuming ZFS send / receive are not going to > work. But it seems rather appealing to have the metadata properly with the > data, and being able to browse files directly instead of having to rely on > backup software, however nice tar may be. Has anyone used this with > OpenSolaris, or have an opinion on how this would work in practice? Thanks! What do you understand by "nice tar"? For a backup, you need reliable incrementals (you get this from star) and reliable multi-volume handling (this is what you also get from star). Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
Ross Walker writes: > On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais > wrote: > > > > > Le 27 mai 2010 à 07:03, Brent Jones a écrit : > > > >> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly > >> wrote: > >>> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands: > >>> > >>> sh-4.0# zfs create rpool/iscsi > >>> sh-4.0# zfs set shareiscsi=on rpool/iscsi > >>> sh-4.0# zfs create -s -V 10g rpool/iscsi/test > >>> > >>> The underlying zpool is a mirror of two SATA drives. I'm connecting from > >>> a Mac client with global SAN initiator software, connected via Gigabit > >>> LAN. It connects fine, and I've initialiased a mac format volume on that > >>> iScsi volume. > >>> > >>> Performance, however, is terribly slow, about 10 times slower than an > >>> SMB share on the same pool. I expected it would be very similar, if not > >>> faster than SMB. > >>> > >>> Here's my test results copying 3GB data: > >>> > >>> iScsi: 44m01s 1.185MB/s > >>> SMB share: 4m2711.73MB/s > >>> > >>> Reading (the same 3GB) is also worse than SMB, but only by a factor of > >>> about 3: > >>> > >>> iScsi: 4m3611.34MB/s > >>> SMB share: 1m4529.81MB/s > >>> > > > > > > > > Not unexpected. Filesystems have readahead code to prefetch enough to > > cover the latency of the read request. iSCSI only responds to the request. > > Put a filesystem on top of iscsi and try again. > > > > For writes, iSCSI is synchronous and SMB is not. > > It may be with ZFS, but iSCSI is neither synchronous nor asynchronous is is > simply SCSI over IP. > Hey Ross, Nothing to do with ZFS here, but you're right to point out that iSCSI is neither. It was just that in the context of this test (and 99+% of iSCSI usage) it will be. SMB is not. Thus a large discrepancy on the write test. Resilient storage, by default, should expose iSCSI channels with write caches disabled. > It is the application using the iSCSI protocol that determines whether it is synchronous, issue a flush after write, or asynchronous, wait until target flushes. > True. > I think the ZFS developers didn't quite understand that and wanted strict guidelines like NFS has, but iSCSI doesn't have those, it is a lower level protocol than NFS is, so they forced guidelines on it and violated the standard. > > -Ross > Not True. ZFS exposes LUNS (or ZVOL) and while at first we didn't support DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you need it to be. Now in the context of iSCSI luns hosted by a resilient storage system, enabling write caches is to be used only in very specific circumstances. The situation is not symmetrical with WCE in disks of a JBOD since that can be setup with enough redundancy to deal with potential data loss. When using a resilient storage, you need to trust the storage for persistence of SCSI commands and building a resilient system on top of write cache enabled SCSI channels is not trivial. Then Matts points out As I indicated above, there is a mac filesystem on the iscsi volume. Matt. On the read side, single threaded performance is just very much controled by the readahead. Each filesystem will implement something different, the fact that you got 3X more throughput with SMB that with the Mac (HSFS+?) simply means that SMB had a 3X larger readahead buffer than HSFS. -r ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Corrupt file without filename
I have one corrupt file in my rpool, but when I run "zpool status -v", I don't get a filename, just an address. Any idea how to fix this? Here's the output:

p...@dellt7500:~# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c4t0d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/export/home/plu:<0x12491>

p...@dellt7500:~#
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Logical Units and ZFS send / receive
I have a logical unit created with sbdadm create-lu that I am replicating with zfs send / receive between two build 134 hosts. These LUs are iSCSI targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003 machine. The zfs pool names are the same on both machines. The replication seems to be going correctly. However, when I try to use the LUs on the server I am replicating the data to, I have issues. Here is the scenario:

The LUs are created as sparse. Here is the process I'm going through after the snapshots are replicated to a secondary machine:

* Original machine: svccfg export -a stmf > /tmp/stmf.cfg
* Copy stmf.cfg to second machine
* Secondary machine: svcadm disable stmf
* svccfg delete xtmf
* cd /var/svc/manifest
* svccfg import system/stmf.xml
* svcadm disable stmf
* svcadm import /tmp/stmf.cfg

At this point stmfadm list-lu -v shows the SCSI LUs all as "unregistered".

When I try to import the LUs I get: stmfadm: meta data error

I am using the command:

stmfadm import-lu /dev/zvol/rdsk/pool-name

to import the LU.

It is as if the pool does not exist. However, I can verify that the pool does actually exist with zfs list and with zfs list -t snapshot to show the snapshot that I replicated.

Any suggestions?
--
Terry Hull
Network Resource Group, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss