[zfs-discuss] zpool import hangs
I am having trouble with a RAID-Z zpool, bigtank, of 5x 750GB drives that will not import. After having some trouble with this pool, I exported it and attempted a reimport, only to discover this issue: I can see the pool by running zpool import, and the devices are all online. However, running zpool import bigtank, with or without the -f, simply causes the entire system to hang; keyboard and ssh both become unresponsive. I am very new to Solaris and the *nix scene, so your help would be greatly appreciated. I am currently running zdb -e -bcsvL bigtank to check checksums on the pool, but this has yet to find anything wrong. I really need this data! Please walk me through this one as best as you can.
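A minimal diagnostic sketch for a pool that hangs on import (the device names below are placeholders, not taken from the post). zdb -l reads the on-disk labels without importing the pool, so it can show whether the label copies on each drive still agree on the pool configuration before another import attempt is made:

# zpool import
# zdb -l /dev/rdsk/c2t0d0s0
# zdb -l /dev/rdsk/c2t1d0s0

Repeat the zdb -l for the remaining drives; all four label copies on each disk should report the same pool GUID and a complete vdev tree.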
Re: [zfs-discuss] Losts of small files vs fewer big files
dt == Don Turnbull dturnb...@greenplum.com writes: dt Any idea why this is? maybe prefetch? WAG, though. dt I work with Greenplum which is essentially a number of dt Postgres database instances clustered together. haha, yeah I know who you are. Too bad the open source postgres can't do that. :/ coughAFFEROcough.
Re: [zfs-discuss] surprisingly poor performance
James Lever wrote: On 07/07/2009, at 8:20 PM, James Andrewartha wrote: Have you tried putting the slog on this controller, either as an SSD or regular disk? It's supported by the mega_sas driver, x86 and amd64 only.

What exactly are you suggesting here? Configure one disk on this array as a dedicated ZIL? Would that improve performance any over using all disks with an internal ZIL?

I was mainly thinking about using the battery-backed write cache to eliminate the NFS latency. There's not much difference between internal vs dedicated ZIL if the disks are the same and on the same controller; dedicated ZIL wins come from using SSDs and battery-backed cache. http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

Is there a way to disable the write barrier in ZFS in the way you can with Linux filesystems (-o barrier=0)? Would this make any difference?

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes might help if the RAID card is still flushing to disk when ZFS asks it to, even though the data is safe in the battery-backed cache.

-- James Andrewartha | Sysadmin Data Analysis Australia Pty Ltd
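The cache-flush tuning described in the Evil Tuning Guide page above is a single line in /etc/system (a sketch; only appropriate when every device in the pool sits behind a non-volatile, battery-backed write cache, since it makes ZFS stop issuing cache-flush requests entirely):

set zfs:zfs_nocacheflush = 1

A reboot is needed for it to take effect, and it applies to all pools on the host, not just the ones behind the RAID controller.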
Re: [zfs-discuss] surprisingly poor performance
You might wanna try one thing I just noticed - wrap the log device inside a SVM (disksuite) metadevice - works wonders for the performance on my test server (Sun Fire X4240)... I do wonder what the downsides might be (except for having to fiddle with Disksuite again). Ie:

# zpool create TEST c1t12d0
# format c1t13d0 (create a 4GB partition 0)
# metadb -f -a -c 3 c1t13d0s0
# metainit d0 1 1 c1t13d0s0
# zpool add TEST log /dev/md/dsk/d0

In my case the disks involved above are:
c1t12d0 146GB 10krpm SAS disk
c1t13d0 32GB Intel X25-E SLC SSD SATA disk

Without the log added, running 'gtar zxf emacs-22.3.tar.gz' over NFS from another server takes 1:39.2 (almost 2 minutes). With c1t15d0s0 added as log it takes 1:04.2, but with the same c1t15d0s0 added, but wrapped inside a SVM metadevice, the same operation takes 10.4 seconds... 1:39 vs 0:10 is a pretty good speedup I think...
Re: [zfs-discuss] surprisingly poor performance
Oh, and for completeness: If I wrap 'c1t12d0s0' inside a SVM metadevice and use that to create the TEST zpool (without a log), I run the same test command in 36.3 seconds... Ie:

# metadb -f -a -c3 c1t13d0s0
# metainit d0 1 1 c1t13d0s0
# metainit d2 1 1 c1t12d0s0
# zpool create TEST /dev/md/dsk/d2

If I then add a log to that device:

# zpool add TEST log /dev/md/dsk/d0

the same test (gtar zxf emacs-22.3.tar.gz) runs in 10.1 seconds... (Ie, not much better than just using a raw disk + svm-encapsulated log).
Re: [zfs-discuss] [nfs-discuss] NFS, ZFS ESX
Comments inline.

On 7 juil. 09, at 19:36, Dai Ngo wrote: Without any tuning, the default TCP window size and send buffer size for NFS connections is around 48KB, which is not very optimal for bulk transfer. However the 1.4MB/s write seems to indicate something else is seriously wrong. My sentiment as well. iSCSI performance was good, so the network connection seems to be OK (assuming it's 1GbE).

Yup - I'm running at wire speed on the iSCSI connections.

What do your mount options look like?

Unfortunately, ESX doesn't give any controls over mount options.

I don't know what the datastore browser does for copying files, but have you tried the vanilla 'cp' command?

The datastore browser copy command is just a wrapper for cp from what I can gather. All types of copy operations to the NFS volume, even from other machines, top out at this speed. The NFS/iSCSI connections are in a separate physical network so I can't easily plug anything into it for testing other mount options from another machine or OS. I'll try from another VM to see if I can't force a mount with the async option to see if that helps any.

You can also try NFS performance using tmpfs, instead of ZFS, to make sure NIC, protocol stack, NFS are not the culprit.

From what I can observe, it appears that the sync commands issued over the NFS stack are slowing down the process, even with a reasonable number of disks in the pool. What I was hoping for was the same behavior (albeit slightly risky) of having writes cached to RAM and then dumped out in an optimal manner to disk, as per the local behavior where you see the flush-to-disk operations happening on a regular cycle. I think that this would be doable with an async mount, but I can't set this on the server side where it would be used by the servers automatically.

Erik

erik.ableson wrote: OK - I'm at my wit's end here as I've looked everywhere to find some means of tuning NFS performance with ESX into returning something acceptable using osol 2008.11. I've eliminated everything but the NFS portion of the equation and am looking for some pointers in the right direction.

Configuration: PE2950 bi pro Xeon, 32Gb RAM with an MD1000 using a zpool of 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla install across the board, no additional software other than the Adaptec StorMan to manage the disks.

local performance via dd - 463MB/s write, 1GB/s read (8Gb file)
iSCSI performance - 90MB/s write, 120MB/s read (800Mb file from a VM)
NFS performance - 1.4MB/s write, 20MB/s read (800Mb file from the Service Console, transfer of a 8Gb file via the datastore browser)

I just found the tool latencytop which points the finger at the ZIL (tip of the hat to Lejun Zhu). Ref: http://www.infrageeks.com/zfs/nfsd.png http://www.infrageeks.com/zfs/fsflush.png. Log file: http://www.infrageeks.com/zfs/latencytop.log

Now I can understand that there is a performance hit associated with this feature of ZFS for ensuring data integrity, but this drastic a difference makes no sense whatsoever. The pool is capable of handling natively (at worst) 120*7 IOPS and I'm not even seeing enough to saturate a USB thumb drive. This still doesn't answer why the read performance is so bad either. According to latencytop, the culprit would be genunix`cv_timedwait_sig rpcmod`svc

From my searching it appears that there's no async setting for the osol nfsd, and ESX does not offer any mount controls to force an async connection. Other than putting in an SSD as a ZIL (which still strikes me as overkill for basic NFS services) I'm looking for any information that can bring me up to at least reasonable throughput. Would a dedicated 15K SAS drive help the situation by moving the ZIL traffic off to a dedicated device? Significantly? This is the sort of thing that I don't want to do without some reasonable assurance that it will help, since you can't remove a ZIL device from a pool at the moment.

Hints and tips appreciated,
Erik
Re: [zfs-discuss] Hanging receive
On Wed, Jul 08, 2009 at 08:43:17AM +1200, Ian Collins wrote: Ian Collins wrote: Brent Jones wrote: On Fri, Jul 3, 2009 at 8:31 PM, Ian Collins i...@ianshome.com wrote: Ian Collins wrote: I was doing an incremental send between pools, the receive side is locked up and no zfs/zpool commands work on that pool. The stacks look different from those reported in the earlier ZFS snapshot send/recv hangs X4540 servers thread. Here is the process information from scat (other commands hanging on the pool are also in cv_wait): Has anyone else seen anything like this? The box wouldn't even reboot, it had to be power cycled. It locks up on receive regularly now. I hit this too: 6826836 Fixed in 117 http://opensolaris.org/jive/thread.jspa?threadID=104852&tstart=120 I don't think this is the same problem (which is why I started a new thread), a single incremental set will eventually lock the pool up, pretty much guaranteed each time. One more data point: This didn't happen when I had a single pool (stripe of mirrors) on the server. It started happening when I split the mirrors and created a second pool built from 3 x 8-drive raidz2 vdevs. Sending to the new pool (either locally or from another machine) causes the hangs.

And here are my data points: We were running two X4500s under Nevada 112 but came across this issue on both of them. On receiving much data through a ZFS receive, they would lock up. Any zpool or zfs commands would hang and were unkillable. The only way to resolve the situation was to reboot without syncing disks. I reported this in some posts back in April (http://opensolaris.org/jive/click.jspa?searchID=2021762&messageID=368524)

One of them had an old enough zpool and zfs version to down/up/sidegrade to Solaris 10 u6, and so I made this change. The thumper running Solaris 10 is now mostly fine - it normally receives an hourly snapshot with no problem. The thumper running 112 has continued to experience the issues described by Ian and others. I've just upgraded to 117 and am having even more issues - I'm unable to receive or roll back snapshots, instead I see:

506 r...@thumper1:~ cat snap | zfs receive -vF thumperpool
receiving incremental stream of vlepool/m...@200906182000 into thumperp...@200906182000
cannot receive incremental stream: most recent snapshot of thumperpool does not match incremental source
511 r...@thumper1:~ zfs rollback -r thumperpool/m...@200906181800
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

As a result, I'm a bit scuppered. I'm going to try going back to my 112 installation instead to see if that resolves any of my issues.

All of our thumpers have the following disk configuration: 4 x 11 disk raidz2 arrays with 2 disks as hot spares in a single pool, and 2 disks in a mirror for booting. When zpool locks up on the main pool, I'm still able to get a zpool status on the boot pool. I can't access any data on the pool which is locked up.

Andrew
Re: [zfs-discuss] Problem with mounting ZFS from USB drive
Karl Dalen wrote: I'm a new user of ZFS and I have an external USB drive which contains a ZFS pool with a file system. It seems that it does not get auto-mounted when I plug in the drive. I'm running osol-0811. How can I manually mount this drive? It has a pool named rpool on it. Are there any diagnostic commands that can be used to investigate the contents of the pool or repair a damaged file system? rmformat shows that the physical name of the USB device is /dev/rdsk/c4t0d0p0. If I try '# zpool import' I get:

pool: rpool
id: 3765122753259138111
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
see: http://www.sun.com/msg/ZFS-8000-EY
config:
rpool UNAVAIL newer version
c4t0d0s0 ONLINE

Did you try this: zpool import -f rpool someothername

I think there are two reasons it won't import: 1) It was last accessed by another system (or maybe the same one but it had a different hostid at the time) so you need to use the -f flag. 2) There is probably another pool called rpool (the one you are running from), right?

-- Darren J Moffat
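A sketch of the rename-on-import Darren suggests, using the pool's numeric id from the output above so it cannot be confused with the running root pool (the new name is just an example):

# zpool import -f 3765122753259138111 usbrpool

Note that, as a later reply in this thread points out, the import will still fail if the pool is a newer on-disk version than the running system supports.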
Re: [zfs-discuss] Hanging receive
Andrew Robert Nicols wrote: [snip] I've just upgraded to 117 and am having even more issues - I'm unable to receive or roll back snapshots, instead I see:

511 r...@thumper1:~ zfs rollback -r thumperpool/m...@200906181800
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

Thanks for the additional data Andrew. Can you do a zfs destroy of thumperpool/m...@200906181900?

-- Ian.
Re: [zfs-discuss] Hanging receive
On Wed, Jul 08, 2009 at 08:31:54PM +1200, Ian Collins wrote: [snip] Thanks for the additional data Andrew. Can you do a zfs destroy of thumperpool/m...@200906181900?

I'm afraid not:

503 r...@thumper1:~ zfs destroy thumperpool/m...@200906181900
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

Andrew
Re: [zfs-discuss] Problem with mounting ZFS from USB drive
On 08.07.09 12:30, Darren J Moffat wrote: [snip] I think there are two reasons it won't import: 1) It was last accessed by another system (or maybe the same one but it had a different hostid at the time) so you need to use the -f flag. 2) There is probably another pool called rpool (the one you are running from), right ?

I think this pool is just too modern for the system you are trying to import it on, as it is UNAVAIL due to newer version. Here's an example:

r...@jax # mkfile -n 64m version
r...@jax # zpool create version /var/tmp/version
r...@jax # zpool upgrade version
This system is currently running ZFS pool version 10.
Pool 'version' is already formatted using the current version.
r...@jax # rcp version theorem:/var/tmp

On the other host:

r...@theorem # zpool upgrade
This system is currently running ZFS version 4.
All pools are formatted using this version.
r...@theorem # zpool import -d /var/tmp
pool: version
id: 2589325003567752919
state: FAULTED
status: The pool is formatted using an incompatible version.
action: The pool cannot be imported. Access the pool on a system running newer software, or recreate the pool from backup.
see: http://www.sun.com/msg/ZFS-8000-A5
config:
version UNAVAIL newer version
/var/tmp/version ONLINE
r...@theorem #

There's a difference in messages though, but the older host in my case is running Solaris 10 U4, so that may explain it. Anyway, I think the pool version is the real reason here.

Victor
Re: [zfs-discuss] Hanging receive
On Wed, Jul 08, 2009 at 09:41:12AM +0100, Andrew Robert Nicols wrote: [snip]
503 r...@thumper1:~ zfs destroy thumperpool/m...@200906181900
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

Moving back to Nevada 112, I'm once again able to receive snapshots and destroy datasets as appropriate - thank goodness! However, I'm fairly sure that in a few hours, with the volume of data I'm sending, I'll see zfs hang. Can anyone on the list suggest some diagnostics which may be of use when this happens?

Thanks in advance,

Andrew Nicols
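A sketch of what can be captured from another shell (or the console) the next time the receive wedges, assuming the box is still partly responsive as described earlier in the thread. Both commands only read kernel state, and the output is the sort of thing usually asked for in these hang threads:

# echo "::threadlist -v" | mdb -k > /var/tmp/threadlist.txt
# echo "::spa -v" | mdb -k > /var/tmp/spa.txt

If a dump device is configured, forcing a panic with reboot -d and keeping the crash dump that savecore writes gives the same information after the fact.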
Re: [zfs-discuss] [nfs-discuss] NFS, ZFS ESX
erik.ableson writes: [snip - full thread quoted above] I think that this would be doable with an async mount, but I can't set this on the server side where it would be used by the servers automatically.

I wouldn't do this; it sounds like you want to have zil_disable. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide If you do, then be prepared to unmount or reboot all clients of the server in case of a crash, in order to clear their corrupted caches. This is in no way a ZIL problem nor a ZFS problem. http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine And most NFS appliance providers will use a form of write-accelerating device to try to make the NFS experience closer to local filesystem behavior.

-r
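For completeness, the zil_disable tuning Roch refers to was, on builds of that era, set either persistently in /etc/system or live with mdb. A sketch (as he warns, after a server crash NFS clients can be left with stale caches, so this trades correctness for speed):

# in /etc/system, followed by a reboot:
set zfs:zil_disable = 1

# or on a running system; takes effect when a dataset is (re)mounted:
# echo zil_disable/W0t1 | mdb -kw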
Re: [zfs-discuss] NFS load balancing / was: ZFS, ESX , and NFS. oh my!
Hi Miles and All, this is off-topic, but as the discussion has started here: Finally, *ALL THIS IS COMPLETELY USELESS FOR NFS* because L4 hashing can only split up separate TCP flows. The reason why I have spent some time with http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6817942 is to make NFS load balancing over more than one TCP stream work again. When rpcmod:clnt_max_conns is set to a value > 1, the NFS client will use multiple TCP connections. Now the next question is which IP addresses and TCP ports are chosen for these connections, which are not guaranteed to be successive in order to get optimal load distribution with the hashes I've seen in the field. That's a topic I'll probably revisit.

Nils
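A sketch of the tunable Nils mentions, set on the NFS client (the value 8 is just an example; the default is 1, i.e. a single TCP connection per server):

# in /etc/system, followed by a reboot:
set rpcmod:clnt_max_conns = 8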
[zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
G'day, I'm putting together a LAN server with a couple of terabyte HDDs as a mirror (zfs root) on b117 (updated 2009.06). I want to back up snapshots of all of rpool to a removable drive on a USB port - simple cheap backup media for a two week rolling DR solution - ie: once a week a HDD gets swapped out and kept offsite. I figure ZFS snapshots are perfect for local backups of files, it's only DR that we need the offsite backup for. I created and formatted one drive on the USB interface (hopefully this will cope with drives being swapped in and out?), called it 'backup' to confuse things :)

zfs list shows:

NAME USED AVAIL REFER MOUNTPOINT
backup 114K 913G 21K /backup
rpool 16.1G 897G 84K /rpool
rpool/ROOT 13.7G 897G 19K legacy
rpool/ROOT/opensolaris 37.7M 897G 5.02G /
rpool/ROOT/opensolaris-1 13.7G 897G 10.9G /
rpool/cashmore 140K 897G 22K /rpool/cashmore
rpool/dump 1018M 897G 1018M -
rpool/export 270M 897G 23K /export
rpool/export/home 270M 897G 736K /export/home
rpool/export/home/carl 267M 897G 166M /export/home/carl
rpool/swap 1.09G 898G 101M -

I've tried this:

zfs snapshot -r rp...@mmddhh
zfs send rp...@mmddhh | zfs receive -F backup/data

eg:

c...@lan2:/backup# zfs snapshot -r rp...@2009070804
c...@lan2:/backup# zfs send rp...@2009070804 | zfs receive -F backup/data

Now I'd expect to see the drive light up and to see some activity, but not much seems to happen. zpool status shows:

# zpool status
pool: backup
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
backup ONLINE 0 0 0
c10t0d0s0 ONLINE 0 0 0
errors: No known data errors

pool: rpool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror ONLINE 0 0 0
c8d1s0 ONLINE 0 0 0
c9d1s0 ONLINE 0 0 0
errors: No known data errors

and zfs list -t all:

zfs list -t all | grep back
backup 238K 913G 23K /backup
bac...@zfs-auto-snap:frequent-2009-07-08-23:30 18K - 21K -
backup/data 84K 913G 84K /backup/data
backup/d...@20090708040 - 84K -

So nothing much is getting copied onto the USB drive as far as I can tell. Certainly not a few GB of stuff. Can anyone tell me what I've missed or misunderstood? Does snapshot -r not get all of rpool?

Thankyou!

Carl
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
First of all, as other posters stressed, your data is not safe by being stored in a single copy in the first place. Before doing anything to it, make a backup and test the backup if at all possible. At least do it for any data that is worth more than the rest of it ;)

As it was stressed in other posts, and not replied to - how much disk space do you actually have available on your 8 disks? I.e. can you copy some files around in WinXP in order to free up at least one drive? Half of the drives (ideal)? How compressible is your data (i.e. videos vs text files)? Is it compressed on the NTFS filesystem already (a pointer to freeing up space, if not)? Depending on how much free space can actually be gained by moving and compressing your data, there are a number of possible scenarios, detailed below.

The point I'm trying to get to is: as soon as you can free up a single drive so you can move it into the Solaris machine, you can set it up as your ZFS pool. Initially this pool would only contain a single vdev (a single drive, a mirror or a raidz group of drives, which may be concatenated to make up the larger pool if there's more than one vdev, as detailed below). You create a filesystem dataset on the pool and enable compression (if your data can be compressed). In recent Solaris and OpenSolaris you can use gzip-9 to fit the info tighter on the drive. Also keep in mind that this setting applies to any data written *after* the value is set. So a dataset can store data objects written with mixed compression levels, if the value is changed on the fly. Alternatively, and simpler to support, you can make several datasets with pre-defined compression levels (i.e. not to waste CPU cycles compressing JPEGs and MP3s).

Now, as you copy the data from NTFS to Solaris, you (are expected to) free up at least one more drive which can be added to the ZFS pool. Its capacity is at this moment concatenated to the same pool. If you free up many drives at once, you can go for a raidz vdev. The best-case scenario is that you free up enough disks to build a redundant ZFS volume right away (raidz1, raidz2 or mirror - as the redundant pool's capacity decreases and data protection grows). Apparently, you don't expect to have enough drives to mirror all data, so let's skip that idea for now. The raidz levels require that you free up at least two drives initially. AFAIK the raidz vdevs cannot be expanded at the moment, so the more drives you're initially using, the less overhead capacity you'll lose. As you progress with data copying, you can free up some more drives and make another raidz vdev, attached to this pool.

You can use a trick to make a raidz vdev with missing redundancy disks (which you'd attach and resilver later on). This is possible, but not production ready in any manner, and prone to data loss of the whole set of several drives whenever anything goes wrong. At my sole risk, I used it to make and populate a raidz2 pool of 4 devices while I only had 2 available drives at that moment (the other 2 were the old raid10 mirror's components with the original data). The fake raidz redundancy devices trick is discussed in this thread: [http://opensolaris.org/jive/thread.jspa?messageID=328360&tstart=0]

In a worst-case scenario you'll have either a single pool of concatenated disks, or a number of separate pools - like your separate NTFS systems are now; in my opinion, this is the lesser of two evils.
In case of separate ZFS pools, you can move them around and you only lose one disk worth of data if anything (drive, cabling, software, power) goes wrong. With a concatenated pool, however, you have all of the drives' free space also concatenated as one big available bubble. That's your choice to make. Later on you can expand the single drive vdevs to become mirrors, as you buy or free up drives. If you find that your data compresses well, so that you start with a single-drive concatenation pool and then find that you can free up several drives at once and use raidz sets, see if you can squeeze out at least 3-4 drives (including a fake device for raidz redundancy if you choose to try the trick). If you can - start a new pool made with raidz vdevs and migrate the data from single drives to it, then scrap their pool and reuse them. Remember that you can't currently remove a vdev from the pool. For such temporary pools (preferably redundant, or not) you can also use any number of older-smaller drives if you can get hands on them ;) On a side note, copying this much data over LAN would take ages. If your disks are not too much fragmented, you can typically expect 20-40Mb/s for large files. Zillions of small files (or heavily fragmented disks) make up so many mechanical seeks that speeds can fall down to well under 1Mb/s. Easy to see that copying a single 1.5Tb drive can take anywhere from half a day on a gigabit LAN, and about 2-3 days on a 100Mbit LAN (7-10
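A concrete sketch of the pool-building steps described above (all device and dataset names are placeholders; gzip-9 needs a reasonably recent Solaris/OpenSolaris build, and compression only applies to data written after the property is set):

# start the pool from the first freed-up drive
# zpool create tank c1t1d0
# zfs create -o compression=gzip-9 tank/docs
# zfs create -o compression=off tank/media

# later, once three more drives are freed, add a raidz vdev
# zpool add tank raidz c1t2d0 c1t3d0 c1t4d0

Remember the caveat from the post: vdevs cannot currently be removed from a pool, so the layout you add is the layout you keep.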
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
Carl Brewer wrote: [snip] I've tried this:
zfs snapshot -r rp...@mmddhh
zfs send rp...@mmddhh | zfs receive -F backup/data
eg:
c...@lan2:/backup# zfs snapshot -r rp...@2009070804
c...@lan2:/backup# zfs send rp...@2009070804 | zfs receive -F backup/data

You are missing a -R for the 'zfs send' part. What you have done there is create snapshots of all the datasets in rpool called 2009070804, but you only sent the one of the top-level rpool dataset.

-R Generate a replication stream package, which will replicate the specified filesystem, and all descendant file systems, up to the named snapshot. When received, all properties, snapshots, descendant file systems, and clones are preserved. If the -i or -I flags are used in conjunction with the -R flag, an incremental replication stream is generated. The current values of properties, and current snapshot and file system names are set when the stream is received. If the -F flag is specified when this stream is received, snapshots and file systems that do not exist on the sending side are destroyed.

-- Darren J Moffat
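Putting Darren's correction together with Carl's original commands, the recursive variant looks roughly like this (the snapshot name is just the example from the thread):

c...@lan2:/backup# zfs snapshot -r rp...@2009070804
c...@lan2:/backup# zfs send -R rp...@2009070804 | zfs receive -Fd backup

receive -d keeps the dataset hierarchy under the backup pool; receiving into backup/data with -F as in the original commands also works, the essential change is the -R on the send side. On builds that support it, adding -u to the receive avoids the received file systems trying to mount over the live ones.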
Re: [zfs-discuss] recover data after zpool create
Kees, can you provide an example of how to read from dd cylinder by cylinder? Also, if a file is fragmented, is there a marker at the end of the first piece telling where the second is? Thank you, Stephen
[zfs-discuss] Very slow ZFS write speed to raw zvol
Guys, Have an OpenSolaris x86 box running: SunOS thsudfile01 5.11 snv_111b i86pc i386 i86pc Solaris

This has 2 old qla2200 1Gbit FC cards attached. Each bus is connected to an old transtec F/C raid array. This has a couple of large luns that form a single large zpool:

r...@thsudfile01:~# zpool status bucket
pool: bucket
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
bucket ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
c8t3d0 ONLINE 0 0 0
errors: No known data errors

r...@thsudfile01:~# zfs list bucket
NAME USED AVAIL REFER MOUNTPOINT
bucket 2.69T 5.31T 22K /bucket

This is being used as an iSCSI target for an ESX 4.0 development environment. I found the performance to be really poor, and the culprit seems to be write performance to the raw zvol. For example, on this zfs filesystem allocated as a volume:

r...@thsudfile01:~# zfs list bucket/iSCSI/lun1
NAME USED AVAIL REFER MOUNTPOINT
bucket/iSCSI/lun1 250G 5.55T 3.64G -

r...@thsudfile01:~# dd if=/dev/zero of=/dev/zvol/rdsk/bucket/iSCSI/lun1 bs=65536 count=102400
^C7729+0 records in
7729+0 records out
506527744 bytes (507 MB) copied, 241.707 s, 2.1 MB/s

Some zpool iostat 1 1000:
bucket 2.44T 5.68T 0 203 0 2.73M
bucket 2.44T 5.68T 0 216 0 2.83M
bucket 2.44T 5.68T 0 120 63.4K 1.58M
bucket 2.44T 5.68T 2 350 190K 16.9M
bucket 2.44T 5.68T 0 123 0 1.64M
bucket 2.44T 5.68T 0 230 0 3.02M

Read performance from that zvol (assuming /dev/null behaves properly) is fine:

r...@thsudfile01:/bucket/transtec# dd of=/dev/null if=/dev/zvol/rdsk/bucket/iSCSI/lun1 bs=65536 count=204800
204800+0 records in
204800+0 records out
13421772800 bytes (13 GB) copied, 47.0256 s, 285 MB/s

Somewhat optimistic that... but iostat shows 100MB/s ish.

Write to a zfs filesystem from that zpool is also fine, here with a write big enough to exhaust the machine's 12GB memory:

r...@thsudfile01:/bucket/transtec# dd if=/dev/zero of=FILE bs=65536 count=409600
^C
336645+0 records in
336645+0 records out
22062366720 bytes (22 GB) copied, 176.369 s, 125 MB/s

and bursts of cache flush from iostat:
bucket 2.44T 5.68T 0 342 0 38.7M
bucket 2.44T 5.68T 0 1.47K 0 188M
bucket 2.44T 5.68T 0 240 0 21.3M
bucket 2.44T 5.68T 0 1.54K 0 191M
bucket 2.44T 5.68T 0 1.49K 0 191M
bucket 2.44T 5.68T 0 434 0 44.2M

So we seem to be able to get data down to disk via the cache at a reasonable rate and read from a raw zvol OK, but writes are horribly slow. Am I missing something obvious? Let me know what info would be diagnostic and I'll post it...

Cheers, Leon
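A couple of quick checks that may help narrow this down (a sketch; the lun1 path is taken from the post, everything else is generic). Comparing the buffered (dsk) and raw (rdsk) device nodes, and a much larger block size, helps separate per-request latency on the raw device from real throughput limits, and the volume's block size is worth knowing when choosing dd's bs:

# zfs get volblocksize bucket/iSCSI/lun1
# dd if=/dev/zero of=/dev/zvol/dsk/bucket/iSCSI/lun1 bs=65536 count=102400
# dd if=/dev/zero of=/dev/zvol/rdsk/bucket/iSCSI/lun1 bs=1024k count=6400
# iostat -xnz 1

Run the iostat in a second terminal while the dd is going, to see per-disk service times on the two FC LUNs.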
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
I meant to add that due to the sheer amount of data (and time needed) to copy, you really don't want to use copying tools which abort on error, such as MS Explorer. Normally I'd suggest something like FAR in Windows or Midnight Commander in Unix to copy over networked connections (CIFS shares), or further on - tar/cpio/whatever. These would let you know of errors and/or suggest that you retry copying (if errors were due to environment like LAN switch reset). However, interactive tools would stall until you answer, and non-interactive tools would not continue copying over what they lost on the first pass. Overall from my experience, I'd suggest RSync running in a loop with partial-dir enabled, for either local copying or over-the-net copying. This way rsync takes care of copying only the changed files (or continuing files which failed from the point where they failed), and it does so without requiring supervision. For Windows side you can look for a project called cwRsync which includes parts of Cygwin to make the environment for rsync (ssh, openssl, etc). My typical runs between Unix hosts look like: solaris# cd /pool/dumpstore/databases solaris# while ! rsync -vaP --stats --exclude='*.bak' --exclude='temp' --partial --append source:/DUMP/snapshots/mysql . ; do sleep 5; echo = `date`: RETRY; done; date (Slashes in the end of pathnames do matter a lot - directory or its contents) For Windows the basic syntax remains nearly the same, I don't want to add confusion by crafting it out of my head now with nowhere to test. If your setup is in a LAN and security overhead can be disregarded, use 'rsync -e rsh' (or use ssh with lightweight algorithms) to not waste CPU on encryption. Alternatively, you can configure the Solaris host to act as an rsync server and use the rsync algorithm (with desired settings) directly. Also, if your files are not ASCII-named, you might want to look at rsync --iconv parameter to recode pathnames. And remember about ZFS 255-byte(!) limit on names. For Unicode names the string character length is roughly half that. //HTH, Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
On Wed, July 8, 2009 11:55, Jim Klimov wrote: My typical runs between Unix hosts look like: solaris# cd /pool/dumpstore/databases solaris# while ! rsync -vaP --stats --exclude='*.bak' --exclude='temp' --partial --append source:/DUMP/snapshots/mysql . ; do sleep 5; echo = `date`: RETRY; done; date

If possible, also try to use rsync 3.x if you're going to go down that route. In previous versions it was necessary to traverse the entire file system to get a file list before starting a transfer. Starting with 3.0.0 (and when talking to another 3.x), it will send incremental updates so bits start moving quicker:

ENHANCEMENTS: - A new incremental-recursion algorithm is now used when rsync is talking to another 3.x version. This starts the transfer going more quickly (before all the files have been found), and requires much less memory. See the --recursive option in the manpage for some restrictions.

http://www.samba.org/ftp/rsync/src/rsync-3.0.0-NEWS
Re: [zfs-discuss] recover data after zpool create
stephen bond wrote: can you provide an example of how to read from dd cylinder by cylinder? What's a cylinder? That's a meaningless term these days. You dd byte ranges. Pick whatever byte range you want. If you want mythical cylinders, fetch the cylinder size from format and use that as your block size for dd. But the disks all lie about that, and remap sectors anyway, so I don't see why you would possibly care... -- Carson
[zfs-discuss] Poor Man's Cluster using zpool export and zpool import
Is it supported to use zpool export and zpool import to manage disk access between two nodes that have access to the same storage device. What issues exist if the host currently owning the zpool goes down? In this case will using zpool import -f work? Is there possible data corruption issues? Thanks, Shawn
Re: [zfs-discuss] Poor Man's Cluster using zpool export and zpool import
Shawn Joy wrote: Is it supported to use zpool export and zpool import to manage disk access between two nodes that have access to the same storage device. What issues exist if the host currently owning the zpool goes down? In this case will using zpool import -f work? Is there possible data corruption issues?

See the description of the cachefile property in the zpool(1M) man page, that was put there for this type of export/import clustering.

-- Darren J Moffat
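A sketch of how the cachefile property is typically used for this kind of manual failover (pool and node names are examples; this does nothing to stop both nodes importing the pool at once, which is exactly the corruption case the next replies warn about):

# keep the pool out of the default cache file so it is not
# auto-imported at boot on either node:
# zpool set cachefile=none tank

# planned move:
nodeA# zpool export tank
nodeB# zpool import tank

# after nodeA has died without exporting:
nodeB# zpool import -f tank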
Re: [zfs-discuss] recover from zfs
(in the spirit of open source, directed back to the list)

On Wed, 8 Jul 2009 14:51:55 +0000 (GMT), Stephen C. Bond wrote: Kees, can you provide an example of how to read from dd cylinder by cylinder or even better by exact coordinates?

That's hard to do, many disks don't tell you the real geometry.

dd if=/dev/rdsk/cXtYdZsB of=output_file_name \
   bs=block_size \
   skip=nr_of_blocks_to_skip \
   count=nr_of_blocks_to_copy

also if a file is fragmented is there a marker at the end of the first piece telling where is the second?

No. That kind of information is kept in the zfs administrative blocks. You'll have to study the on-disk format to get that kind of info.
http://opensolaris.org/os/community/zfs/docs/ondiskformatfinal.pdf
http://blogs.sun.com/storage/en_US/entry/examining_zfs_on_disk_format
The zdb utility (zfs debugging tool) might be of help as well.

Thank you Stephen C. Bond

-- ( Kees Nuyt ) c[_]
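A filled-in example of the skeleton above (all numbers are made up for illustration): to read 1 MB starting 10 MB into a slice, express both the offset and the length in 512-byte blocks, so skip = 10*1024*1024/512 = 20480 and count = 1*1024*1024/512 = 2048:

# dd if=/dev/rdsk/c4t0d0s0 of=/var/tmp/chunk.bin bs=512 skip=20480 count=2048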
Re: [zfs-discuss] Poor Man's Cluster using zpool export and zpool import
Hi Shawn, I have no experience with this configuration, but you might review the information in this blog: http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end ZFS is not a cluster file system and yes, possible data corruption issues exist. Eric mentions this in his blog. You might also check out the HA-cluster product: http://opensolaris.org/os/community/ha-clusters/ Cindy

Shawn Joy wrote: [snip]
Re: [zfs-discuss] Single disk parity
On Wed, 8 Jul 2009, Moore, Joe wrote: The copies code is nice because it tries to put each copy far away from the others. This does have a significant performance impact when on a single spindle, however, because each logical write will be written here and then a disk seek to write it to there.

That's true for the worst case, but zfs mitigates that somewhat by batching i/o into a transaction group. This means that i/o is done every 30 seconds (or 5 seconds, depending on the version you're running), allowing multiple writes to be written together in the disparate locations. Regards, markm
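For anyone wanting to try it, the copies property is set per dataset (a sketch; like compression it only affects blocks written after the property is set, and it guards against bad sectors rather than the loss of the whole disk):

# zfs create -o copies=2 tank/important
# zfs set copies=2 tank/existing-fs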
Re: [zfs-discuss] ZFS write I/O stalls
I was all ready to write about my frustrations with this problem, but I upgraded to snv_117 last night to fix some iscsi bugs, and now it seems that the write throttling is working as described in that blog. I may have been a little premature. While everything is much improved for Samba and local disk operations (dd, cp) on snv_117, COMSTAR iSCSI writes still seem to incur this write-a-bit, block, write-a-bit, block pattern every 5 seconds. But on top of that, I am getting relatively poor iSCSI performance for some reason with a direct gigabit link with MTU=9000. I'm not sure what that is about yet. -John
Re: [zfs-discuss] surprisingly poor performance
pe == Peter Eriksson no-re...@opensolaris.org writes: pe With c1t15d0s0 added as log it takes 1:04.2, but with the same pe c1t15d0s0 added, but wrapped inside a SVM metadevice the same pe operation takes 10.4 seconds... so now SVM discards cache flushes, too? great.
Re: [zfs-discuss] Poor Man's Cluster using zpool export and zpool import
Thanks Cindy and Darren
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Hello Lori, It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. Also, can you share the CR # that you mentioned in your previous email (below), so I can read further into this? thanks again, Jerry Kemp Lori Alt wrote: Latest is that this will go into an early build of Update 8 and be available as a patch shortly thereafter (shortly after it's putback, that is. The patch doesn't have to wait for U8 to be released.) I will update the CR with this information. Lori On 02/18/09 09:12, Jerry K wrote: Hello Lori, Any update to this issue, and can you speculate as to if it will be a patch to Solaris 10u6, or part of 10u7? Thanks again, Jerry Lori Alt wrote: This is in the process of being resolved right now. Stay tuned for when it will be available. It might be a patch to Update 6. In the meantime, you might try this: http://blogs.sun.com/scottdickson/entry/flashless_system_cloning_with_zfs - Lori On 01/09/09 12:28, Jerry K wrote: I understand that currently, at least under Solaris 10u6, it is not possible to jumpstart a new system with a zfs root using a flash archive as a source. Can anyone comment as to whether this restriction will pass in the near term, or if this is a while out (6+ months) before this will be possible? Thanks, Jerry ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
On Wed, 8 Jul 2009, Jerry K wrote: It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. I saw that a Solaris 10 patch for supporting Flash archives on ZFS came out about a week ago. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Any idea what the Patch ID was? fpsm On Wed, Jul 8, 2009 at 3:43 PM, Bob Friesenhahnbfrie...@simple.dallas.tx.us wrote: On Wed, 8 Jul 2009, Jerry K wrote: It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. I saw that a Solaris 10 patch for supporting Flash archives on ZFS came out about a week ago. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
On 07/08/09 13:43, Bob Friesenhahn wrote: On Wed, 8 Jul 2009, Jerry K wrote: It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. I saw that a Solaris 10 patch for supporting Flash archives on ZFS came out about a week ago. Correct. These are the patches: sparc: 119534-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124630-26: updates to the install software x86: 119535-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124631-27: updates to the install software Lori ---BeginMessage--- I received the following message about the patches for zfs flash archive support: The submitted patch has been received as release ready by raid.central and will be officially released to the Enterprise Services patch databases within 24 - 48 hours (except on weekends or holidays) or submitter will be further notified of any issues that prevent SunService from releasing it. Contact patch-mana...@sun.com if there are any further questions. The patches are: sparc: 119534-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124630-26: updates to the install software x86: 119535-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124631-27: updates to the install software A couple weeks ago, I sent out a mail about the content of these patches and about how they should be applied. I have included that message again, below. Lori --- I have two pieces of information to convey with this mail. The first is a summary of how flash archives work with zfs as a result of this patch. During the discussions of what should be implemented, there was some disagreement about what was needed. I want to summarize what finally got implemented, just so there is no confusion. Second, I want to bring everyone up to date on the state of the patch for zfs flash archive support. Overview of ZFS Flash Archive Functionality - With this new functionality, it is possible to - generate flash archives that can be used to install systems to boot off of ZFS root pools - perform Jumpstart initial installations of entire systems using these zfs-type flash archives - the flash archive backs up an entire root pool, not individual boot environments. Individual datasets within the pool can be excluded using a new -D option to flarcreate and flar. Here are the limitations: - Jumpstart installations only. No interactive install support for flash archive installs of zfs-rooted systems. No installation of individual boot environments using Live Upgrade. - Full initial install only. No differential flash installs. - No hybrid ufs/zfs archives. Existing (ufs-type) flash archives can still only be used to install ufs roots. The new zfs-type flash archive can only be used to install zfs-rooted systems. - Although the entire root pool (minus any explicitly excluded datasets) is archived and installed, only the BE booted at the time of the flarcreate will be usable after the flash archive is installed. (except for pools archived with the -R rootdir option,which can be used to archive a root pool other than the one currently booted). - The options to flarcreate and flar to include and exclude individual files in a flash archive is not support with zfs-type flash archives. Only entire datasets may be excluded from a zfs flar. 
- The new pool created by the flash archive install will have the same name as the pool that was used to generate the flash archive. Status of the ZFS Flash Archive Patches --- I have received test versions of the patches for zfs flash archive support (CR 6690473). Those patches are: sparc: 119534-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124630-26: updates to the install software x86: 119535-15 : fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command 124631-27: updates to the install software The patches are applied as follows: The flarcreate/flar patch (119534-15/119535-15) must be applied to the system where the flash archive is generated. The install software patch (124630-26/124631-27) must be applied to the install medium (probably a netinstall image), since that is where the install software resides. A system being installed with a flash archive image will have to be booted from a patched image so that the install software can recognize the zfs-type flash archive and handle it correctly. I verified these patches on both sparc and x86 platforms, and as applied to both Update 6 and Update 7 systems and images. On Update 6, it is also necessary to apply the kernel update (KU) patch to the netinstall image in order for the install to work. The KU patch is
Re: [zfs-discuss] zfs root, jumpstart and flash archives
On Wed, 8 Jul 2009, Fredrich Maney wrote: Any idea what the Patch ID was? x86: 119535-15; SPARC: 119534. Description of change: 6690473 request to have flash support for ZFS root install. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Bob, Patches that allow the creation and installation of a flash archive on a zpool are available: For SPARC: 119534-15: fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command; 124630-26: updates to the install software. For x86: 119535-15: fixes to the /usr/sbin/flarcreate and /usr/sbin/flar command; 124631-27: updates to the install software. You'll have to patch the miniroot of your network boot server in order to have it work (http://www.sun.com/bigadmin/features/hub_articles/patchmini.jsp). Jnm. -- Bob Friesenhahn wrote: On Wed, 8 Jul 2009, Jerry K wrote: It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. I saw that a Solaris 10 patch for supporting Flash archives on ZFS came out about a week ago. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Hi, for sparc: 119534-15 and 124630-26; for x86: 119535-15 and 124631-27. Higher revisions of these will also suffice. Note these need to be applied to the miniroot of the jumpstart image so that it can then install a zfs flash archive. Please read the README notes in these patches for more specific instructions, including instructions on miniroot patching. Enda Fredrich Maney wrote: Any idea what the Patch ID was? fpsm On Wed, Jul 8, 2009 at 3:43 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Wed, 8 Jul 2009, Jerry K wrote: It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. I saw that a Solaris 10 patch for supporting Flash archives on ZFS came out about a week ago. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
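For what it's worth, on x86 the general shape of the miniroot patching step (the paths here are placeholders; the patch README and the BigAdmin article are authoritative) is to unpack the boot archive, apply the install-software patch against it with patchadd -C, and repack it:
# /boot/solaris/bin/root_archive unpackmedia /export/install/s10u7 /export/miniroot
# patchadd -C /export/miniroot /var/tmp/124631-27
# /boot/solaris/bin/root_archive packmedia /export/install/s10u7 /export/miniroot
On sparc the miniroot is a directory within the netinstall image and can usually be patched directly with patchadd -C; again, follow the README for the exact steps.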
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
Thank you! Am I right in thinking that rpool snapshots will include things like swap? If so, is there some way to exclude them? Much like rsync has --exclude? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
Carl Brewer wrote: Thankyou! Am I right in thinking that rpool snapshots will include things like swap? If so, is there some way to exclude them? Much like rsync has --exclude? No. Snapshots are a feature of the dataset, not the pool. So you would have separate snapshot policies for each file system (eg rpool) and volume (eg swap and dump). -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
On 07/08/09 15:57, Carl Brewer wrote: Thankyou! Am I right in thinking that rpool snapshots will include things like swap? If so, is there some way to exclude them? Much like rsync has --exclude? By default, the zfs send -R will send all the snapshots, including swap and dump. But you can do the following after taking the snapshot: # zfs destroy rpool/d...@mmddhh # zfs destroy rpool/s...@mmddhh and then do the zfs send -R . You'll get messages about the missing snapshots, but they can be ignored. In order to re-create a bootable pool from your backup, there are additional steps required. A full description of a procedure similar to what you are attempting can be found here: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#ZFS_Root_Pool_Recovery Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
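Putting those steps together, a minimal sketch (the snapshot name and the backup pool on the USB drive are made up; this assumes the default rpool/swap and rpool/dump volume names):
# zfs snapshot -r rpool@20090708
# zfs destroy rpool/swap@20090708
# zfs destroy rpool/dump@20090708
# zfs send -R rpool@20090708 | zfs receive -Fd usbpool
The receive side could equally be a file on the external drive (zfs send -R ... > /backup/rpool.20090708), as described in the root pool recovery procedure linked above.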
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
Thankyou! Am I right in thinking that rpool snapshots will include things like swap? If so, is there some way to exclude them? Hi Carl :) You can't exclude them from the send -R with something like --exclude, but you can make sure there are no such snapshots (which aren't useful anyway) before sending, as noted. As well as deleting them, another way to do this is to not create them in the first place. If you use the snapshots created by tim's zfs-auto-snapshot service, that service observes a property on each dataset that excludes snapshots being taken on that dataset. There are convenient hooks in that service that you can use to facilitate the sending step directly once the snapshots are taken, and to use incremental sends of the snapshots as well. You might consider your replication schedule, too - for example, keep frequent and maybe even hourly snapshots only on the internal pool, and replicate daily and beyond snapshots to the external drive ready for removal. If you arrange your schedule of swapping drives well enough, such that a drive returns from offsite storage and is reconnected while the most recent snapshot it contains is still present in rpool, then catching it up with the week of snapshots it missed while offsite can be quick. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
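A sketch of that, assuming the stock com.sun:auto-snapshot user property that Tim's service honours and the default swap/dump volume names:
# zfs set com.sun:auto-snapshot=false rpool/swap
# zfs set com.sun:auto-snapshot=false rpool/dump
Datasets with the property set to false are skipped by the auto-snapshot instances, so there is nothing to clean up before the send.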
Re: [zfs-discuss] Single disk parity
Haudy Kazemi wrote: Daniel Carosone wrote: Sorry, don't have a thread reference to hand just now. http://www.opensolaris.org/jive/thread.jspa?threadID=100296 Note that there's little empirical evidence that this is directly applicable to the kinds of errors (single bit, or otherwise) that a single failing disk medium would produce. Modern disks already include and rely on a lot of ECC as part of ordinary operation, below the level usually seen by the host. These mechanisms seem unlikely to return a read with just one (or a few) bit errors. This strikes me, if implemented, as potentially more applicable to errors introduced from other sources (controller/bus transfer errors, non-ECC memory, weak power supply, etc). Still handy. Adding additional data protection options is commendable. On the other hand I feel there are important gaps in the existing feature set that are worthy of a higher priority, not the least of which is the automatic recovery of uberblock / transaction group problems (see Victor Latushkin's recovery technique which I linked to in a recent post), This does not seem to be a widespread problem. We do see the occasional complaint on this forum, but considering the substantial number of ZFS implementations in existence today, the rate seems to be quite low. In other words, the impact does not seem to be high. Perhaps someone at Sun could comment on the call rate for such conditions? followed closely by a zpool shrink or zpool remove command that lets you resize pools and disconnect devices without replacing them. I saw postings or blog entries from about 6 months ago that this code was 'near' as part of solving a resilvering bug but have not seen anything else since. I think many users would like to see improved resilience in the existing features and the addition of frequently long requested features before other new features are added. (Exceptions can readily be made for new features that are trivially easy to implement and/or are not competing for developer time with higher priority features.) In the meantime, there is the copies flag option that you can use on single disks. With immense drives, even losing 1/2 the capacity to copies isn't as traumatic for many people as it was in days gone by. (E.g. consider a 500 GB hard drive with copies=2 versus a 128 GB SSD). Of course if you need all that space then it is a no-go. Space, performance, dependability: you can pick any two. Related threads that also had ideas on using spare CPU cycles for brute force recovery of single bit errors using the checksum: There is no evidence that the type of unrecoverable read errors we see is single bit errors. And while it is possible for an error handling code to correct single bit flips, multiple bit flips would remain as a large problem space. There are error codes which can correct multiple flips, but they quickly become expensive. This is one reason why nobody does RAID-2. BTW, if you do have the case where unprotected data is not readable, then I have a little DTrace script that I'd like you to run which would help determine the extent of the corruption. This is one of those studies which doesn't like induced errors ;-) http://www.richardelling.com/Home/scripts-and-programs-1/zcksummon The data we do have suggests that magnetic hard disk failures tend to be spatially clustered. So there is still the problem of spatial diversity which is rather nicely handled by copies, today. 
-- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
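To make the copies suggestion concrete (the pool and dataset names are invented), on a single-disk pool you might do:
# zpool create tank c0t0d0
# zfs create tank/photos
# zfs set copies=2 tank/photos
Keep in mind that copies=2 only applies to data written after the property is set, and it protects against localized media errors, not against losing the whole disk.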
[zfs-discuss] Booting from detached mirror disk
Hi, I have a mirrored boot disk and I am able to boot from either disk. If I detach the mirror, would I be able to boot from the detached disk? Thanks. Sunil ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Booting from detached mirror disk
Did you run installgrub on both disks: /usr/sbin/installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/cxtydzs0 Or the equivalent. If you can't boot from either, how did either become your boot disk? If you want to use a single mirror member disk to boot from (i.e. for testing), I wouldn't detach it. Boot from it and let ZFS complain about the missing member of the mirror for the short term. That's my idea, but I'm not an expert. This process works for me if I'm testing something regarding my mirror or testing a disk or controller. I don't know how one would boot from a detached disk, but other folks will have more experience. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
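Spelled out for a typical two-disk x86 mirror (the device names are only examples), that is:
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
On sparc the equivalent is installboot with the zfs bootblk instead of installgrub.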
Re: [zfs-discuss] zpool import hangs
Just trying to help since no one has responded... Have you tried importing with an alternate root? We don't know your setup, such as other pools, types of controllers and/or disks, or how your pool was constructed. Try importing something like this: zpool import -R /tank2 -f pool_numeric_identifier Perhaps you have some overlap with your existing pools and bigtank, so this might help to track that down. You can always export it and re-import with the correct root once you get to the bottom of this. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
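For concreteness (the altroot path is arbitrary; substitute the numeric id that zpool import prints for bigtank):
# zpool import (with no arguments, lists importable pools along with their names and numeric ids)
# zpool import -f -R /mnt/bigtank bigtank
Importing by the numeric id instead of the name works the same way and avoids any ambiguity if another pool with the same name exists.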
Re: [zfs-discuss] Booting from detached mirror disk
By the way, if you try my idea and both disks remain physically attached, both should be found and the mirror will be intact, regardless of which disk you boot from. If one is physically disconnected, then you will have complaints about the missing disk, but it should still work if everything is configured correctly and your BIOS doesn't present you with any difficulties. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss