Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
> when you say remove the device, I assume you mean simply make it unavailable
> for import (I can't remove it from the vdev).

Yes, that's what I meant.

> root@openindiana-01:/mnt# zpool import -d /dev/lofi
>   pool: ZP-8T-RZ1-01
>     id: 9952605666247778346
>  state: FAULTED
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
>         devices and try again.
>    see: http://www.sun.com/msg/ZFS-8000-3C
> config:
>
>         ZP-8T-RZ1-01              FAULTED  corrupted data
>           raidz1-0                DEGRADED
>             12339070507640025002  UNAVAIL  cannot open
>             /dev/lofi/5           ONLINE
>             /dev/lofi/4           ONLINE
>             /dev/lofi/3           ONLINE
>             /dev/lofi/1           ONLINE
>
> It's interesting that even though 4 of the 5 disks are available, it still
> can't import it as DEGRADED.

I agree that it's "interesting". Now someone really knowledgeable will need to have a look at this. I can only imagine that somehow the devices contain data from different points in time, and that it's too far apart for the aggressive txg rollback that was added in PSARC 2009/479. Btw, did you try that? Try:

zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
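If the devices really do carry state from different points in time, the labels should show it. A quick check along these lines (my own sketch, not from the thread; the device list matches the lofi setup quoted above, and "txg" is the last-synced transaction group recorded in each label):

  for dev in /dev/lofi/1 /dev/lofi/3 /dev/lofi/4 /dev/lofi/5; do
      echo "== $dev =="
      zdb -l $dev | grep -w txg
  done

Widely differing txg values across the devices would support that explanation.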
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Fri, Jun 15, 2012 at 10:54:34AM +0200, Stefan Ring wrote:
> >> Have you also mounted the broken image as /dev/lofi/2?
> >
> > Yep.
>
> Wouldn't it be better to just remove the corrupted device? This worked
> just fine in my case.

Hi Stefan,

when you say remove the device, I assume you mean simply make it unavailable for import (I can't remove it from the vdev). This is what happens (lofi/2 is the drive which ZFS thinks has corrupted data):

root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
    id: 9952605666247778346
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                ONLINE
            12339070507640025002  UNAVAIL  corrupted data
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

root@openindiana-01:/mnt# lofiadm -d /dev/lofi/2
root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
    id: 9952605666247778346
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                DEGRADED
            12339070507640025002  UNAVAIL  cannot open
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

So in the second import, it complains that it can't open the device, rather than saying it has corrupted data. It's interesting that even though 4 of the 5 disks are available, it still can't import it as DEGRADED.

Thanks again.

Scott
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
Sorry, if you meant distinguishing between true 512 and emulated 512/4k, I don't know; it may be vendor-specific as to whether they expose it through device commands at all.

Tim

On Fri, Jun 15, 2012 at 6:02 PM, Timothy Coalson wrote:
> On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov wrote:
>> 2012-06-16 0:05, John Martin wrote:
>>>> It's important to know...
>>>
>>> ...whether the drive is really 4096p or 512e/4096p.
>>
>> BTW, is there a surefire way to learn that programmatically
>> from Solaris or its derivatives
>
> prtvtoc should show the block size the OS thinks it has. Or
> you can use format, select the disk from a list that includes the
> model number and size, and use "verify".
>
> Tim
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov wrote:
> 2012-06-16 0:05, John Martin wrote:
>>> It's important to know...
>>
>> ...whether the drive is really 4096p or 512e/4096p.
>
> BTW, is there a surefire way to learn that programmatically
> from Solaris or its derivatives

prtvtoc should show the block size the OS thinks it has. Or you can use format, select the disk from a list that includes the model number and size, and use "verify".

Tim
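Roughly what that looks like (a sketch from memory, so treat the exact formatting as approximate; the device name is an example, and the line of interest is under "Dimensions"):

  $ prtvtoc /dev/rdsk/c0t0d0s2
  * /dev/rdsk/c0t0d0s2 partition map
  *
  * Dimensions:
  *     512 bytes/sector
  ...

Note this is the logical sector size the OS is using, which is exactly why it cannot distinguish a 512e/4096p drive from a true 512-native one.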
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-16 0:05, John Martin wrote:
>> It's important to know...
>
> ...whether the drive is really 4096p or 512e/4096p.

BTW, is there a surefire way to learn that programmatically from Solaris or its derivatives (i.e. from SCSI driver options, format/scsi/inquiry, SMART or some similar way)? Or, if the drive lies, saying its sectors are 512b while they physically are 4KB, is it undetectable except by reading vendor specs?

Thanks,
//Jim
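One avenue worth trying, assuming smartmontools is installed (my own suggestion, not confirmed in this thread): recent smartctl versions print both sizes when the drive declares them in its ATA IDENTIFY data, along these lines:

  # smartctl -i /dev/rdsk/c0t1d0
  ...
  Sector Sizes:     512 bytes logical, 4096 bytes physical
  ...

The caveat is that this comes from the same IDENTIFY data the driver sees, so a drive that lies about its physical sector size will lie here too, and then vendor specs really are the only recourse.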
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
On Fri, Jun 15, 2012 at 12:56 PM, Timothy Coalson wrote:
> Thanks for the suggestions. I think it would also depend on whether
> the nfs server has tried to write asynchronously to the pool in the
> meantime, which I am unsure how to test, other than making the txgs
> extremely frequent and watching the load on the log devices.

I didn't want to reboot the main file server to test this, so I used zilstat on the backup nfs server (which has nearly identical hardware and configuration, but doesn't have SSDs for a separate ZIL) to see if I could estimate the difference it would make, and the story got stranger: it wrote far less data to the ZIL for the same copy operation (single 8GB file):

$ sudo ./zilstat -M -l 20 -p backuppool txg
waiting for txg commit...
    txg    N-MB  N-MB/s  N-Max-Rate   B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2833307       1       0           1      1       0           1     15      0       0      15
2833308       0       0           0      0       0           0      0      0       0       0
2833309       1       0           1      1       0           1      8      0       0       8
2833310       0       0           0      0       0           0      4      0       0       4
2833311       1       0           0      1       0           0      9      0       0       9
2833312       0       0           0      0       0           0      0      0       0       0
2833313       2       0           2      2       0           2     21      0       0      21
2833314       7       1           7      8       1           8     63      0       0      63
2833315       1       0           1      2       0           2     18      0       0      18
2833316       0       0           0      0       0           0      5      0       0       5

A small sample from the server with SSD log devices doing the same operation:

$ sudo ./zilstat -M -l 20 -p mainpool txg
waiting for txg commit...
    txg    N-MB  N-MB/s  N-Max-Rate   B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2808483     989     197         593   1967     393        1180  15010      0       0   15010
2808484     599      99         208   1134     189         393   8653      0       0    8653
2808485       0       0           0      0       0           0      0      0       0       0
2808486     137      27         126    255      51         235   1953      0       0    1953
2808487     460      92         460    859     171         859   6555      0       0    6555
2808488     530      75         530   1031     147        1031   7871      0       0    7871

Setting logbias=throughput makes the server with the SSD log devices act the same as the server without them, as far as I can tell, which I somewhat expected. However, I did not expect use of separate log devices to change how often ZIL ops are performed, other than to raise the upper limit if the device can service more IOPS. Additionally, nfssvrtop showed a lower value for Com_t when not using the separate log device (2.1s with logbias=latency, 0.24s with throughput).

Copying a folder with small files and subdirectories pushes the server to ~400 ZIL ops per txg with logbias=throughput, so it shouldn't be the device performance making it only issue ~15 ops per txg copying a large file without using a separate log device. I am thinking of transplanting one of the SSDs temporarily for testing, but I would be interested to know the cause of this behavior. I don't know why more asynchronous writes seem to be making it into txgs without being caught by an nfs commit when a separate log device isn't used.

Tim
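For anyone reproducing the comparison above: logbias is an ordinary per-dataset property. A minimal sketch (the pool name mirrors the example above; the dataset name is a placeholder):

  # zfs get logbias mainpool/export
  # zfs set logbias=throughput mainpool/export
    (ZIL blocks now go to the main pool, bypassing the slog)
  # zfs set logbias=latency mainpool/export
    (back to the default; sync writes use the separate log device)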
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/12 15:52, Cindy Swearingen wrote:
> It's important to identify your OS release to determine if booting
> from a 4k disk is supported.

In addition, whether the drive is really 4096p or 512e/4096p.
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
Hi Hans,

It's important to identify your OS release to determine if booting from a 4k disk is supported.

Thanks,

Cindy

On 06/15/12 06:14, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.
>
> Skickat från min Android Mobil
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
Thanks for the suggestions. I think it would also depend on whether the nfs server has tried to write asynchronously to the pool in the meantime, which I am unsure how to test, other than making the txgs extremely frequent and watching the load on the log devices.

As for the integer division giving misleading zeros, one possible solution is to add (delay-1) to the count before dividing by delay, so if there are any, it will show at least 1 (or you could get fancy and do fixed-point numbers).

As for very frequent txgs, I imagine this could cause more fragmentation (more metadata written and discarded more frequently); is there a way to estimate or test for the impact of it? Depending on how it allocates the metadata blocks, I suppose it could write them to the blocks recently vacated by old metadata from the previous txg, and have almost no impact until a snapshot is taken; is it smart enough to do this?

Tim

On Fri, Jun 15, 2012 at 10:56 AM, Richard Elling wrote:
> [Phil beat me to it]
> Yes, the 0s are a result of integer division in DTrace/kernel.
>
> On Jun 14, 2012, at 9:20 PM, Timothy Coalson wrote:
>
>> Indeed they are there, shown with 1 second interval. So, it is the
>> client's fault after all. I'll have to see whether it is somehow
>> possible to get the server to write cached data sooner (and hopefully
>> asynchronously), and the client to issue commits less often. Luckily
>> I can live with the current behavior (and the SSDs shouldn't give out
>> any time soon even being used like this), if it isn't possible to
>> change it.
>
> If this is the proposed workload, then it is possible to tune the DMU
> to manage commits more efficiently. In an ideal world, it does this
> automatically, but the algorithms are based on a bandwidth calculation
> and those are not suitable for HDD capacity planning. The efficiency
> goal would be to do less work, more often, and there are two tunables
> that can apply:
>
> 1. the txg_timeout controls the default maximum transaction group
>    commit interval and is set to 5 seconds on modern ZFS
>    implementations.
>
> 2. the zfs_write_limit is a size limit for txg commit. The idea is
>    that a txg will be committed when the size reaches this limit,
>    rather than waiting for the txg_timeout. For streaming writes, this
>    can work better than tuning the txg_timeout.
>
> -- richard
>
>> Thanks for all the help,
>> Tim
>>
>> On Thu, Jun 14, 2012 at 10:30 PM, Phil Harman wrote:
>>> On 14 Jun 2012, at 23:15, Timothy Coalson wrote:
>>>>> The client is using async writes, that include commits. Sync
>>>>> writes do not need commits.
>>>>
>>>> Are you saying nfs commit operations sent by the client aren't
>>>> always reported by that script?
>>>
>>> They are not reported in your case because the commit rate is less
>>> than one per second.
>>>
>>> DTrace is an amazing tool, but it does dictate certain coding
>>> compromises, particularly when it comes to output scaling, grouping,
>>> sorting and formatting.
>>>
>>> In this script the commit rate is calculated using integer division.
>>> In your case the sample interval is 5 seconds, so up to 4 commits
>>> per second will be reported as a big fat zero.
>>>
>>> If you use a sample interval of 1 second you should see occasional
>>> commits. We know they are there because we see a non-zero commit
>>> time.
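For completeness, a sketch of how those two tunables were typically set on Solaris/illumos builds of this era (values are illustrative only, and the variable names should be verified against your exact release). In /etc/system, taking effect at the next boot:

  set zfs:zfs_txg_timeout = 1
  set zfs:zfs_write_limit_override = 0x20000000

Or live, with mdb (here forcing a 1-second txg interval):

  # echo zfs_txg_timeout/W0t1 | mdb -kw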
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
hi

what is the version of Solaris? uname -a output?

regards

On 6/15/2012 10:37 AM, Hung-Sheng Tsao Ph.D. wrote:
> by the way, when you format, start with cylinder 1; do not use 0.
> Depending on the version of Solaris, you may not be able to use 2TB
> as root.
>
> regards
>
> On 6/15/2012 9:53 AM, Hung-Sheng Tsao Ph.D. wrote:
>> yes
>> which version of solaris or bsd are you using? for bsd I do not know
>> the steps.
>> for creating a new BE (boot env) on s10, opensolaris and solaris
>> express (maybe other opensolaris forks), you use liveupgrade; for
>> s11 you use beadm.
>>
>> regards
>>
>> On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
>>> I suppose I must start by labelling the new disk properly, and give
>>> the s0 partition to zpool, so the new zpool can be booted?
>>>
>>> Skickat från min Android Mobil
>>>
>>> "Hung-Sheng Tsao Ph.D." skrev:
>>>> one possible way:
>>>> 1) break the mirror
>>>> 2) install new hdd, format the HDD
>>>> 3) create new zpool on new hdd with 4k block
>>>> 4) create new BE on the new pool with the old root pool as source
>>>>    (not sure which version of "solaris" or "openSolaris" you are
>>>>    using; the procedure may differ depending on version)
>>>> 5) activate the new BE
>>>> 6) boot the new BE
>>>> 7) destroy the old zpool
>>>> 8) replace old HDD with new HDD
>>>> 9) format the HDD
>>>> 10) attach the HDD to the new root pool
>>>>
>>>> regards
>>>>
>>>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>>>> I've got my root pool on a mirror on 2 512 byte blocksize disks.
>>>>> I want to move the root pool to two 2 TB disks with 4k blocks.
>>>>> The server only has room for two disks. I do have an esata
>>>>> connector, though, and a suitable external cabinet for connecting
>>>>> one extra disk.
>>>>>
>>>>> How would I go about migrating/expanding the root pool to the
>>>>> larger disks so I can then use the larger disks for booting?
>>>>>
>>>>> I have no extra machine to use.
>>>>>
>>>>> Skickat från min Android Mobil
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
On Jun 14, 2012, at 1:35 PM, Robert Milkowski wrote:
>> The client is using async writes, that include commits. Sync writes
>> do not need commits.
>>
>> What happens is that the ZFS transaction group commit occurs at more-
>> or-less regular intervals, likely 5 seconds for more modern ZFS
>> systems. When the commit occurs, any data that is in the ARC but not
>> committed in a prior transaction group gets sent to the ZIL
>
> Are you sure? I don't think this is the case, unless I misunderstood
> you or this is some recent change to Illumos.

Need to make sure we are clear here: there is time between the txg being closed and the txg being on disk. During that period, a sync write of the data in the closed txg is written to the ZIL.

> Whatever is being committed when the zfs txg closes goes directly to
> the pool and not to the zil. Only sync writes will go to the zil right
> away (and not always; see logbias, etc.) and to the arc, to be
> committed later to the pool when the txg closes.

In this specific case, there are separate log devices, so logbias doesn't apply.

 -- richard
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
[Phil beat me to it]

Yes, the 0s are a result of integer division in DTrace/kernel.

On Jun 14, 2012, at 9:20 PM, Timothy Coalson wrote:
> Indeed they are there, shown with 1 second interval. So, it is the
> client's fault after all. I'll have to see whether it is somehow
> possible to get the server to write cached data sooner (and hopefully
> asynchronously), and the client to issue commits less often. Luckily
> I can live with the current behavior (and the SSDs shouldn't give out
> any time soon even being used like this), if it isn't possible to
> change it.

If this is the proposed workload, then it is possible to tune the DMU to manage commits more efficiently. In an ideal world, it does this automatically, but the algorithms are based on a bandwidth calculation and those are not suitable for HDD capacity planning. The efficiency goal would be to do less work, more often, and there are two tunables that can apply:

1. the txg_timeout controls the default maximum transaction group commit interval and is set to 5 seconds on modern ZFS implementations.

2. the zfs_write_limit is a size limit for txg commit. The idea is that a txg will be committed when the size reaches this limit, rather than waiting for the txg_timeout. For streaming writes, this can work better than tuning the txg_timeout.

 -- richard

> Thanks for all the help,
> Tim
>
> On Thu, Jun 14, 2012 at 10:30 PM, Phil Harman wrote:
>> On 14 Jun 2012, at 23:15, Timothy Coalson wrote:
>>>> The client is using async writes, that include commits. Sync
>>>> writes do not need commits.
>>>
>>> Are you saying nfs commit operations sent by the client aren't
>>> always reported by that script?
>>
>> They are not reported in your case because the commit rate is less
>> than one per second.
>>
>> DTrace is an amazing tool, but it does dictate certain coding
>> compromises, particularly when it comes to output scaling, grouping,
>> sorting and formatting.
>>
>> In this script the commit rate is calculated using integer division.
>> In your case the sample interval is 5 seconds, so up to 4 commits per
>> second will be reported as a big fat zero.
>>
>> If you use a sample interval of 1 second you should see occasional
>> commits. We know they are there because we see a non-zero commit
>> time.
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
by the way, when you format, start with cylinder 1; do not use 0. Depending on the version of Solaris, you may not be able to use 2TB as root.

regards

On 6/15/2012 9:53 AM, Hung-Sheng Tsao Ph.D. wrote:
> yes
> which version of solaris or bsd are you using? for bsd I do not know
> the steps.
> for creating a new BE (boot env) on s10, opensolaris and solaris
> express (maybe other opensolaris forks), you use liveupgrade; for s11
> you use beadm.
>
> regards
>
> On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
>> I suppose I must start by labelling the new disk properly, and give
>> the s0 partition to zpool, so the new zpool can be booted?
>>
>> Skickat från min Android Mobil
>>
>> "Hung-Sheng Tsao Ph.D." skrev:
>>> one possible way:
>>> 1) break the mirror
>>> 2) install new hdd, format the HDD
>>> 3) create new zpool on new hdd with 4k block
>>> 4) create new BE on the new pool with the old root pool as source
>>>    (not sure which version of "solaris" or "openSolaris" you are
>>>    using; the procedure may differ depending on version)
>>> 5) activate the new BE
>>> 6) boot the new BE
>>> 7) destroy the old zpool
>>> 8) replace old HDD with new HDD
>>> 9) format the HDD
>>> 10) attach the HDD to the new root pool
>>>
>>> regards
>>>
>>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>>> server only has room for two disks. I do have an esata connector,
>>>> though, and a suitable external cabinet for connecting one extra
>>>> disk.
>>>>
>>>> How would I go about migrating/expanding the root pool to the
>>>> larger disks so I can then use the larger disks for booting?
>>>>
>>>> I have no extra machine to use.
>>>>
>>>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
yes

which version of solaris or bsd are you using? for bsd I do not know the steps.

for creating a new BE (boot env) on s10, opensolaris and solaris express (maybe other opensolaris forks), you use liveupgrade; for s11 you use beadm.

regards

On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
> I suppose I must start by labelling the new disk properly, and give
> the s0 partition to zpool, so the new zpool can be booted?
>
> Skickat från min Android Mobil
>
> "Hung-Sheng Tsao Ph.D." skrev:
>> one possible way:
>> 1) break the mirror
>> 2) install new hdd, format the HDD
>> 3) create new zpool on new hdd with 4k block
>> 4) create new BE on the new pool with the old root pool as source
>>    (not sure which version of "solaris" or "openSolaris" you are
>>    using; the procedure may differ depending on version)
>> 5) activate the new BE
>> 6) boot the new BE
>> 7) destroy the old zpool
>> 8) replace old HDD with new HDD
>> 9) format the HDD
>> 10) attach the HDD to the new root pool
>>
>> regards
>>
>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>> server only has room for two disks. I do have an esata connector,
>>> though, and a suitable external cabinet for connecting one extra
>>> disk.
>>>
>>> How would I go about migrating/expanding the root pool to the
>>> larger disks so I can then use the larger disks for booting?
>>>
>>> I have no extra machine to use.
>>>
>>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/2012 03:35 PM, Johannes Totz wrote:
> On 15/06/2012 13:22, Sašo Kiselkov wrote:
>> On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>> server only has room for two disks. I do have an esata connector,
>>> though, and a suitable external cabinet for connecting one extra
>>> disk.
>>>
>>> How would I go about migrating/expanding the root pool to the
>>> larger disks so I can then use the larger disks for booting?
>>> I have no extra machine to use.
>>
>> Suppose we call the disks like so:
>>
>> A, B: your old 512-block drives
>> X, Y: your new 2TB drives
>>
>> The easiest way would be to simply:
>>
>> 1) zpool set autoexpand=on rpool
>> 2) offline the A drive
>> 3) physically replace it with the X drive
>> 4) do a "zpool replace" on it and wait for it to resilver
>
> When sector size differs, attaching it is going to fail (at least on
> fbsd). You might not get around a send-receive cycle...

Jim Klimov has already posted a way better guide, which rebuilds the pool using the old one's data, so yeah, the replace route I recommended here is rendered moot.

--
Saso
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 15/06/2012 13:22, Sašo Kiselkov wrote:
> On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>> want to move the root pool to two 2 TB disks with 4k blocks. The
>> server only has room for two disks. I do have an esata connector,
>> though, and a suitable external cabinet for connecting one extra
>> disk.
>>
>> How would I go about migrating/expanding the root pool to the
>> larger disks so I can then use the larger disks for booting?
>> I have no extra machine to use.
>
> Suppose we call the disks like so:
>
> A, B: your old 512-block drives
> X, Y: your new 2TB drives
>
> The easiest way would be to simply:
>
> 1) zpool set autoexpand=on rpool
> 2) offline the A drive
> 3) physically replace it with the X drive
> 4) do a "zpool replace" on it and wait for it to resilver

When sector size differs, attaching it is going to fail (at least on fbsd). You might not get around a send-receive cycle...

> 5) offline the B drive
> 6) physically replace it with the Y drive
> 7) do a "zpool replace" on it and wait for it to resilver
>
> At this point, you should have a 2TB rpool (thanks to the
> "autoexpand=on" in step 1). Unfortunately, to my knowledge, there is
> no way to convert an ashift=9 pool (512 byte sectors) to an ashift=12
> pool (4k sectors). Perhaps some great ZFS guru can shed more light on
> this.
>
> --
> Saso
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-15 17:18, Jim Klimov wrote:
> 7) If you're on live media, try to rename the new "rpool2" to become
> "rpool", i.e.:
>
> # zpool export rpool2
> # zpool export rpool
> # zpool import -N rpool rpool2
> # zpool export rpool

Oops, bad typo in the third line; it should be:

# zpool export rpool2
# zpool export rpool
# zpool import -N rpool2 rpool
# zpool export rpool

Sorry,
//Jim
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-15 16:14, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.

I think this question was recently asked and discussed on another list; my suggestion would be more low-level than those suggested by others:

0) Boot from a LiveCD/LiveUSB so that your rpool's environment doesn't change during the migration, and so that you can ultimately rename your new rpool to its old name. It is not fatal if you don't use a LiveMedia environment, but it can be problematic to rename a running rpool, and some of your programs might depend on its known name as recorded in some config file or service properties.

1) Break the existing mirror, reducing it to a single-disk pool.

2) Install the new disk, slice it, create an "rpool2" on it.

NOTE that you might not want all 2TB to be the "rpool2"; rather, you might dedicate several tens of GBs to a root-pool partition or slice, and store the rest as a data pool, perhaps implemented with different choices on caching, dedup, etc.

NOTE also that you might need to apply some tricks to enforce that the new pool uses ashift=12 if that (4KB) is your hardware's native sector size. We had some info recently on the mailing lists and carried that over to the illumos wiki:
http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks

3) # zfs snapshot -r rpool@20120615-preMigration

4) # zfs send -R rpool@20120615-preMigration | \
     zfs recv -vFd rpool2

NOTE this assumes you do want the whole old rpool in rpool2. If you decide you want something on a data pool, i.e. the "/export/*" datasets, you'd have to make that pool and send the datasets there in a similar manner, and send the root pool datasets not in one recursive command, but in several sets, i.e. for rpool/ROOT, rpool/swap and rpool/dump in the default layout.

5) # zpool get all rpool
   # zpool get all rpool2

Compare the pool settings. Carry over the "local" changes with:

   # zpool set property=value rpool2

You'll likely change bootfs, failmode, maybe some others.

6) installgrub onto the new disk so it becomes bootable.

7) If you're on live media, try to rename the new "rpool2" to become "rpool", i.e.:

   # zpool export rpool2
   # zpool export rpool
   # zpool import -N rpool rpool2
   # zpool export rpool

8) Reboot, disconnecting your remaining old disk, and hope that the new pool boots okay. It should ;)

When it's ok, attach the second new disk to the system and slice it similarly (prtvtoc|fmthard usually helps, google it). Then attach the new second disk's slices to your new rpool (and data pool if you've made one), installgrub onto the second disk, and you're done.

HTH,
//Jim Klimov
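The slice-copying and attach step at the end might look like this, assuming the first new disk is c1t0d0 and the second is c1t1d0 (device names are placeholders, not from the thread):

  # prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
  # zpool attach rpool c1t0d0s0 c1t1d0s0
  # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

The fmthard line stamps the first disk's VTOC onto the second, which is what keeps the slice layouts identical for the mirror attach.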
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
I suppose I must start by labelling the new disk properly, and give the s0 partition to zpool, so the new zpool can be booted?

Skickat från min Android Mobil

"Hung-Sheng Tsao Ph.D." skrev:
> one possible way:
> 1) break the mirror
> 2) install new hdd, format the HDD
> 3) create new zpool on new hdd with 4k block
> 4) create new BE on the new pool with the old root pool as source
>    (not sure which version of "solaris" or "openSolaris" you are
>    using; the procedure may differ depending on version)
> 5) activate the new BE
> 6) boot the new BE
> 7) destroy the old zpool
> 8) replace old HDD with new HDD
> 9) format the HDD
> 10) attach the HDD to the new root pool
>
> regards
>
> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>> want to move the root pool to two 2 TB disks with 4k blocks. The
>> server only has room for two disks. I do have an esata connector,
>> though, and a suitable external cabinet for connecting one extra
>> disk.
>>
>> How would I go about migrating/expanding the root pool to the larger
>> disks so I can then use the larger disks for booting?
>>
>> I have no extra machine to use.
>>
>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
one possible way:
1) break the mirror
2) install new hdd, format the HDD
3) create new zpool on new hdd with 4k block
4) create new BE on the new pool with the old root pool as source (not sure which version of "solaris" or "openSolaris" you are using; the procedure may differ depending on version)
5) activate the new BE
6) boot the new BE
7) destroy the old zpool
8) replace old HDD with new HDD
9) format the HDD
10) attach the HDD to the new root pool

regards

On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.
>
> Skickat från min Android Mobil
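A minimal sketch of steps 4)-6) for the s11/beadm case (pool and BE names are placeholders):

  # beadm create -p rpool2 newBE
  # beadm activate newBE
  # init 6

On s10, the Live Upgrade analogue would be along the lines of "lucreate -n newBE -p rpool2" followed by "luactivate newBE".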
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.

Suppose we call the disks like so:

A, B: your old 512-block drives
X, Y: your new 2TB drives

The easiest way would be to simply:

1) zpool set autoexpand=on rpool
2) offline the A drive
3) physically replace it with the X drive
4) do a "zpool replace" on it and wait for it to resilver
5) offline the B drive
6) physically replace it with the Y drive
7) do a "zpool replace" on it and wait for it to resilver

At this point, you should have a 2TB rpool (thanks to the "autoexpand=on" in step 1). Unfortunately, to my knowledge, there is no way to convert an ashift=9 pool (512 byte sectors) to an ashift=12 pool (4k sectors). Perhaps some great ZFS guru can shed more light on this.

--
Saso
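For steps 1)-4), a sketch with assumed device names (c0t0d0s0 standing in for drive A, with drive X appearing at the same path after the swap; note Johannes Totz's reply upthread that the replace can fail outright when the sector sizes differ):

  # zpool set autoexpand=on rpool
  # zpool offline rpool c0t0d0s0
    (swap the physical drive, label it, then:)
  # zpool replace rpool c0t0d0s0
  # zpool status rpool
    (wait for the resilver to finish before touching drive B)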
[zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
I've got my root pool on a mirror on 2 512 byte blocksize disks. I want to move the root pool to two 2 TB disks with 4k blocks. The server only has room for two disks. I do have an esata connector, though, and a suitable external cabinet for connecting one extra disk.

How would I go about migrating/expanding the root pool to the larger disks so I can then use the larger disks for booting?

I have no extra machine to use.

Skickat från min Android Mobil
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
>> Have you also mounted the broken image as /dev/lofi/2?
>
> Yep.

Wouldn't it be better to just remove the corrupted device? This worked just fine in my case.
[zfs-discuss] Salvaging ZFS data
Hello!

Unfortunately, one of our Areca RAID controllers encountered a power failure, which corrupted our zpool and partitions. We have tried to assemble some new headers, but it looks like not only the headers/uberblocks but also the MOS has been damaged. We have now moved on from trying to repair the partition to salvaging the non-damaged data from it.

I have read all the documentation I could find thoroughly and decided to do the following: search for the metadata of files by locating the ZAP object magic number (0x2F52AB2AB), and from there assemble the metadata and eventually gather the data attached.

For now I have one question. The zap_phys_t data structure (described in the ZFS On-Disk Specification by Sun): does that 128KB structure reside INSIDE the dn_bonus of the corresponding dnode_phys_t? I seem to misunderstand the link between the two structures.

Thanks in advance!

Regards,
Gerrit
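A minimal sketch of the magic-number scan described above (my own illustration, not Gerrit's actual tooling): it walks an image checking every 8-byte-aligned word for the ZAP magic in either byte order and prints candidate offsets. The chunk size is an arbitrary assumption, and note that ZFS normally compresses metadata, so a raw scan will only find hits where the blocks happen to be stored uncompressed.

  /* scan-zap-magic.c: sketch only, not production recovery code. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define ZAP_MAGIC 0x2F52AB2ABULL
  #define CHUNK (1024 * 1024)   /* multiple of 8, so alignment is preserved */

  static uint64_t
  bswap64(uint64_t x)
  {
          x = ((x & 0x00000000FFFFFFFFULL) << 32) | (x >> 32);
          x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
              ((x >> 16) & 0x0000FFFF0000FFFFULL);
          x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
              ((x >> 8) & 0x00FF00FF00FF00FFULL);
          return (x);
  }

  int
  main(int argc, char **argv)
  {
          if (argc != 2) {
                  fprintf(stderr, "usage: %s <image>\n", argv[0]);
                  return (1);
          }
          FILE *fp = fopen(argv[1], "rb");
          if (fp == NULL) {
                  perror("fopen");
                  return (1);
          }
          uint64_t *buf = malloc(CHUNK);
          if (buf == NULL) {
                  perror("malloc");
                  return (1);
          }
          uint64_t base = 0;
          size_t n;
          while ((n = fread(buf, 1, CHUNK, fp)) >= sizeof (uint64_t)) {
                  /* check each aligned 64-bit word, native and byteswapped */
                  for (size_t i = 0; i < n / sizeof (uint64_t); i++) {
                          if (buf[i] == ZAP_MAGIC ||
                              bswap64(buf[i]) == ZAP_MAGIC)
                                  printf("candidate at offset 0x%llx\n",
                                      (unsigned long long)
                                      (base + i * sizeof (uint64_t)));
                  }
                  base += n;
          }
          free(buf);
          fclose(fp);
          return (0);
  }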
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Fri, Jun 15, 2012 at 07:37:50AM +0200, Stefan Ring wrote:
> > root@solaris-01:/mnt# zpool import -d /dev/lofi
> >   pool: ZP-8T-RZ1-01
> >     id: 9952605666247778346
> >  state: FAULTED
> > status: One or more devices contains corrupted data.
> > action: The pool cannot be imported due to damaged devices or data.
> >    see: http://www.sun.com/msg/ZFS-8000-5E
> > config:
> >
> >         ZP-8T-RZ1-01              FAULTED  corrupted data
> >           raidz1-0                ONLINE
> >             12339070507640025002  UNAVAIL  corrupted data
> >             /dev/lofi/5           ONLINE
> >             /dev/lofi/4           ONLINE
> >             /dev/lofi/3           ONLINE
> >             /dev/lofi/1           ONLINE
>
> Have you also mounted the broken image as /dev/lofi/2?

Yep. I first ran:

for foo in WCAZA1217278 WCAZA1262989 WCAZA1447179 WCAZA1583652 WCAZA1589216 ; \
do lofiadm -a $foo ; done

(the WC* are the file names of each disk image).

root@solaris-01:/# ls -al /dev/lofi
total 21
drwxr-xr-x   7 root root   7 Jun 14 22:06 .
drwxr-xr-x 246 root sys  246 Jun 14 21:49 ..
lrwxrwxrwx   1 root root  29 Jun 14 22:06 1 -> ../../devices/pseudo/lofi@0:1
lrwxrwxrwx   1 root root  29 Jun 14 22:06 2 -> ../../devices/pseudo/lofi@0:2
lrwxrwxrwx   1 root root  29 Jun 14 22:06 3 -> ../../devices/pseudo/lofi@0:3
lrwxrwxrwx   1 root root  29 Jun 14 22:06 4 -> ../../devices/pseudo/lofi@0:4
lrwxrwxrwx   1 root root  29 Jun 14 22:06 5 -> ../../devices/pseudo/lofi@0:5

Clearly there's a disk with an incorrect label. But how I can reconstruct that label is a problem. Also, there are four drives of the five-drive RAIDZ available. Based on what criteria does ZFS decide that it is FAULTED and not DEGRADED? Odd.

Thanks,

Scott

ps I'm downloading OpenIndiana now.

> When I try to recreate your situation, it looks like this (as
> expected), where /dev/lofi/2 is just not present:
>
> $ lofiadm
> Block Device             File                              Options
> /dev/lofi/1              /dpool/dump/temp/watched/raid1    -
> /dev/lofi/3              /dpool/dump/temp/watched/raid3    -
> /dev/lofi/4              /dpool/dump/temp/watched/raid4    -
>
> $ sudo zpool import -d /dev/lofi
>    pool: lpool
>      id: 12540294359519404167
>   state: DEGRADED
>  status: One or more devices are missing from the system.
>  action: The pool can be imported despite missing or damaged devices.
>          The fault tolerance of the pool may be compromised if
>          imported.
>     see: http://illumos.org/msg/ZFS-8000-2Q
>  config:
>
>         lpool            DEGRADED
>           raidz1-0       DEGRADED
>             /dev/lofi/1  ONLINE
>             /dev/lofi/2  UNAVAIL  cannot open
>             /dev/lofi/3  ONLINE
>             /dev/lofi/4  ONLINE
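One hedged suggestion for the label question (device path as above, and not something tried in this thread): zdb can dump the four vdev labels straight off the lofi device, which shows whether any copy survived:

  # zdb -l /dev/lofi/2

Each vdev keeps four label copies, two at the start and two at the end of the device, so a partially overwritten disk often still has a readable pair to reconstruct from.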