Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
> when you say remove the device, I assume you mean simply make it unavailable
> for import (I can't remove it from the vdev).

Yes, that's what I meant.

> root@openindiana-01:/mnt# zpool import -d /dev/lofi
>   pool: ZP-8T-RZ1-01
>     id: 9952605666247778346
>  state: FAULTED
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
>         devices and try again.
>    see: http://www.sun.com/msg/ZFS-8000-3C
> config:
>
>         ZP-8T-RZ1-01              FAULTED  corrupted data
>           raidz1-0                DEGRADED
>             12339070507640025002  UNAVAIL  cannot open
>             /dev/lofi/5           ONLINE
>             /dev/lofi/4           ONLINE
>             /dev/lofi/3           ONLINE
>             /dev/lofi/1           ONLINE
>
> It's interesting that even though 4 of the 5 disks are available, it still
> can't import it as DEGRADED.

I agree that it's "interesting". Now someone really knowledgeable will need to have a look at this. I can only imagine that somehow the devices contain data from different points in time, and that it's too far apart for the aggressive txg rollback that was added in PSARC 2009/479. Btw, did you try that? Try:

zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
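If the devices really do carry state from different points in time, the labels should show it. A quick check along these lines (my own sketch, not from the thread; the device list matches the lofi setup quoted above, and "txg" is the last-synced transaction group recorded in each label):

  for dev in /dev/lofi/1 /dev/lofi/3 /dev/lofi/4 /dev/lofi/5; do
      echo "== $dev =="
      zdb -l $dev | grep -w txg
  done

Widely differing txg values across the devices would support that explanation.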
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Fri, Jun 15, 2012 at 10:54:34AM +0200, Stefan Ring wrote:
> >> Have you also mounted the broken image as /dev/lofi/2?
> >
> > Yep.
>
> Wouldn't it be better to just remove the corrupted device? This worked
> just fine in my case.

Hi Stefan,

when you say remove the device, I assume you mean simply make it unavailable for import (I can't remove it from the vdev). This is what happens (lofi/2 is the drive which ZFS thinks has corrupted data):

root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
    id: 9952605666247778346
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                ONLINE
            12339070507640025002  UNAVAIL  corrupted data
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

root@openindiana-01:/mnt# lofiadm -d /dev/lofi/2
root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
    id: 9952605666247778346
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                DEGRADED
            12339070507640025002  UNAVAIL  cannot open
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

So in the second import, it complains that it can't open the device, rather than saying it has corrupted data. It's interesting that even though 4 of the 5 disks are available, it still can't import it as DEGRADED.

Thanks again.

Scott
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
Sorry, if you meant distinguishing between true 512 and emulated 512/4k, I don't know; it may be vendor-specific as to whether they expose it through device commands at all.

Tim

On Fri, Jun 15, 2012 at 6:02 PM, Timothy Coalson wrote:
> On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov wrote:
>> 2012-06-16 0:05, John Martin wrote:
>>>> It's important to know...
>>>
>>> ...whether the drive is really 4096p or 512e/4096p.
>>
>> BTW, is there a surefire way to learn that programmatically
>> from Solaris or its derivatives
>
> prtvtoc should show the block size the OS thinks it has. Or
> you can use format, select the disk from a list that includes the
> model number and size, and use "verify".
>
> Tim
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov wrote:
> 2012-06-16 0:05, John Martin wrote:
>>> It's important to know...
>>
>> ...whether the drive is really 4096p or 512e/4096p.
>
> BTW, is there a surefire way to learn that programmatically
> from Solaris or its derivatives

prtvtoc should show the block size the OS thinks it has. Or you can use format, select the disk from a list that includes the model number and size, and use "verify".

Tim
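Roughly what that looks like (a sketch from memory, so treat the exact formatting as approximate; the device name is an example, and the line of interest is under "Dimensions"):

  $ prtvtoc /dev/rdsk/c0t0d0s2
  * /dev/rdsk/c0t0d0s2 partition map
  *
  * Dimensions:
  *     512 bytes/sector
  ...

Note this is the logical sector size the OS is using, which is exactly why it cannot distinguish a 512e/4096p drive from a true 512-native one.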
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-16 0:05, John Martin wrote:
>> It's important to know...
>
> ...whether the drive is really 4096p or 512e/4096p.

BTW, is there a surefire way to learn that programmatically from Solaris or its derivatives (i.e. from SCSI driver options, format/scsi/inquiry, SMART or some similar way)? Or, if the drive lies, saying its sectors are 512b while they physically are 4KB, is it undetectable except by reading vendor specs?

Thanks,
//Jim
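One avenue worth trying, assuming smartmontools is installed (my own suggestion, not confirmed in this thread): recent smartctl versions print both sizes when the drive declares them in its ATA IDENTIFY data, along these lines:

  # smartctl -i /dev/rdsk/c0t1d0
  ...
  Sector Sizes:     512 bytes logical, 4096 bytes physical
  ...

The caveat is that this comes from the same IDENTIFY data the driver sees, so a drive that lies about its physical sector size will lie here too, and then vendor specs really are the only recourse.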
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
On Fri, Jun 15, 2012 at 12:56 PM, Timothy Coalson wrote:
> Thanks for the suggestions. I think it would also depend on whether
> the nfs server has tried to write asynchronously to the pool in the
> meantime, which I am unsure how to test, other than making the txgs
> extremely frequent and watching the load on the log devices.

I didn't want to reboot the main file server to test this, so I used zilstat on the backup nfs server (which has nearly identical hardware and configuration, but doesn't have SSDs for a separate ZIL) to see if I could estimate the difference it would make, and the story got stranger: it wrote far less data to the ZIL for the same copy operation (single 8GB file):

$ sudo ./zilstat -M -l 20 -p backuppool txg
waiting for txg commit...
    txg    N-MB  N-MB/s  N-Max-Rate   B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2833307       1       0           1      1       0           1     15      0       0      15
2833308       0       0           0      0       0           0      0      0       0       0
2833309       1       0           1      1       0           1      8      0       0       8
2833310       0       0           0      0       0           0      4      0       0       4
2833311       1       0           0      1       0           0      9      0       0       9
2833312       0       0           0      0       0           0      0      0       0       0
2833313       2       0           2      2       0           2     21      0       0      21
2833314       7       1           7      8       1           8     63      0       0      63
2833315       1       0           1      2       0           2     18      0       0      18
2833316       0       0           0      0       0           0      5      0       0       5

A small sample from the server with SSD log devices doing the same operation:

$ sudo ./zilstat -M -l 20 -p mainpool txg
waiting for txg commit...
    txg    N-MB  N-MB/s  N-Max-Rate   B-MB  B-MB/s  B-Max-Rate    ops  <=4kB  4-32kB  >=32kB
2808483     989     197         593   1967     393        1180  15010      0       0   15010
2808484     599      99         208   1134     189         393   8653      0       0    8653
2808485       0       0           0      0       0           0      0      0       0       0
2808486     137      27         126    255      51         235   1953      0       0    1953
2808487     460      92         460    859     171         859   6555      0       0    6555
2808488     530      75         530   1031     147        1031   7871      0       0    7871

Setting logbias=throughput makes the server with the SSD log devices act the same as the server without them, as far as I can tell, which I somewhat expected. However, I did not expect use of separate log devices to change how often ZIL ops are performed, other than to raise the upper limit if the device can service more IOPS. Additionally, nfssvrtop showed a lower value for Com_t when not using the separate log device (2.1s with logbias=latency, 0.24s with throughput).

Copying a folder with small files and subdirectories pushes the server to ~400 ZIL ops per txg with logbias=throughput, so it shouldn't be the device performance making it only issue ~15 ops per txg copying a large file without using a separate log device. I am thinking of transplanting one of the SSDs temporarily for testing, but I would be interested to know the cause of this behavior. I don't know why more asynchronous writes seem to be making it into txgs without being caught by an nfs commit when a separate log device isn't used.

Tim
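For anyone reproducing the comparison above: logbias is an ordinary per-dataset property. A minimal sketch (the pool name mirrors the example above; the dataset name is a placeholder):

  # zfs get logbias mainpool/export
  # zfs set logbias=throughput mainpool/export
    (ZIL blocks now go to the main pool, bypassing the slog)
  # zfs set logbias=latency mainpool/export
    (back to the default; sync writes use the separate log device)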
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/12 15:52, Cindy Swearingen wrote:
> It's important to identify your OS release to determine if booting
> from a 4k disk is supported.

In addition, whether the drive is really 4096p or 512e/4096p.
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
Hi Hans,

It's important to identify your OS release to determine if booting from a 4k disk is supported.

Thanks,

Cindy

On 06/15/12 06:14, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.
>
> Skickat från min Android Mobil
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
Thanks for the suggestions. I think it would also depend on whether the nfs server has tried to write asynchronously to the pool in the meantime, which I am unsure how to test, other than making the txgs extremely frequent and watching the load on the log devices.

As for the integer division giving misleading zeros, one possible solution is to add (delay-1) to the count before dividing by delay, so if there are any, it will show at least 1 (or you could get fancy and do fixed-point numbers).

As for very frequent txgs, I imagine this could cause more fragmentation (more metadata written and discarded more frequently); is there a way to estimate or test for the impact of it? Depending on how it allocates the metadata blocks, I suppose it could write them to the blocks recently vacated by old metadata from the previous txg, and have almost no impact until a snapshot is taken; is it smart enough to do this?

Tim

On Fri, Jun 15, 2012 at 10:56 AM, Richard Elling wrote:
> [Phil beat me to it]
> Yes, the 0s are a result of integer division in DTrace/kernel.
>
> On Jun 14, 2012, at 9:20 PM, Timothy Coalson wrote:
>
>> Indeed they are there, shown with 1 second interval. So, it is the
>> client's fault after all. I'll have to see whether it is somehow
>> possible to get the server to write cached data sooner (and hopefully
>> asynchronously), and the client to issue commits less often. Luckily
>> I can live with the current behavior (and the SSDs shouldn't give out
>> any time soon even being used like this), if it isn't possible to
>> change it.
>
> If this is the proposed workload, then it is possible to tune the DMU
> to manage commits more efficiently. In an ideal world, it does this
> automatically, but the algorithms are based on a bandwidth calculation
> and those are not suitable for HDD capacity planning. The efficiency
> goal would be to do less work, more often, and there are two tunables
> that can apply:
>
> 1. the txg_timeout controls the default maximum transaction group
>    commit interval and is set to 5 seconds on modern ZFS
>    implementations.
>
> 2. the zfs_write_limit is a size limit for txg commit. The idea is
>    that a txg will be committed when the size reaches this limit,
>    rather than waiting for the txg_timeout. For streaming writes, this
>    can work better than tuning the txg_timeout.
>
> -- richard
>
>> Thanks for all the help,
>> Tim
>>
>> On Thu, Jun 14, 2012 at 10:30 PM, Phil Harman wrote:
>>> On 14 Jun 2012, at 23:15, Timothy Coalson wrote:
>>>>> The client is using async writes, that include commits. Sync
>>>>> writes do not need commits.
>>>>
>>>> Are you saying nfs commit operations sent by the client aren't
>>>> always reported by that script?
>>>
>>> They are not reported in your case because the commit rate is less
>>> than one per second.
>>>
>>> DTrace is an amazing tool, but it does dictate certain coding
>>> compromises, particularly when it comes to output scaling, grouping,
>>> sorting and formatting.
>>>
>>> In this script the commit rate is calculated using integer division.
>>> In your case the sample interval is 5 seconds, so up to 4 commits
>>> per second will be reported as a big fat zero.
>>>
>>> If you use a sample interval of 1 second you should see occasional
>>> commits. We know they are there because we see a non-zero commit
>>> time.
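For completeness, a sketch of how those two tunables were typically set on Solaris/illumos builds of this era (values are illustrative only, and the variable names should be verified against your exact release). In /etc/system, taking effect at the next boot:

  set zfs:zfs_txg_timeout = 1
  set zfs:zfs_write_limit_override = 0x20000000

Or live, with mdb (here forcing a 1-second txg interval):

  # echo zfs_txg_timeout/W0t1 | mdb -kw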
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
hi

what is the version of Solaris? uname -a output?

regards

On 6/15/2012 10:37 AM, Hung-Sheng Tsao Ph.D. wrote:
> by the way, when you format, start with cylinder 1; do not use 0.
> Depending on the version of Solaris, you may not be able to use 2TB
> as root.
>
> regards
>
> On 6/15/2012 9:53 AM, Hung-Sheng Tsao Ph.D. wrote:
>> yes
>> which version of solaris or bsd are you using? for bsd I do not know
>> the steps.
>> for creating a new BE (boot env) on s10, opensolaris and solaris
>> express (maybe other opensolaris forks), you use liveupgrade; for
>> s11 you use beadm.
>>
>> regards
>>
>> On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
>>> I suppose I must start by labelling the new disk properly, and give
>>> the s0 partition to zpool, so the new zpool can be booted?
>>>
>>> Skickat från min Android Mobil
>>>
>>> "Hung-Sheng Tsao Ph.D." skrev:
>>>> one possible way:
>>>> 1) break the mirror
>>>> 2) install new hdd, format the HDD
>>>> 3) create new zpool on new hdd with 4k block
>>>> 4) create new BE on the new pool with the old root pool as source
>>>>    (not sure which version of "solaris" or "openSolaris" you are
>>>>    using; the procedure may differ depending on version)
>>>> 5) activate the new BE
>>>> 6) boot the new BE
>>>> 7) destroy the old zpool
>>>> 8) replace old HDD with new HDD
>>>> 9) format the HDD
>>>> 10) attach the HDD to the new root pool
>>>>
>>>> regards
>>>>
>>>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>>>> I've got my root pool on a mirror on 2 512 byte blocksize disks.
>>>>> I want to move the root pool to two 2 TB disks with 4k blocks.
>>>>> The server only has room for two disks. I do have an esata
>>>>> connector, though, and a suitable external cabinet for connecting
>>>>> one extra disk.
>>>>>
>>>>> How would I go about migrating/expanding the root pool to the
>>>>> larger disks so I can then use the larger disks for booting?
>>>>>
>>>>> I have no extra machine to use.
>>>>>
>>>>> Skickat från min Android Mobil
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
On Jun 14, 2012, at 1:35 PM, Robert Milkowski wrote:
>> The client is using async writes, that include commits. Sync writes
>> do not need commits.
>>
>> What happens is that the ZFS transaction group commit occurs at more-
>> or-less regular intervals, likely 5 seconds for more modern ZFS
>> systems. When the commit occurs, any data that is in the ARC but not
>> committed in a prior transaction group gets sent to the ZIL
>
> Are you sure? I don't think this is the case, unless I misunderstood
> you or this is some recent change to Illumos.

Need to make sure we are clear here: there is time between the txg being closed and the txg being on disk. During that period, a sync write of the data in the closed txg is written to the ZIL.

> Whatever is being committed when the zfs txg closes goes directly to
> the pool and not to the zil. Only sync writes will go to the zil right
> away (and not always; see logbias, etc.) and to the arc, to be
> committed later to the pool when the txg closes.

In this specific case, there are separate log devices, so logbias doesn't apply.

 -- richard
Re: [zfs-discuss] NFS asynchronous writes being written to ZIL
[Phil beat me to it]

Yes, the 0s are a result of integer division in DTrace/kernel.

On Jun 14, 2012, at 9:20 PM, Timothy Coalson wrote:
> Indeed they are there, shown with 1 second interval. So, it is the
> client's fault after all. I'll have to see whether it is somehow
> possible to get the server to write cached data sooner (and hopefully
> asynchronously), and the client to issue commits less often. Luckily
> I can live with the current behavior (and the SSDs shouldn't give out
> any time soon even being used like this), if it isn't possible to
> change it.

If this is the proposed workload, then it is possible to tune the DMU to manage commits more efficiently. In an ideal world, it does this automatically, but the algorithms are based on a bandwidth calculation and those are not suitable for HDD capacity planning. The efficiency goal would be to do less work, more often, and there are two tunables that can apply:

1. the txg_timeout controls the default maximum transaction group commit interval and is set to 5 seconds on modern ZFS implementations.

2. the zfs_write_limit is a size limit for txg commit. The idea is that a txg will be committed when the size reaches this limit, rather than waiting for the txg_timeout. For streaming writes, this can work better than tuning the txg_timeout.

 -- richard

> Thanks for all the help,
> Tim
>
> On Thu, Jun 14, 2012 at 10:30 PM, Phil Harman wrote:
>> On 14 Jun 2012, at 23:15, Timothy Coalson wrote:
>>>> The client is using async writes, that include commits. Sync
>>>> writes do not need commits.
>>>
>>> Are you saying nfs commit operations sent by the client aren't
>>> always reported by that script?
>>
>> They are not reported in your case because the commit rate is less
>> than one per second.
>>
>> DTrace is an amazing tool, but it does dictate certain coding
>> compromises, particularly when it comes to output scaling, grouping,
>> sorting and formatting.
>>
>> In this script the commit rate is calculated using integer division.
>> In your case the sample interval is 5 seconds, so up to 4 commits per
>> second will be reported as a big fat zero.
>>
>> If you use a sample interval of 1 second you should see occasional
>> commits. We know they are there because we see a non-zero commit
>> time.
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
by the way, when you format, start with cylinder 1; do not use 0. Depending on the version of Solaris, you may not be able to use 2TB as root.

regards

On 6/15/2012 9:53 AM, Hung-Sheng Tsao Ph.D. wrote:
> yes
> which version of solaris or bsd are you using? for bsd I do not know
> the steps.
> for creating a new BE (boot env) on s10, opensolaris and solaris
> express (maybe other opensolaris forks), you use liveupgrade; for s11
> you use beadm.
>
> regards
>
> On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
>> I suppose I must start by labelling the new disk properly, and give
>> the s0 partition to zpool, so the new zpool can be booted?
>>
>> Skickat från min Android Mobil
>>
>> "Hung-Sheng Tsao Ph.D." skrev:
>>> one possible way:
>>> 1) break the mirror
>>> 2) install new hdd, format the HDD
>>> 3) create new zpool on new hdd with 4k block
>>> 4) create new BE on the new pool with the old root pool as source
>>>    (not sure which version of "solaris" or "openSolaris" you are
>>>    using; the procedure may differ depending on version)
>>> 5) activate the new BE
>>> 6) boot the new BE
>>> 7) destroy the old zpool
>>> 8) replace old HDD with new HDD
>>> 9) format the HDD
>>> 10) attach the HDD to the new root pool
>>>
>>> regards
>>>
>>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>>> server only has room for two disks. I do have an esata connector,
>>>> though, and a suitable external cabinet for connecting one extra
>>>> disk.
>>>>
>>>> How would I go about migrating/expanding the root pool to the
>>>> larger disks so I can then use the larger disks for booting?
>>>>
>>>> I have no extra machine to use.
>>>>
>>>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
yes

which version of solaris or bsd are you using? for bsd I do not know the steps.

for creating a new BE (boot env) on s10, opensolaris and solaris express (maybe other opensolaris forks), you use liveupgrade; for s11 you use beadm.

regards

On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
> I suppose I must start by labelling the new disk properly, and give
> the s0 partition to zpool, so the new zpool can be booted?
>
> Skickat från min Android Mobil
>
> "Hung-Sheng Tsao Ph.D." skrev:
>> one possible way:
>> 1) break the mirror
>> 2) install new hdd, format the HDD
>> 3) create new zpool on new hdd with 4k block
>> 4) create new BE on the new pool with the old root pool as source
>>    (not sure which version of "solaris" or "openSolaris" you are
>>    using; the procedure may differ depending on version)
>> 5) activate the new BE
>> 6) boot the new BE
>> 7) destroy the old zpool
>> 8) replace old HDD with new HDD
>> 9) format the HDD
>> 10) attach the HDD to the new root pool
>>
>> regards
>>
>> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>> server only has room for two disks. I do have an esata connector,
>>> though, and a suitable external cabinet for connecting one extra
>>> disk.
>>>
>>> How would I go about migrating/expanding the root pool to the
>>> larger disks so I can then use the larger disks for booting?
>>>
>>> I have no extra machine to use.
>>>
>>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/2012 03:35 PM, Johannes Totz wrote:
> On 15/06/2012 13:22, Sašo Kiselkov wrote:
>> On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
>>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>>> want to move the root pool to two 2 TB disks with 4k blocks. The
>>> server only has room for two disks. I do have an esata connector,
>>> though, and a suitable external cabinet for connecting one extra
>>> disk.
>>>
>>> How would I go about migrating/expanding the root pool to the
>>> larger disks so I can then use the larger disks for booting?
>>> I have no extra machine to use.
>>
>> Suppose we call the disks like so:
>>
>> A, B: your old 512-block drives
>> X, Y: your new 2TB drives
>>
>> The easiest way would be to simply:
>>
>> 1) zpool set autoexpand=on rpool
>> 2) offline the A drive
>> 3) physically replace it with the X drive
>> 4) do a "zpool replace" on it and wait for it to resilver
>
> When sector size differs, attaching it is going to fail (at least on
> fbsd). You might not get around a send-receive cycle...

Jim Klimov has already posted a way better guide, which rebuilds the pool using the old one's data, so yeah, the replace route I recommended here is rendered moot.

--
Saso
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 15/06/2012 13:22, Sašo Kiselkov wrote:
> On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>> want to move the root pool to two 2 TB disks with 4k blocks. The
>> server only has room for two disks. I do have an esata connector,
>> though, and a suitable external cabinet for connecting one extra
>> disk.
>>
>> How would I go about migrating/expanding the root pool to the
>> larger disks so I can then use the larger disks for booting?
>> I have no extra machine to use.
>
> Suppose we call the disks like so:
>
> A, B: your old 512-block drives
> X, Y: your new 2TB drives
>
> The easiest way would be to simply:
>
> 1) zpool set autoexpand=on rpool
> 2) offline the A drive
> 3) physically replace it with the X drive
> 4) do a "zpool replace" on it and wait for it to resilver

When sector size differs, attaching it is going to fail (at least on fbsd). You might not get around a send-receive cycle...

> 5) offline the B drive
> 6) physically replace it with the Y drive
> 7) do a "zpool replace" on it and wait for it to resilver
>
> At this point, you should have a 2TB rpool (thanks to the
> "autoexpand=on" in step 1). Unfortunately, to my knowledge, there is
> no way to convert an ashift=9 pool (512 byte sectors) to an ashift=12
> pool (4k sectors). Perhaps some great ZFS guru can shed more light on
> this.
>
> --
> Saso
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-15 17:18, Jim Klimov wrote:
> 7) If you're on live media, try to rename the new "rpool2" to become
> "rpool", i.e.:
>
> # zpool export rpool2
> # zpool export rpool
> # zpool import -N rpool rpool2
> # zpool export rpool

Oops, bad typo in the third line; it should be:

# zpool export rpool2
# zpool export rpool
# zpool import -N rpool2 rpool
# zpool export rpool

Sorry,
//Jim
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
2012-06-15 16:14, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.

I think this question was recently asked and discussed on another list; my suggestion would be more low-level than those suggested by others:

0) Boot from a LiveCD/LiveUSB so that your rpool's environment doesn't change during the migration, and so that you can ultimately rename your new rpool to its old name. It is not fatal if you don't use a LiveMedia environment, but it can be problematic to rename a running rpool, and some of your programs might depend on its known name as recorded in some config file or service properties.

1) Break the existing mirror, reducing it to a single-disk pool.

2) Install the new disk, slice it, create an "rpool2" on it.

NOTE that you might not want all 2TB to be the "rpool2"; rather, you might dedicate several tens of GBs to a root-pool partition or slice, and store the rest as a data pool, perhaps implemented with different choices on caching, dedup, etc.

NOTE also that you might need to apply some tricks to enforce that the new pool uses ashift=12 if that (4KB) is your hardware's native sector size. We had some info recently on the mailing lists and carried that over to the illumos wiki:
http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks

3) # zfs snapshot -r rpool@20120615-preMigration

4) # zfs send -R rpool@20120615-preMigration | \
     zfs recv -vFd rpool2

NOTE this assumes you do want the whole old rpool in rpool2. If you decide you want something on a data pool, i.e. the "/export/*" datasets, you'd have to make that pool and send the datasets there in a similar manner, and send the root pool datasets not in one recursive command, but in several sets, i.e. for rpool/ROOT, rpool/swap and rpool/dump in the default layout.

5) # zpool get all rpool
   # zpool get all rpool2

Compare the pool settings. Carry over the "local" changes with:

   # zpool set property=value rpool2

You'll likely change bootfs, failmode, maybe some others.

6) installgrub onto the new disk so it becomes bootable.

7) If you're on live media, try to rename the new "rpool2" to become "rpool", i.e.:

   # zpool export rpool2
   # zpool export rpool
   # zpool import -N rpool rpool2
   # zpool export rpool

8) Reboot, disconnecting your remaining old disk, and hope that the new pool boots okay. It should ;)

When it's ok, attach the second new disk to the system and slice it similarly (prtvtoc|fmthard usually helps, google it). Then attach the new second disk's slices to your new rpool (and data pool if you've made one), installgrub onto the second disk, and you're done.

HTH,
//Jim Klimov
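The slice-copying and attach step at the end might look like this, assuming the first new disk is c1t0d0 and the second is c1t1d0 (device names are placeholders, not from the thread):

  # prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
  # zpool attach rpool c1t0d0s0 c1t1d0s0
  # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

The fmthard line stamps the first disk's VTOC onto the second, which is what keeps the slice layouts identical for the mirror attach.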
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
I suppose I must start by labelling the new disk properly, and give the s0 partition to zpool, so the new zpool can be booted?

Skickat från min Android Mobil

"Hung-Sheng Tsao Ph.D." skrev:
> one possible way:
> 1) break the mirror
> 2) install new hdd, format the HDD
> 3) create new zpool on new hdd with 4k block
> 4) create new BE on the new pool with the old root pool as source
>    (not sure which version of "solaris" or "openSolaris" you are
>    using; the procedure may differ depending on version)
> 5) activate the new BE
> 6) boot the new BE
> 7) destroy the old zpool
> 8) replace old HDD with new HDD
> 9) format the HDD
> 10) attach the HDD to the new root pool
>
> regards
>
> On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>> want to move the root pool to two 2 TB disks with 4k blocks. The
>> server only has room for two disks. I do have an esata connector,
>> though, and a suitable external cabinet for connecting one extra
>> disk.
>>
>> How would I go about migrating/expanding the root pool to the larger
>> disks so I can then use the larger disks for booting?
>>
>> I have no extra machine to use.
>>
>> Skickat från min Android Mobil
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
one possible way:
1) break the mirror
2) install new hdd, format the HDD
3) create new zpool on new hdd with 4k block
4) create new BE on the new pool with the old root pool as source (not sure which version of "solaris" or "openSolaris" you are using; the procedure may differ depending on version)
5) activate the new BE
6) boot the new BE
7) destroy the old zpool
8) replace old HDD with new HDD
9) format the HDD
10) attach the HDD to the new root pool

regards

On 6/15/2012 8:14 AM, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.
>
> Skickat från min Android Mobil
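A minimal sketch of steps 4)-6) for the s11/beadm case (pool and BE names are placeholders):

  # beadm create -p rpool2 newBE
  # beadm activate newBE
  # init 6

On s10, the Live Upgrade analogue would be along the lines of "lucreate -n newBE -p rpool2" followed by "luactivate newBE".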
Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
> want to move the root pool to two 2 TB disks with 4k blocks. The
> server only has room for two disks. I do have an esata connector,
> though, and a suitable external cabinet for connecting one extra disk.
>
> How would I go about migrating/expanding the root pool to the larger
> disks so I can then use the larger disks for booting?
>
> I have no extra machine to use.

Suppose we call the disks like so:

A, B: your old 512-block drives
X, Y: your new 2TB drives

The easiest way would be to simply:

1) zpool set autoexpand=on rpool
2) offline the A drive
3) physically replace it with the X drive
4) do a "zpool replace" on it and wait for it to resilver
5) offline the B drive
6) physically replace it with the Y drive
7) do a "zpool replace" on it and wait for it to resilver

At this point, you should have a 2TB rpool (thanks to the "autoexpand=on" in step 1). Unfortunately, to my knowledge, there is no way to convert an ashift=9 pool (512 byte sectors) to an ashift=12 pool (4k sectors). Perhaps some great ZFS guru can shed more light on this.

--
Saso
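For steps 1)-4), a sketch with assumed device names (c0t0d0s0 standing in for drive A, with drive X appearing at the same path after the swap; note Johannes Totz's reply upthread that the replace can fail outright when the sector sizes differ):

  # zpool set autoexpand=on rpool
  # zpool offline rpool c0t0d0s0
    (swap the physical drive, label it, then:)
  # zpool replace rpool c0t0d0s0
  # zpool status rpool
    (wait for the resilver to finish before touching drive B)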
[zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks
I've got my root pool on a mirror on 2 512 byte blocksize disks. I want to move the root pool to two 2 TB disks with 4k blocks. The server only has room for two disks. I do have an esata connector, though, and a suitable external cabinet for connecting one extra disk.

How would I go about migrating/expanding the root pool to the larger disks so I can then use the larger disks for booting?

I have no extra machine to use.

Skickat från min Android Mobil
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
>> Have you also mounted the broken image as /dev/lofi/2?
>
> Yep.

Wouldn't it be better to just remove the corrupted device? This worked just fine in my case.
[zfs-discuss] Salvaging ZFS data
Hello!

Unfortunately, one of our Areca RAID controllers encountered a power failure, which corrupted our zpool and partitions. We have tried to assemble some new headers, but it looks like not only the headers/uberblocks but also the MOS has been damaged. We have now moved on from trying to repair the partition to salvaging the non-damaged data from it.

I have read all the documentation I could find thoroughly and decided to do the following: search for the metadata of files by locating the ZAP object magic number (0x2F52AB2AB), and from there assemble the metadata and eventually gather the data attached.

For now I have one question. The zap_phys_t data structure (described in the ZFS On-Disk Specification by Sun): does that 128KB structure reside INSIDE the dn_bonus of the corresponding dnode_phys_t? I seem to misunderstand the link between the two structures.

Thanks in advance!

Regards,
Gerrit
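A minimal sketch of the magic-number scan described above (my own illustration, not Gerrit's actual tooling): it walks an image checking every 8-byte-aligned word for the ZAP magic in either byte order and prints candidate offsets. The chunk size is an arbitrary assumption, and note that ZFS normally compresses metadata, so a raw scan will only find hits where the blocks happen to be stored uncompressed.

  /* scan-zap-magic.c: sketch only, not production recovery code. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define ZAP_MAGIC 0x2F52AB2ABULL
  #define CHUNK (1024 * 1024)   /* multiple of 8, so alignment is preserved */

  static uint64_t
  bswap64(uint64_t x)
  {
          x = ((x & 0x00000000FFFFFFFFULL) << 32) | (x >> 32);
          x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
              ((x >> 16) & 0x0000FFFF0000FFFFULL);
          x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
              ((x >> 8) & 0x00FF00FF00FF00FFULL);
          return (x);
  }

  int
  main(int argc, char **argv)
  {
          if (argc != 2) {
                  fprintf(stderr, "usage: %s <image>\n", argv[0]);
                  return (1);
          }
          FILE *fp = fopen(argv[1], "rb");
          if (fp == NULL) {
                  perror("fopen");
                  return (1);
          }
          uint64_t *buf = malloc(CHUNK);
          if (buf == NULL) {
                  perror("malloc");
                  return (1);
          }
          uint64_t base = 0;
          size_t n;
          while ((n = fread(buf, 1, CHUNK, fp)) >= sizeof (uint64_t)) {
                  /* check each aligned 64-bit word, native and byteswapped */
                  for (size_t i = 0; i < n / sizeof (uint64_t); i++) {
                          if (buf[i] == ZAP_MAGIC ||
                              bswap64(buf[i]) == ZAP_MAGIC)
                                  printf("candidate at offset 0x%llx\n",
                                      (unsigned long long)
                                      (base + i * sizeof (uint64_t)));
                  }
                  base += n;
          }
          free(buf);
          fclose(fp);
          return (0);
  }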
Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)
On Fri, Jun 15, 2012 at 07:37:50AM +0200, Stefan Ring wrote:
> > root@solaris-01:/mnt# zpool import -d /dev/lofi
> >   pool: ZP-8T-RZ1-01
> >     id: 9952605666247778346
> >  state: FAULTED
> > status: One or more devices contains corrupted data.
> > action: The pool cannot be imported due to damaged devices or data.
> >    see: http://www.sun.com/msg/ZFS-8000-5E
> > config:
> >
> >         ZP-8T-RZ1-01              FAULTED  corrupted data
> >           raidz1-0                ONLINE
> >             12339070507640025002  UNAVAIL  corrupted data
> >             /dev/lofi/5           ONLINE
> >             /dev/lofi/4           ONLINE
> >             /dev/lofi/3           ONLINE
> >             /dev/lofi/1           ONLINE
>
> Have you also mounted the broken image as /dev/lofi/2?

Yep. I first ran:

for foo in WCAZA1217278 WCAZA1262989 WCAZA1447179 WCAZA1583652 WCAZA1589216 ; \
do lofiadm -a $foo ; done

(the WC* are the file names of each disk image).

root@solaris-01:/# ls -al /dev/lofi
total 21
drwxr-xr-x   7 root root   7 Jun 14 22:06 .
drwxr-xr-x 246 root sys  246 Jun 14 21:49 ..
lrwxrwxrwx   1 root root  29 Jun 14 22:06 1 -> ../../devices/pseudo/lofi@0:1
lrwxrwxrwx   1 root root  29 Jun 14 22:06 2 -> ../../devices/pseudo/lofi@0:2
lrwxrwxrwx   1 root root  29 Jun 14 22:06 3 -> ../../devices/pseudo/lofi@0:3
lrwxrwxrwx   1 root root  29 Jun 14 22:06 4 -> ../../devices/pseudo/lofi@0:4
lrwxrwxrwx   1 root root  29 Jun 14 22:06 5 -> ../../devices/pseudo/lofi@0:5

Clearly there's a disk with an incorrect label. But how I can reconstruct that label is a problem. Also, there are four drives of the five-drive RAIDZ available. Based on what criteria does ZFS decide that it is FAULTED and not DEGRADED? Odd.

Thanks,

Scott

ps I'm downloading OpenIndiana now.

> When I try to recreate your situation, it looks like this (as
> expected), where /dev/lofi/2 is just not present:
>
> $ lofiadm
> Block Device             File                              Options
> /dev/lofi/1              /dpool/dump/temp/watched/raid1    -
> /dev/lofi/3              /dpool/dump/temp/watched/raid3    -
> /dev/lofi/4              /dpool/dump/temp/watched/raid4    -
>
> $ sudo zpool import -d /dev/lofi
>    pool: lpool
>      id: 12540294359519404167
>   state: DEGRADED
>  status: One or more devices are missing from the system.
>  action: The pool can be imported despite missing or damaged devices.
>          The fault tolerance of the pool may be compromised if
>          imported.
>     see: http://illumos.org/msg/ZFS-8000-2Q
>  config:
>
>         lpool            DEGRADED
>           raidz1-0       DEGRADED
>             /dev/lofi/1  ONLINE
>             /dev/lofi/2  UNAVAIL  cannot open
>             /dev/lofi/3  ONLINE
>             /dev/lofi/4  ONLINE
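One hedged suggestion for the label question (device path as above, and not something tried in this thread): zdb can dump the four vdev labels straight off the lofi device, which shows whether any copy survived:

  # zdb -l /dev/lofi/2

Each vdev keeps four label copies, two at the start and two at the end of the device, so a partially overwritten disk often still has a readable pair to reconstruct from.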