Re: [zfs-discuss] Mount External USB cdrom on zfs

2009-01-27 Thread Johan Hartzenberg
On Tue, Jan 27, 2009 at 9:49 AM, iman habibi  wrote:

> Dear support
> when i connect my external usb dvdrom to the sparc machine which has
> installed solaris 10u6 based zfs file system,,it return this error:
> bash-3.00# mount /dev/dsk/c1t0d0s0 /dvd/
> Jan 27 11:08:41 global ufs: NOTICE: mount: not a UFS magic number (0x0)
> mount: /dev/dsk/c1t0d0s0 is not this fstype

On Solaris, mount assumes by default that the file system to be mounted is
UFS.

Basically, when mounting anything other than UFS, you need to specify what
it is.  The two exceptions are:
a) When the vfstab can give information about what file system type to
expect, or
b) When using zfs mount (which only mounts zfs file systems)

So essentially you need to specify the file system type on the mount
command, like this:

mount -F hsfs -r /dev/dsk/c1t0d0s0 /dvd/

The -r is for read-only.

You can also (optionally) add a line to your /etc/vfstab file, like this:
/dev/dsk/c1t0d0s0 - /dvd hsfs - no ro

With this in place you can then mount the disk using:

mount /dvd

(It will learn the device, read-only flag, and the file system type from
/etc/vfstab automatically)
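As a rough illustration of what mount picks up from that vfstab entry, the sketch below labels the seven whitespace-separated fields of the line shown above. The function name describe_vfstab_line is made up for this example; the field labels follow the usual column order of /etc/vfstab.

```shell
#!/bin/sh
# describe_vfstab_line ENTRY: label the seven fields of one vfstab entry.
# (Hypothetical helper, for illustration only; it does not touch /etc/vfstab.)
describe_vfstab_line() {
    echo "$1" | awk '{
        printf "device to mount: %s\n", $1
        printf "device to fsck:  %s\n", $2
        printf "mount point:     %s\n", $3
        printf "FS type:         %s\n", $4
        printf "fsck pass:       %s\n", $5
        printf "mount at boot:   %s\n", $6
        printf "mount options:   %s\n", $7
    }'
}

describe_vfstab_line "/dev/dsk/c1t0d0s0 - /dvd hsfs - no ro"
```

The "-" entries mean "not applicable": a read-only hsfs disc has no device to fsck and no fsck pass.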

Of course I am wondering why you don't use the auto-mounter.

There are other things you could do.  You could change the default file
system type in /etc/default/fs, but that is not recommended.  You could also
write a small script to mount disks, and so on.

For more info, read "man mount" and "man vfstab"


Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog:
zfs-discuss mailing list

Re: [zfs-discuss] Can I create ZPOOL with missing disks?

2009-01-16 Thread Johan Hartzenberg
On Thu, Jan 15, 2009 at 5:18 PM, Jim Klimov  wrote:

> Usecase scenario:
> I have a single server (or home workstation) with 4 HDD bays, sold with 2
> drives.
> Initially the system was set up with a ZFS mirror for data slices. Now we
> got 2
> more drives and want to replace the mirror with a larger RAIDZ2 set (say I
> don't
> want a RAID10 which is trivial to make).
> Technically I think that it should be possible to force creation of a
> degraded
> raidz2 array with two actual drives and two missing drives. Then I'd copy
> data
> from the old mirror pool to the new degraded raidz2 pool (zfs send | zfs
> recv),
> destroy the mirror pool and attach its two drives to "repair" the raidz2
> pool.

1. Buy, borrow or steal two External USB disk enclosures (if you don't have
2. Install two new disks internally, and connect the other two via the USB
external enclosures.
3. Set up the zpool
4. Copy the data over.
5. Export both pools.
6. Shut down.
7. Remove the two old disks
8. Move the two disks from the External USB enclosures into the system
9. Start back up, and ...
10. Import the new pool.
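The steps above could look roughly like the command sequence printed by the sketch below. It only prints a plan and never runs zpool or zfs. The pool names (oldpool, newpool), the dataset name (oldpool/data), the snapshot name and the device names are all placeholders, not taken from the original poster's system.

```shell
#!/bin/sh
# print_migration_plan OLD NEW DISK...: print (do not run) the commands for
# moving data from pool OLD to a new raidz2 pool NEW built from DISK...
# All names passed in are placeholders chosen by the caller.
print_migration_plan() {
    old=$1
    new=$2
    shift 2
    cat <<EOF
zpool create $new raidz2 $*
zfs snapshot -r $old/data@migrate
zfs send -R $old/data@migrate | zfs recv -d $new
zpool export $old
zpool export $new
# shut down, remove the old disks, move the USB disks inside, boot
zpool import $new
EOF
}

print_migration_plan oldpool newpool c2t0d0 c2t1d0 c5t0d0 c6t0d0
```

Because the pool is exported before the disks are moved, the import should find them again even though their device names change.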


Re: [zfs-discuss] mirror rpool

2009-01-14 Thread Johan Hartzenberg
On Wed, Jan 14, 2009 at 10:58 AM, mijenix  wrote:

> yes, that's the way zpool likes it
> I think I've to understand how (Open)Solaris create disks or how
> the partition thing works under OSol. Do you know any guide or howto?


Re: [zfs-discuss] OpenSolaris better Than Solaris10u6 with requards to ARECA Raid Card

2009-01-14 Thread Johan Hartzenberg
There is an update in build 105, but it pertains only to the RAID
management tool:

 Issues Resolved:
raid management util doesn't work on solaris
Files Changed: 

On Wed, Jan 14, 2009 at 1:17 PM, Orvar Korvar <> wrote:

> Ive read about some Areca bug(?) being fixed in SXCE b105?


Re: [zfs-discuss] separate home "partition"?

2009-01-13 Thread Johan Hartzenberg
On Fri, Jan 9, 2009 at 11:51 AM, Johan Hartzenberg wrote:

> I have this situation working and use my "shared" pool between Linux and
> Solaris.  Note:  The shared pool needs to reside on a whole physical disk or
> on a primary fdisk partition, Unless something changed since I last checked,
> Solaris' support for Logical Partitions are... not quite there yet.

I just chanced upon the following in the SNV build 105 change logs:

 PSARC case 2006/379 : Solaris on Extended partition
partitions need to be supported on Solaris
UNUSED in fdisk.h needs to be changed since id 100 is Novell Netware 286's
partition ID
to differentiate between solaris old partition and Linux swap
can be created using fdisk table with invalid partition line by "fdisk -F"
extended partition can be created by "fdisk -A"
Files Changed: 


Re: [zfs-discuss] ZFS capable GRUB install from within Linux?

2009-01-12 Thread Johan Hartzenberg
On Tue, Dec 30, 2008 at 8:28 PM, David Abrahams  wrote:

> FWIW, I managed to build a source merge of the solaris grub-0.97 (with
> ZFS capability) and ubuntu's latest copy of grub-0.97 (with whatever
> patches they've backported into it).  The sources are available at
> Of
> course, I'm not sure yet whether that's enough to boot linux from ZFS.

In addition, you would have to build a Linux Miniroot (or whatever it is
called) which supports ZFS.

I've seen posts about work in this regard by early explorers a long time
ago, but thought I'd wait till a few people actually got it to work before I
looked at it any more!

A quick Google search found:


Re: [zfs-discuss] separate home "partition"?

2009-01-09 Thread Johan Hartzenberg
On Fri, Jan 9, 2009 at 6:25 PM, noz  wrote:

> > The above is very dangerous, if it
> > will even work. The output of the zfs send is
> > redirected to /tmp, which is a ramdisk.  If you
> > have enough space (RAM + Swap), it will work, but if
> > there is a reboot or crash before the zfs receive
> > completes then everything is gone.
> > In stead, do the following:
> > (2) n...@holodeck:~# zfs snapshot -r rpool/exp...@now
> > (3) n...@holodeck:~# zfs send -R rpool/exp...@now | zfs recv -d epool
> > (4) Check that all the data looks OK in epool
> > (5) n...@holodeck:~# zfs destroy -r -f rpool/export
> Thanks for the tip.  Is there an easy way to do your revised step 4?  Can I
> use a diff or something similar?  e.g.  diff rpool/export epool/export

Personally I would just browse around the structure, open a few files at
random, and consider it done.  But that is me, and my data, of which I _DO_
make backups.

You could use find to create an index of all the files and save these in
files, and compare those.  Depending on exactly how you do the find, you
might be able to just diff the files.

Of course if you want to be really pedantic, you would do
cd /rpool/export; find . -type f -exec cksum {} + | sort -k 3 > /rpool_checksums
cd /epool/export; find . -type f -exec cksum {} + | sort -k 3 > /epool_checksums
diff /rpool_checksums /epool_checksums

But be prepared to wait a very very very long time for the two checksum
processes to run.  Unless you have very little data.
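The checksum comparison can be wrapped up as in the sketch below, which restricts itself to regular files, copes with spaces in file names, and sorts the listings so that the diff does not depend on directory order. The function names checksum_tree and compare_trees are made up for this example, and it assumes mktemp is available.

```shell
#!/bin/sh
# checksum_tree DIR: print "checksum size path" for every regular file under
# DIR, with paths relative to DIR, sorted by path.
checksum_tree() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# compare_trees DIR_A DIR_B: print any differences; exit status 0 only if
# both trees contain the same files with the same checksums.
compare_trees() {
    a=$(mktemp) && b=$(mktemp) || return 2
    checksum_tree "$1" > "$a"
    checksum_tree "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

It would be called as, for example, compare_trees /rpool/export /epool/export. Every file is still read in full on both sides, so the warning about run time applies here too.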



Re: [zfs-discuss] separate home "partition"?

2009-01-09 Thread Johan Hartzenberg
On Fri, Jan 9, 2009 at 9:55 AM, hardware technician wrote:

> I want to create a separate home, shared, read/write zfs partition on a
> tri-boot OpenSolaris, Ubuntu, and CentOS system.  I have successfully
> created and exported the zpools that I would like to use, in Ubuntu using
> zfs-fuse.  However, I boot into OpenSolaris, and I type zpool import with no
> options.  The only pool I see to import is on the primary partition, and I
> haven't been able to see or import the pool that is on the extended
> partition.  I have tried importing using the name, and ID.
> In OpenSolaris /dev/dsk/c3d0 shows 15 slices, so I think the slices are
> there, but then I type format, select the disk, and the partition option,
> but it doesn't show (zfs) partitions from linux.  In format, the fdisk
> option recognizes the (zfs) linux partitions.  The partition that I was able
> to import is on the first partition, and is named c3d0p1, and is not a
> slice.
> Are there any ideas how I could import the other pool?

I have this situation working and use my "shared" pool between Linux and
Solaris.  Note: the shared pool needs to reside on a whole physical disk or
on a primary fdisk partition.  Unless something has changed since I last
checked, Solaris' support for logical partitions is... not quite there yet.

P.S. I blogged about my setup (Linux + Solaris with a Shared ZFS pool) here ...  However
this was a long time ago and I don't know whether the statement about Grub
ZFS support in point 3 is still true.

Apparently some bugs pertaining to time stomping between Ubuntu and Solaris
have been fixed, so you may not need to do step 4.  An alternative to step 4
is to run this in Solaris: pfexec /usr/sbin/rtc -z UTC

In addition, at point nr 7, use "bootadm list-menu" to find out where
Solaris has decided to save the grub menu.lst file.


Re: [zfs-discuss] separate home "partition"?

2009-01-09 Thread Johan Hartzenberg
On Fri, Jan 9, 2009 at 4:10 AM, noz  wrote:

> Here's my solution:
> (1) n...@holodeck:~# zpool create epool mirror c4t1d0 c4t2d0 c4t3d0
> n...@holodeck:~# zfs list
> epool 69K  15.6G18K  /epool
> rpool   3.68G  11.9G72K  /rpool
> rpool/ROOT  2.81G  11.9G18K  legacy
> rpool/ROOT/opensolaris  2.81G  11.9G  2.68G  /
> rpool/dump   383M  11.9G   383M  -
> rpool/export 632K  11.9G19K  /export
> rpool/export/home612K  11.9G19K  /export/home
> rpool/export/home/noz594K  11.9G   594K  /export/home/noz
> rpool/swap   512M  12.4G  21.1M  -
> n...@holodeck:~#
> (2) n...@holodeck:~# zfs snapshot -r rpool/exp...@now
> (3) n...@holodeck:~# zfs send -R rpool/exp...@now > /tmp/export_now
> (4) n...@holodeck:~# zfs destroy -r -f rpool/export
> (5) n...@holodeck:~# zfs recv -d epool < /tmp/export_now

The above is very dangerous, if it will even work.  The output of the zfs
send is redirected to /tmp, which is tmpfs (backed by RAM and swap).  If you
have enough space (RAM + swap), it will work, but if there is a reboot or
crash before the zfs receive completes, then everything is gone.

Instead, do the following:
(2) n...@holodeck:~# zfs snapshot -r rpool/exp...@now
(3) n...@holodeck:~# zfs send -R rpool/exp...@now | zfs recv -d epool
(4) Check that all the data looks OK in epool
(5) n...@holodeck:~# zfs destroy -r -f rpool/export


Re: [zfs-discuss] ZFS Import Problem

2008-12-30 Thread Johan Hartzenberg
On Tue, Dec 30, 2008 at 3:32 PM, Weldon S Godfrey 3 wrote:

> If memory serves me right, sometime around 12:34am, Michael McKnight told
> me:
> >
> > I have tried import -f, import -d, import -f -d ... nothing works.
> >
> Did you try zpool export 1st?

He did say he was doing zpool replace commands when "it went downhill"


Re: [zfs-discuss] separate home "partition"?

2008-12-29 Thread Johan Hartzenberg
On Mon, Dec 29, 2008 at 1:12 AM, scott  wrote:

> thanks for the input. since i have no interest in multibooting (virtualbox
> will suit my needs), i created a 10gb partition on my 500gb drive for
> opensolaris and reserved the rest for files (130gb worth).
> after installing the os and fdisking the rest of the space to solaris2, i
> created a zpool called DOCUMENTS (good tips with the upper case), which i
> then mounted to Documents in my home folder.
> the logic is, if i have to reinstall, i just export DOCUMENTS and re-import
> it into the reinstalled os (or import -f in a worst-case scenario).
> after having done all the setup, i partitioned drive 2 using identical
> cylinder locs and mirrored each into their respective pools (rpool and
> DOCUMENTS). replacing drive 1 with 2 and starting back up, everything boots
> fine and i see all my data, so it worked.
> obviously i'm a noob, and yet even i find my own method a little
> suspicious. i look at the disk usage analyzer and see that / is 100% used.
> while i'm sure that this is in some kind of "virtual" sense, it leaves me
> with a feeling that i've done a goofy thing.
> comments about this last concern are greatly appreciated!

Firstly, 10 GB is a bit on the lean side for a Solaris root pool. The pool
needs to store about 6GB of software, a Swap Device, and a Dump Device.

OpenSolaris also gives you upgrades with roll-back.  For this purpose I
reserve about 8 GB per "instance".  The way I do it is as follows:

8 GB for the current version
8 GB for current - 1.
8 GB for a "transient" version - see below
6 GB for Swap and Dump.
10 GB for some flexibility, installing software, etc.
Total for Solaris partition: 40 GB
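The arithmetic behind that total, as a small sketch; the variable names are invented for this example and the sizes (in GB) are the ones from the list above.

```shell
#!/bin/sh
# Space budget for the Solaris partition, in GB (figures from the list above).
current=8      # current release
previous=8     # current - 1
transient=8    # release being upgraded to, or about to be deleted
swap_dump=6    # swap and dump devices
headroom=10    # flexibility, extra software, etc.

total=$((current + previous + transient + swap_dump + headroom))
echo "Solaris partition: ${total} GB"
```

Changing the per-instance figure or the number of retained releases shows quickly how the partition size would need to grow.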

The transient instance does not stay on the disk for long.  The upgrade
strategy is as follows:

When running on version N, and upgrading to N+1, you will still have N-1 on
disk.  Thus, space for 3 releases is needed.  A few days after upgrading to
N+1, I start to consider it to be the new N.  The old N-1 is then redundant,
and I delete it at that point.  The exception is if the new release doesn't
work to my liking.  Then I delete it, and keep the old N and N-1.

I ALWAYS keep one older release on disk - if nothing else, I've had to use
it as a "recovery" environment many times.  However, it is a somewhat
"expensive" recovery area: In particular, I am using Solaris Express.  It is
possible to create a "recovery" alternate boot environment as follows:

Create a new boot environment (lucreate -n recovery)
Make it bootable (luactivate recovery)
Boot into it once (init 6)
Make the "old BE" active again and boot back into it.

The result is a recovery environment from which you can boot, which does not
take any disk space (other than whatever changes on disk) because it is
based on a snapshot of the existing/current boot environment.

I don't know the OpenSolaris upgrade mechanism yet, though I understand that
something similar is possible.


Re: [zfs-discuss] separate home "partition"?

2008-12-27 Thread Johan Hartzenberg
On Sat, Dec 27, 2008 at 7:33 AM, scott  wrote:

> do you mean a pool on a SEPARATE partition?

That is what I do.  In particular, I have:

fdisk partition 1 = Solaris partition type 0xbf  = rpool = 40 GB
fdisk partition 2 = MSDOS partition type = SHARED zpool = 190 GB
fdisk partition 3 = 30 GB Extended partition. Logical partition 5 used for
Ubuntu Root, Logical Partition 6 = Ubuntu Swap.

This leaves me with the option of creating an fdisk partition 4 for another
operating system.

Disadvantages:
1. Partitioning means ZFS does not turn on the disk's write cache.
2. There is also "wasted space": partitioning implies pre-allocating space,
which means you have to dedicate space that you may not use.

Advantages:
1. I can import the SHARED zpool under Ubuntu and thus I have the perfect
shared space solution between the two operating systems, without having to
worry about clashing mount points which would be present if I tried to
import the root pool.
2.  If I needed to re-install, I would only wipe/destroy/touch the OS, not
my user data.

I have not yet made the move from Solaris Express to OpenSolaris, so I am
still using Live Upgrade.  I generally upgrade to every new release,
sometimes to my sorrow.  But it does not touch my "SHARED data" zpool.

One other thing:  I started a "convention" of using all-capital names for my
ZFS pool names.  It makes them stand out nicely in the output of df and
mount, but in particular it distinguishes nicely between the pool name and
the mountpoint, because I then mount the "SHARED" pool on "/shared".


Re: [zfs-discuss] Possible to switch SATA ports?

2008-12-26 Thread Johan Hartzenberg
On Fri, Dec 26, 2008 at 1:26 PM, Orvar Korvar <> wrote:

> Ok, so I could partition a drive into two parts, and treat each of the
> partitions as one drive? And then I exchange one partition at a time with a
> whole new drive? That sounds neat. I must format the drive into two zfs
> partitions? Or UFS partitions? ZFS doesnt have partitions?
> And another thing, is it better to do a "cp *" or do "zfs send" when
> copying data from old zpool to new zpool? What is the differences?

No, my suggestion is to
1. Connect four of the five 1TB drives to the available SATA ports.
2. Put the 5th one in an external USB enclosure, or find another SATA
3. Then create the new pool of 5x1TB drives.
4. Then ZFS-send the data from the old pool to the new pool
5. Export the old pool, shut down and remove the old disks.
6. Move the disk which is on eSATA, external USB, or wherever to one of
the freed-up SATA ports.
7. Start up.  You are done.


Re: [zfs-discuss] Possible to switch SATA ports?

2008-12-25 Thread Johan Hartzenberg
On Wed, Dec 24, 2008 at 2:50 PM, Orvar Korvar <> wrote:

> I have a ZFS raid and wonder if it is possible to move the ZFS raid around
> from SATA port to another? Ive heard that someone assembled the SATA
> connections differently and the ZFS raid wouldnt work.
> Say that I have 8 SATA port controller card with 4 drives in a ZFS raid.
> Sata ports 0-3 are occupied and Sata ports 4-7 are empty. Could I move SATA
> connection nr 0 to the SATA port nr 4?

Best would be if you could find another SATA port, or even an external USB
enclosure, to use temporarily, even on any other controller.  Then you can
create the raidz of 5x1TB drives, zfs send the data, and then get rid of the
old drives without damaging the old pool.  If anything does go wrong during
the process, your old pool would still be intact.

Once the data transfer is complete, shut down and remove the old disks and
then connect the drive which was temporarily on another controller onto a
SATA controller as its final configuration.


Re: [zfs-discuss] How to create a basic new filesystem?

2008-12-23 Thread Johan Hartzenberg
On Sun, Dec 21, 2008 at 8:00 PM, dick hoogendijk  wrote:

> On Sun, 21 Dec 2008 07:36:07 PST
> Uwe Dippel  wrote:
> > [i]If you want to add the entire Solaris partition to the zfs pool as
> > a mirror, use zpool attach -f rpool c1d0s0 c2d0s2[/i]
> >
> > So my mistake in the first place (see first post), in short, was only
> > the last digit: I ought to have used the complete drive (slice 2),
> > instead of *thinking* that it is untouchable, and zfs/zpool would set
> > up s0 properly to be used?
> >
> > Dick, it seems we have to get used to the idea, that slice 2 is
> > touchable, after all.
> That may be, but all my mirror disks are like c0d0s0 c0d1s0. s0 taking
> up the whole disk. On some there is a s2 on some there isn't. Also, SUN
> itself mentions s0 in explaining zfs root as bootable. There is no
> mention of s2. As far as I'm concerned bootable ZFS is on s0;
> non-bootable drives have an EFI label ;-)

I believe there are some bugs at present pertaining to booting from ZFS
which, in that special case, require you to use slices.

Where I gave examples for adding p0 or s2 to a pool, I was not thinking
about bootable pools.  The ZFS admin guide has got examples on how to add a
mirror to a pool and explains the gotchas / workarounds for these issues,
but from memory you have to read the whole guide to get all the
information.  I have not looked recently to see whether it got updated /

From memory, the "issues" are related to the start / offset of the pool on
the disk which is added to the pool as a mirror, as well as the "manual"
installation of the bootblock on the disk added as a mirror.  I can't
remember the details now, but the first requires that a slice be created
manually with the correct offset, rather than using the whole disk.  I have
not tried it yet and don't know the details of how to create this slice, or
how the problem manifests itself.  Right now I don't have time to search for
this information, but if the question is still open by the 10th of January
I'll look into it, though I'm sure someone else here will remember the
details better than me.

So to reiterate, the exception is when adding disks to a bootable
root pool as a mirror.  Otherwise, it is simple.


Re: [zfs-discuss] How to create a basic new filesystem?

2008-12-21 Thread Johan Hartzenberg
On Sun, Dec 21, 2008 at 12:13 PM, dick hoogendijk  wrote:

> On Sat, 20 Dec 2008 17:02:31 PST
> Uwe Dippel  wrote:
> > Now I modified the slice s0, so that is doesn't overlap with s2 (the
> > whole disk) any longer:
> >
> > Part      Tag    Flag     Cylinders        Size            Blocks
> >   0       root    wm       3 - 10432      159.80GB    (10430/0/0) 335115900
> >   1 unassigned    wm       0                0         (0/0/0)             0
> >   2     backup    wu       0 - 10441      159.98GB    (10442/0/0) 335501460

As mentioned previously, you do not need to fiddle with partitions and
slices unless you want to use less than the entire disk.

If you want to add the entire Solaris partition to the zfs pool as a mirror, use
zpool attach -f rpool c1d0s0 c2d0s2

If you want to add the entire physical disk to the pool as a mirror, use
zpool attach rpool c1d0s0 c2d0p0

If you want to Extend the pool using the space in the entire Solaris
partition, use
zpool add -f rpool c2d0s2

If you want to Extend the pool using the entire physical disk, use
zpool add rpool c2d0p0

The -f (force) option is required to override the bug about s2 overlapping
with other slices.  The above assumes you have not modified s2 to be anything
other than the entire Solaris partition, as is the default.

The only time to use anything other than s2 or p0 is when you specifically
want to use less than the whole partition or disk.  In that case you need to
define slices/partitions.


[zfs-discuss] Removing Disks from pools

2008-12-18 Thread Johan Hartzenberg
Hello ZFS gurus and fellow fans.

As we all know ZFS does not _yet_ support relayout of pools.  I want to know
whether there is any hope for this to become available in the near future?

From my outside view it sounds like it should be possible to set a flag to
stop allocating new blocks from a specific device, then start a job to
"touch" each block on the subject device, causing each block to be CoW moved
to one of the other disks until there are no more blocks left on that device
and then finally to clear the device's pool membership status.

Similarly, adding a device into a raid-Z vdev seems easy to do:  All future
writes include that device in the list of devices from which to allocate

But I admit, I am no programmer, so please do enlighten me :-)


Re: [zfs-discuss] ZFS and aging

2008-12-17 Thread Johan Hartzenberg
On Wed, Dec 17, 2008 at 2:46 PM, Thanos McAtos  wrote:

> My problems are 2:
> 1) I don't know how to properly age a file-system. As already said, I need
> traces of a decade's workload to properly do this, and to the best of my
> knowledge there is no easy way to do this automatically.
> 2) I know very little of ZFS. To be honest, I have no idea what to expect.
> Maybe I'm doing aging the wrong way or ZFS suffers from aging when is has to
> allocate blocks for writes/updates and not on recovery.

I would expect the fill level of the pool to be a much bigger factor than
the "age" of the file system.  However, an old but very empty file system may
have its data blocks spread far apart (large gaps in between).  So a "new"
empty file system may have all its allocated data blocks at the start of a
disk, and an "old" empty file system may be scattered all over the disk.
However, since we are talking about more space than data, and ZFS only
"rebuilds" the blocks which are in use, this is a special case: while the
relative difference may be large, the absolute difference will likely be small.
But I am speculating.  The CoW nature of ZFS will probably make it very hard
to consistently create a "fragmented" file system!!!


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Johan Hartzenberg
On Tue, Dec 16, 2008 at 1:43 PM,  wrote:

> >When current uber-block A is detected to point to a corrupted on-disk
> data,
> >how would "zpool import" (or any other tool for that matter) quickly and
> >safely know that, once it found an older uber-block "B" that it points to
> a
> >set of blocks which does not include any blocks that has since been freed
> >and re-allocated and, thus, corrupted?  Eg, without scanning the entire
> >on-disk structure?
> Without a scrub, you mean?
> Not possible, except the first few uberblocks (blocks aren't used until a
> few uberblocks later)
> Casper

Does that mean that each of the last "few-minus-1" uberblocks points to a
consistent version of the file system?  Does "few" have a definition?


Re: [zfs-discuss] Need Help Invalidating Uberblock

2008-12-16 Thread Johan Hartzenberg
On Tue, Dec 16, 2008 at 11:39 AM, Ross  wrote:

> I know Eric mentioned the possibility of zpool import doing more of this
> kind of thing, and he said that it's current inability to do this will be
> fixed, but I don't know if it's an official project, RFE or bug.  Can
> anybody shed some light on this?
> See Jeff's post on Oct 10, and Eric's follow up later that day in this
> thread:

When the current uberblock "A" is detected to point to corrupted on-disk data,
how would "zpool import" (or any other tool for that matter) quickly and
safely know, once it found an older uberblock "B", that B points to a
set of blocks which does not include any blocks that have since been freed
and re-allocated and, thus, corrupted?  That is, without scanning the entire
on-disk structure?


Re: [zfs-discuss] ZFS and aging

2008-12-16 Thread Johan Hartzenberg
On Mon, Dec 15, 2008 at 7:57 PM, Thanos McAtos  wrote:

> Hello all.
> I'm doing a course project to evaluate recovery time of RAID-Z.
> One of my tests is to examine the impact of aging on recovery speed.
> I've used PostMark to stress the file-system but I didn't observe any
> noticeable slowdown.
> Is there a better way to "age" a ZFS file-system?
> Does ZFS have aging issues at all?
> Thanx in advance.

Anton suggested some practical methods for conducting testing.  But do
follow proper testing procedure:

Since you're doing this as part of a study course you may already know much
of this, but I've just seen too many invalid, useless tests to let this one
So firstly, understand that utilization metrics are not performance.  At
best they can be considered a symptom.  However, utilization metrics are
important because they may provide hints at a) how to improve the system's
performance, and b) errors in the thinking during the test design phase.

Equally important: Know what your testing objectives are.  Define important
concepts, such as "recovery time" and "file system age".

Document your test methodology, expected results, and make a list of
scenarios, including a "base-line" for comparison, and a description of what
will remain the same and what will be different between the test scenarios.

For each test, record all utilization metrics so that you can evaluate these
to understand what the bottleneck (bound resource) was in each scenario.

Record the results for each test scenario.  Include the recorded utilization
data in an appendix.

Make some conclusions.  This is where definitions are important.  For
example saying that "file system age made [no] significant impact on raid-z
recovery time" is completely meaningless unless you also defined "file
system age" and "recovery time"

The big issue is that everybody has got a case of "X performed better than
Z, thus X is better than Z", where X and Z are simple products.  You really
have to accurately describe your test scenarios, especially in terms of what
is different and what is the same between them.  Try to be as complete as
possible.  Include the scripts and their parameters as you used them to
generate load, if possible (Not possible when you let users generate real

So some items to list in your scenarios:
o System configuration details.
o OS version, patches
o Software versions (patch revisions, etc)
o Configuration details (at least anything which is non-default.  In many
cases you may want to specifically stress some default values)
o The exact test procedure, parameters, etc.

So, while talking about RAID-Z recovery time, particularly in terms of the
file system "age", I imagine some kind of comparison of recovery times.  I
am sure you will design a series of increasingly "aged" storage pools, and
for each perform a number of "recovery tests" for which you will record the
run time.
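Recording the run time per scenario could be done with something as small as the sketch below. It is only an illustration: run_scenario is a made-up name, the CSV layout is an arbitrary choice, and it assumes a date command that understands +%s (GNU date does; the stock Solaris 10 date may not). The command being timed must itself block until the recovery has finished, otherwise the number recorded is meaningless.

```shell
#!/bin/sh
# run_scenario LOG NAME CMD...: run CMD and append "NAME,seconds,exit_status"
# to the file LOG.  Assumes "date +%s" prints seconds since the epoch.
run_scenario() {
    log=$1
    name=$2
    shift 2
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$name,$((end - start)),$status" >> "$log"
    return $status
}
```

A call such as run_scenario results.csv baseline ./recover_and_wait.sh (where recover_and_wait.sh is a hypothetical script of your own) then gives one line per test that can go straight into the appendix.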

What would be your baseline?  What do you want to keep constant between the
tests?  Nr of files?  File system usage level?  Nr of disks in the pool?  I
am assuming that not changing the system configuration and patch level
between the test scenarios is obvious.  I am also assuming that the system
will be idle for all tests?

In terms of evaluating the results:  Do you expect the file system age to
actually impact on the recovery time?  If so, is this based on how file
system age impacts on recovery time for other raid technologies?

If your test results will be published I'd love to take a look at it.

Good luck


Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-12 Thread Johan Hartzenberg
On Fri, Dec 12, 2008 at 10:10 PM, Miles Nordin  wrote:

> 0. The reports I read were not useless in the way some have stated,
>   because for example Mike sampled his own observations:


> I don't see when the single-LUN SAN corruption problems were fixed.  I
> think the supposed ``silent FC bit flipping'' basis for the ``use
> multiple SAN LUN's'' best-practice is revoltingly dishonest, that we
> _know_ better.  I'm not saying devices aren't guilty---Sun's sun4v IO
> virtualizer was documented as guilty of ignoring cache flushes to
> inflate performance just like the loomingly-unnamed models of lying
> SATA drives:
> Is a storage-stack-related version this problem the cause of lost
> single-LUN SAN pools?  maybe, maybe not, but either way we need an
> end-to-end solution.  I don't currently see an end-to-end solution to
> this pervasive blame-the-device mantra every time a pool goes bad.
> I keep digging through the archives to post messages like this because
> I feel like everyone only wants to have happy memories, and that it's
> going to bring about a sad end.

Thank you.

There are so many unsupported claims and so much noise on both sides that
everybody is sounding like a bunch of fanboys.

The only bit that I understand about why HW raid "might" be bad is that if
ZFS had access to the disks behind a HW RAID LUN, then _IF_ zfs were to
encounter corrupted data in a read, it would probably be able to re-construct
that data.  This is at the cost of doing the parity calculations on a
general purpose CPU, and then sending that parity data, as well as the data
to write, across the wire.  Some of that cost may be offset against Raid-Z's
optimizations over raid-5 in some situations, but all of this is pretty much
if-then-maybe type situations.

I also understand that HW raid arrays have some vulnerabilities and
weaknesses, but those seem to be offset against ZFS' notorious instability
during error conditions.  I say notorious, because of all the open bug
reports and reports on the list of I/O hanging and/or systems panicking while
waiting for ZFS to realize that something has gone wrong.

I think if this last point can be addressed - make ZFS respond MUCH faster
to failures, then it will go a long way to make ZFS  be more readily

Re: [zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450

2008-12-07 Thread Johan Hartzenberg
On Wed, Dec 3, 2008 at 6:37 PM, Aaron Blew <[EMAIL PROTECTED]> wrote:

> I've done some basic testing with a X4150 machine using 6 disks in a RAID 5
> and RAID Z configuration.  They perform very similarly, but RAIDZ definitely
> has more system overhead.  In many cases this won't be a big deal, but if
> you need as many CPU cycles as you can muster, hardware RAID may be your
> better choice.

Some people keep stressing the point that HW raid does not include snapshots
or what ever other features, or does so at cost, or ... or ... or .  It
seems to me like we assume that the above poster intended or implied the use
of another file system on the HW raid system.

The poster above did not specify a file system, so I may as well assume the
comparison is between using ZFS with JBOD vs ZFS on HW-raid.

Then the features available to the administrator are essentially the same.
Now the question becomes: What are the pros and cons for each?

I have not tested this, but I would assume that the HW raid (forget about
cheap motherboard chipset integrated "fake-raid") will save some CPU time
because the raid controller has got a dedicated processor to do the stripe
parity calculations.  In addition the ZFS routines may have an easier time
in terms of selecting which disk to store the data on (only one disk to choose

On the other hand, ZFS promises better fault detection, but presently this
is tempered by several open bugs against ZFS in situations where
degraded pools are present, eg pools freezing, etc.  HW raid seems to have
this sort of situation under control.

Some HW raids may offer re-layout without losing data.  ZFS does not (yet)
offer this.

ZFS claims better write performance in scenarios where less than a full
stripe width is updated, and raid5 suffers from the "write-hole" problem.
Nicely defined here:

ZFS updates are "atomic" - you never need to fsck the file system.

ZFS will work regardless of whether or not you have a HW raid disk

So... what other benefits has ZFS got (as defined in my second paragraph)?

For what it is worth, have a look at my ZFS feature wishlist / AKA what it
would take to make ZFS _THE_ last word in storage management:


Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-16 Thread Johan Hartzenberg
On Sun, Nov 16, 2008 at 11:44 PM, Jeff Bonwick <[EMAIL PROTECTED]> wrote:

> These are the conditions:
> (1) The bug is specific to the root pool.  Other pools are unaffected.
> (2) It is triggered by doing a 'zpool online' while I/O is in flight.
> (3) Item (2) can be triggered by syseventd.
> (4) The bug is new in build 102.  Builds 101 and earlier are fine.
> I believe the following should be a viable workaround until build 103:
> (1) svcadm disable -t sysevent
> (2) Don't run zpool online on your root pool
> Jeff

Hi Jeff,

Thank you for the details.  A few more questions:  Does booting into build
102 do a zpool online on the root pool? And the above disable -t is
"temporary" till the next reboot - any specific reason for doing it that
way?  And last question:  What do I lose when I disable "sysevent"?

Thank you,

Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-15 Thread Johan Hartzenberg
On Sat, Nov 15, 2008 at 10:57 AM, Vincent Boisard <[EMAIL PROTECTED]>wrote:

>> OTOH - if you don't know OpenSolaris well enough, you're better off
>> either picking an earlier release that has proven to have very few
>> relevant warts - usually based on a recommendation for other, more
>> experieced, users.  Or you could go with the commercial, rock solid
>> release called Solaris U6 (Update 6) recently released.
> Where can I find advice on these earlier versions "with few relevant
> warts". When I look at forums, I see good and bad for each release. Also,
> S10U6 does not have features that I need (Zones ZFS cloning). Also, as I
> have no support contract with sun (home user), I am not sure if I will get
> patches or not.
If Zone Cloning via ZFS snapshots is the only feature you miss in S10u6,
then you should reconsider.  Writing a script to implement this yourself
will require only a little experimentation.

Re: [zfs-discuss] 10u6 any patches yet?

2008-11-12 Thread Johan Hartzenberg
On Wed, Nov 12, 2008 at 8:15 PM, Vincent Fox <[EMAIL PROTECTED]>wrote:

> Just wondering if anyone knows of a patch released for 10u6?
> I realize this is OT but want to test my new ability with ZFS root to do
> lucreate, patch the alternate BE, and luactivate it.

Send me an explorer and I will run a patch report for you.

Re: [zfs-discuss] zfs (u)mount conundrum with non-existent mountpoint

2008-11-06 Thread Johan Hartzenberg
On Thu, Nov 6, 2008 at 8:22 PM, Michael Schuster

> Mark J Musante wrote:
> >
> > Hi Michael,
> >
> > Did you try doing an export/import of tank?
> no - that would make it unavailable for use right? I don't think I can
> (easily) do that during production hours.

Can you please post the output from:
zfs get all tank/schuster/ilb

Re: [zfs-discuss] Enabling load balance with zfs

2008-10-31 Thread Johan Hartzenberg
On Thu, Oct 30, 2008 at 4:42 PM, Brian Hechinger <[EMAIL PROTECTED]> wrote:

> If what the OP is looking for is redundant but not nessesarily exact copies
> (he
> only wants the last X days on the backup disk, for example) he may want to
> consider
> looking into SAM.

The OP described a solution where if one disk fails, only SOME of the files
are lost, thus a Non-Redundant solution.  I quote:

>> I would like to set this pair of disk as a zpool, and would like to spread
>> one file to one disk, and the other file to the other disk.
>> Is it possible to configure the zpool, with both disks, and set the file
>> balancing?
>> I want this configuration to keep some files if one of the disks is
>> distroyed (for some reason).

Re: [zfs-discuss] Enabling load balance with zfs

2008-10-30 Thread Johan Hartzenberg
On Thu, Oct 30, 2008 at 12:13 PM, Sergio Arroutbi <[EMAIL PROTECTED]>wrote:

> My point is that I want to use the same directory in the recorder program.
> I get the streaming, and start writing the file in /mnt/streamingDirectory
> So I would like to record in the same way, just configuring (via zfs if
> possible), that one file should go to /dev/sda and the other file to the
> /dev/sdb disk (I am using this in Linux via fuse).
This is interesting to me!  What fuse file system allows you to spread a
single directory (file system) across two disks in a non-redundant manner
but not lose access to the file system if one of the disks fails?

My suggestion:
Create two ZFS pools, and mount them on different directories, for example
/mnt/storage_a and /mnt/storage_b.
Then write a script which will do the following:
Start up periodically.
If new files exist in /mnt/streamingDirectory, copy them alternately to
/mnt/storage_a and /mnt/storage_b

Something like:
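# Use mkdir as a lock/test: the kernel makes mkdir atomic, which gives us
# a semaphore and thus mutual exclusion.
mkdir /tmp/task_is_running || exit
# Continue where we left off (empty on the very first run).
NEXT_TARGET="$(cat /etc/last_target 2>/dev/null)"
find /mnt/streamingDirectory -type f | while read FILENAME
do
  # Alternate between the two targets.
  if [ "$NEXT_TARGET" = "/mnt/storage_a" ]; then
    NEXT_TARGET=/mnt/storage_b
  else
    NEXT_TARGET=/mnt/storage_a
  fi
  cp "$FILENAME" "${NEXT_TARGET}/" || exit 1
  # Record the target inside the loop: in some shells the loop runs in a
  # subshell and NEXT_TARGET would be lost after "done".
  echo "$NEXT_TARGET" > /etc/last_target
done || exit 1
# On exit without errors, remove the lock.
rmdir /tmp/task_is_running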

Sorry I did not test this, it is off the cuff, typos may exist.  Or logic
errors.  Also note the "find ... | read" will not port well to other shells.


Re: [zfs-discuss] DNLC and ARC

2008-10-30 Thread Johan Hartzenberg
DNLC seems to be independent.

From my laptop, which has only got ZFS file systems (Two ZPOOLs), the stats
$ kstat -n dnlcstats
module: unix                            instance: 0
name:   dnlcstats                       class:    misc
        crtime                          25.772681029
        dir_add_abort                   0
        dir_add_max                     0
        dir_add_no_memory               0
        dir_cached_current              0
        dir_entries_cached_current      0
        dir_fini_purge                  0
        dir_misses                      0
        dir_reclaim_any                 0
        dir_remove_entry_fail           0
        dir_remove_space_fail           0
        dir_start_no_memory             0
        dir_update_fail                 0
        double_enters                   256
        enters                          29871
        hits                            5057854  <<--- Looks Good!
        misses                          27737
        negative_cache_hits             88995
        pick_free                       0
        pick_heuristic                  0
        pick_last                       0
        purge_all                       1
        purge_fs1                       0
        purge_total_entries             22117
        purge_vfs                       79

On Thu, Oct 30, 2008 at 12:50 PM, Marcelo Leal <

> Hello,
>  In ZFS the DNLC concept is gone, or is in ARC too? I mean, all the cache
> in ZFS is ARC right?
>  I was thinking if we can tune the DNLC in ZFS like in UFS.. if we have too
> *many* files and directories, i guess we can have a better performance
> having all the metadata cached, and that is even more important in NFS
> operations.
>  DNLC is LRU right? And ARC should be totally dynamic, but as in another
> thread here, i think reading a *big* file can mess with the whole thing. Can
> we hold an area in memory for DNLC cache, or that is not the ARC way?
>  thanks,
>  Leal.

Re: [zfs-discuss] Verify files' checksums

2008-10-25 Thread Johan Hartzenberg
On Sat, Oct 25, 2008 at 6:49 PM, Marcus Sundman <[EMAIL PROTECTED]> wrote:

> Richard Elling <[EMAIL PROTECTED]> wrote:
> > Marcus Sundman wrote:
> > > How can I verify the checksums for a specific file?
> >
> > ZFS doesn't checksum files.
> AFAIK ZFS checksums all data, including the contents of files.
> > So a file does not have a checksum to verify.
> I wrote "checksums" (plural) for a "file" (singular).

AH - Then you DO mean the ZFS built-in data check-summing - my mistake.  ZFS
checksums allocations (blocks), not files. The checksum for each block is
stored in the parent of that block.  These are not shown to you but you can
"scrub" the pool, which will see zfs run through all the allocations,
checking whether the checksums are valid.
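
A rough sketch of the scrub route, in case it helps.  The wrapper function
and the pool name "tank" are my own placeholders; zpool scrub and
zpool status -v are the standard commands.  Note that the scrub runs in the
background, so the status output right after starting it only shows progress.

```shell
# Start a scrub of the given pool and show its status.
# "tank" in the usage line below is an assumed pool name.
scrub_and_report() {
    pool="$1"
    # Ask ZFS to walk every allocated block and verify its checksum.
    zpool scrub "$pool" || return 1
    # -v also lists files with unrecoverable errors, if there are any.
    zpool status -v "$pool"
}

# Usage: scrub_and_report tank
```

Run zpool status -v again later to see the final result once the scrub has
completed.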

This PDF document is quite old but explains it fairly well:

What is not expressly stated in the block is that the ZFS allocation
structure stores the posix layer and file data in the leaf nodes in the


Re: [zfs-discuss] Verify files' checksums

2008-10-24 Thread Johan Hartzenberg
On Sat, Oct 25, 2008 at 6:59 AM, Johan Hartzenberg <[EMAIL PROTECTED]>wrote:

> On Sat, Oct 25, 2008 at 4:00 AM, Marcus Sundman <[EMAIL PROTECTED]> wrote:
>> How can I verify the checksums for a specific file?
>> I have a feeling you are not asking the question about ZFS hosted files
> specifically.
> If you downloaded a file, enter
> cksum filename
> To get the "CRC Check-Sum"
> For more types of checksum, you can use
> digest -a md5 filename
> digest -l will list types of checksum that the "digest" command knows
> about.
> Cheers,
>   _hartz
Oh, one other thing,
To check the checksums of files you've downloaded to a MS Windows system you
need to download and install a "checksum checking" utility, try


Re: [zfs-discuss] Verify files' checksums

2008-10-24 Thread Johan Hartzenberg
On Sat, Oct 25, 2008 at 4:00 AM, Marcus Sundman <[EMAIL PROTECTED]> wrote:

> How can I verify the checksums for a specific file?

I have a feeling you are not asking the question about ZFS hosted files
specifically.

If you downloaded a file, enter
cksum filename

To get the "CRC Check-Sum"

For more types of checksum, you can use

digest -a md5 filename

digest -l will list types of checksum that the "digest" command knows about.


Re: [zfs-discuss] zpool cross mount

2008-10-23 Thread Johan Hartzenberg
On Thu, Oct 23, 2008 at 4:49 PM, Laurent Burnotte

> => is there in zfs an automatic mechanism during solaris 10 boot that
> prevent the import of pool B ( mounted /A/B ) before trying to import A
> pool or do we have to legacy mount and file /etc/vfstab

This is fine if the pool from which /A is mounted is "guaranteed" to be
present, online, and have /A mounted.  Where /A is from the root pool, you
should be safe most of the time.

If not, set the canmount property of the "Pool B /A/B" dataset to noauto,
otherwise it may be mounted without /A being mounted, which depending on
your situation can be a minor irritation or a serious problem.
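
Something along these lines should do it.  This is only a sketch: the
function name and the dataset name "poolB/data" are placeholders, and it
assumes your ZFS release accepts the noauto value for canmount.

```shell
# Keep a dataset from being mounted automatically at boot or by "zfs mount -a".
# The dataset name passed in (e.g. poolB/data) is a placeholder.
disable_automount() {
    dataset="$1"
    zfs set canmount=noauto "$dataset" || return 1
    # Show the resulting setting together with the mountpoint.
    zfs get canmount,mountpoint "$dataset"
}

# Usage: disable_automount poolB/data
# Later, once /A is mounted: zfs mount poolB/data
```

With noauto the dataset is only mounted when you explicitly ask for it, so
you get to decide that /A is in place first.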

Re: [zfs-discuss] zfs migration question

2008-10-22 Thread Johan Hartzenberg
On Wed, Oct 22, 2008 at 2:35 AM, Dave Bevans <[EMAIL PROTECTED]> wrote:

>  Hi,
> I have a customer with the following question...
> She's trying to combine 2 ZFS 460gb disks into one 900gb ZFS disk. If this
> is possible how is this done? Is there any documentation on this that I can
> provide to them?
There is no way to do this without a backup/restore.

Backup one of the zpools.
Destroy this zpool.
Add the disk to the other (remaining) zpool.  This will make it bigger.
Restore the data into your bigger pool.
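
The four steps could be sketched as below.  Everything named here is an
assumption on my side: the function, the pool and device arguments, the
"merge" snapshot name and the dump file, which must live on separate storage
big enough for the stream.  It also assumes a ZFS release where zfs send -R
is available.  The destroy step is irreversible, so check that the dump file
is good before you get that far.

```shell
# Merge two single-disk pools by emptying one and adding its disk to the other.
# keep = pool that survives, drop = pool that is destroyed,
# disk = device behind "drop", dump = backup file on separate storage.
merge_pools() {
    keep="$1"; drop="$2"; disk="$3"; dump="$4"
    # 1. Back up: snapshot everything and save one replication stream.
    zfs snapshot -r "$drop@merge" || return 1
    zfs send -R "$drop@merge" > "$dump" || return 1
    # 2. Destroy the pool that has just been backed up.
    zpool destroy "$drop" || return 1
    # 3. Add the freed disk to the remaining pool (a non-redundant stripe).
    zpool add "$keep" "$disk" || return 1
    # 4. Restore the data as a new dataset tree inside the bigger pool.
    zfs receive "$keep/$drop" < "$dump"
}

# Usage: merge_pools poolA poolB c1t1d0 /backup/poolB.zfs
```

The restored file systems end up under poolA/poolB in this sketch; rename or
re-mount them afterwards as needed.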

Re: [zfs-discuss] ZFS boot vs Linux fuse

2008-10-22 Thread Johan Hartzenberg
Reboot to the grub menu
Move to the failsafe kernel entry
Tap "e" to edit the entry.
Go to the kernel entry and tap "e" again
Append -kv to the end of the line
Accept and tap "b" to boot the line.

After some output you will be prompted to mount the root pool on /a - Enter
y to accept.

You will then get a shell prompt.  Reboot and all should be fine.

I actually need to ask a question:
Did the person who imported the pool under Linux use the old (circa Feb
2008) zfs-fuse, or the new one (Sept 2008)?

If the latter, then it is possible that they also did a zpool upgrade and now
your Solaris no longer understands the ZFS on-disk format.  If this is the
case, upgrade solaris (Boot from new media, go to the text-mode installer,
and select "upgrade" when prompted for the install type).  This will update
Solaris to understand the ZFS version.  I think the older zfs-fuse used to
support ZFS version 8 or 9, the new one supports version 12 or 13.
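
To check whether a version mismatch is really the issue, something like the
following may help.  The two helper names are made up for illustration, and
"rpool" is assumed as the pool name.  Run the first one wherever the pool can
still be imported (here probably the Linux side, assuming that zfs-fuse
release knows the version property) and the second one on the Solaris side.

```shell
# Print the on-disk format version of an imported pool.
pool_version() {
    zpool get version "$1"
}

# Print the pool versions the running system supports.
supported_versions() {
    zpool upgrade -v
}

# Usage: pool_version rpool
#        supported_versions
```

If the pool's version is higher than the highest one listed by the second
command, the running Solaris build cannot open that pool.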

On Wed, Oct 22, 2008 at 5:15 PM, Andrew Gallatin <[EMAIL PROTECTED]> wrote:

> Hi,
> I have a triple boot amd64 Linux/FreeBSD/OpenSolaris box used for Q/A.   It
> is in a data center where I don't have easy physical access to the machine.
>   It was working fine for months, now I see this at boot time on the serial
> console:
> SunOS Release 5.11 Version snv_86 64-bit
> Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> NOTICE: mount: not a UFS magic number (0x0)
> panic[cpu0]/thread=fbc245a0: cannot mount root path /ramdisk:a
> fbc446d0 genunix:rootconf+113 ()
> fbc44720 genunix:vfs_mountroot+65 ()
> fbc44750 genunix:main+d8 ()
> fbc44760 unix:_locore_start+92 ()
> I suspect the problem was caused when, under Linux, somebody foolishly
> exported then imported the Solaris rootfs using the Linux FUSE ZFS stuff so
> they could pull data off the Solaris side without a reboot.   I guess that
> must have done "something" to the pool so that Solaris no longer likes it.
> The linux ZFS tools list the history of the zpool as:
> History for 'rpool':
> 2008-05-06.08:39:33 zpool create -f rpool_tmp c5t0d0s0
> 2008-05-06.08:39:33 zfs create rpool_tmp/ROOT
> 2008-05-06.08:39:33 zfs set compression=off rpool_tmp/ROOT
> 2008-05-06.08:39:35 zfs set mountpoint=/a/export rpool_tmp/export
> 2008-05-06.08:39:35 zfs set mountpoint=/a/export/home rpool_tmp/export/home
> 2008-05-06.08:51:28 zpool set bootfs=rpool_tmp/ROOT/opensolaris rpool_tmp
> 2008-05-06.08:51:29 zfs set mountpoint=/export/home rpool_tmp/export/home
> 2008-05-06.08:51:29 zfs set mountpoint=/export rpool_tmp/export
> 2008-05-06.08:51:31 zpool export -f rpool_tmp
> 2008-05-06.08:51:38 zpool import -f 2344082471458403555 rpool
> 2008-05-06.08:51:59 zpool set bootfs=rpool/ROOT/opensolaris rpool
> 2008-05-06.08:52:20 zfs snapshot -r [EMAIL PROTECTED]
> 2008-09-07.12:22:55 zpool import -ocachefile=/etc/zfs-cachefile -d
> /tmp/dev/ -f rpool
> 2008-09-07.12:26:00 zpool export rpool
> 2008-09-07.12:34:58 zpool import -d /tmp/dev rpool
> 2008-09-07.09:59:40 zpool import -f rpool
> 2008-09-07.17:20:56 zpool import -d /var/tmp/dev -f rpool
> 2008-09-07.17:21:43 zpool export rpool
> 2008-09-07.17:27:35 zpool import -d /var/tmp/dev/ rpool
> 2008-09-07.17:32:10 zpool export rpool
> 2008-09-07.17:32:23 zpool import -d /var/tmp/dev/ rpool
> 2008-09-07.17:32:40 zpool export rpool
> 2008-09-07.10:41:13 zpool import rpool
> 2008-09-07.11:42:09 zpool export rpool
> 2008-09-07.11:42:24 zpool import rpool
> 2008-09-07.11:45:26 zpool export rpool
> 2008-09-07.18:52:35 zpool import -d /var/tmp/dev rpool
> The entries from 2008-09-07 were operations using the linux tools, prior
> are from the Solaris installation.
> Is there any possible way to rescue the solaris installation remotely,
> using the linux install or via grub or kmdb from the serial console?  How?
> Alternatively, would it be possible to rescue the installation by either
> moving the disk to an OpenSolaris machine (b95) and doing something (what?)?
>  Or by booting via the Indiana installation CD (what?).
> Thanks,
> Drew

Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

Afrikaanse Stap Website:

My blog:

ICQ = 193944626, YahooIM = johan_hartzenberg, GoogleTalk =
[EMAIL PROTECTED], AIM = JohanHartzenberg
zfs-discuss mailing list

Re: [zfs-discuss] am I "screwed"?

2008-10-16 Thread Johan Hartzenberg
On Mon, Oct 13, 2008 at 10:25 PM, dick hoogendijk <[EMAIL PROTECTED]> wrote:

> We have to dig deeper with kmdb. But before we do that, tell me please
> what is an easy way to transfer the messages from the failsafe login on
> the problematic machine to i.e. this S10u5 server. All former screen
> output had to be typed in by hand. I didn't know of another way.
If you say "no" to mount the pool on /a, does it still hang?

Just to ask the obvious question, did you try to press ENTER or anything
else where it was hanging?

What build are you booting into failsafe mode?  Something older, or b99?

Do you have a build-99 DVD to boot from, from which you can get a proper
running system with networking, etc?

Re: [zfs-discuss] Pros/Cons of multiple zpools?

2008-10-11 Thread Johan Hartzenberg
On Thu, Oct 9, 2008 at 10:28 PM, Joseph Mocker
> wrote:

> If I add the second array to the pool, I could probably continue with the
> same number of columns in the raidz, but the size of the strips would
> increase. Would this effect performance somehow?

I hate the word performance because it doesn't have a meaning.

If you spread the load over more disks, IO will become less of a bottleneck,
regardless of how you configure it. Whether this means anything at all
depends on whether IO is currently your bottleneck.

In addition, larger capacity disks store more bits per cylinder, thus more
data pass the read/write head per revolution, thus the disks "perform"
differently, regardless of other factors such as raid levels and stripe

Other factors include: Bus speed to the disks, type of work-load (small
random reads, large sequential writes, etc), whether you have enough free
CPU and Memory to drive the disks to their full capacity, etc.

Not knowing any of these things about your "general storage" data, I would
hazard to say it will perform better than it currently does, regardless of
whether you add the new disks to the existing pool or to a new zpool.  If
you add them to the existing pool, this will most likely still be true even
if you use a different column size in the new vdev.

My suggestion:  Add the disks in a way that makes sense to you
holistically.  Think about what you want to achieve - consider everything,
like the required redundancy, performance, and required capacity.  Go by
your gut-feeling.

Exception to the rule:  If you have a serious performance problem and you
know the system is currently disk (IO) bound.  In this case, test it
properly:  Have base-line benchmarks and know what your testing objectives
are before you start.  Document your scenarios and run load tests on each
scenario, monitoring all resources so you know which is the bottleneck in
each case.

For most people the correct answer is simply add the disks to the existing
pool, using "sensible" raidz column sizes.  If this breaks the rule of "keep
the raidz column sizes the same", then so be it.

My understanding is that the reason behind this rule has more to do with
ensuring that you (and your boss/customers) understand the amount of
protection your data has than with performance, but I may be wrong.

Re: [zfs-discuss] Pros/Cons of multiple zpools?

2008-10-09 Thread Johan Hartzenberg
On Wed, Oct 8, 2008 at 9:29 PM, Joseph Mocker
> wrote:

> Hello,
> I haven't seen this discussed before. Any pointers would be appreciated.
> I'm curious, if I have a set of disks in a system, is there any benefit
> or disadvantage to breaking the disks into multiple pools instead of a
> single pool?
> Does multiple pools cause any additional overhead for ZFS, for example?
> Can it cause cache contention/starvation issues?
Hello Joseph.

Firstly, a separate pool for the OS is recommended.  The pool from which you
boot must be either Mirrored or else a single disk.  Booting from Stripes /
RaidZ is not supported.  Thus if you want to use a stripe or RaidZ you
pretty much MUST have a dedicated pool for that.

Secondly, if you use whole disks in your pools, it becomes possible to
physically "remove" a pool (using zpool export), eg to move a pool to
another system.
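
As a small illustration of moving a pool this way (the function names and the
pool name "tank" are placeholders of mine; zpool export and zpool import are
the real commands):

```shell
# Detach a whole-disk pool from this host so its disks can be moved.
detach_pool() {
    zpool export "$1"
}

# On the receiving host: show what can be imported, then import by name.
attach_pool() {
    # With no arguments this only lists the pools available for import.
    zpool import
    zpool import "$1"
}

# Usage on the old host: detach_pool tank
# Usage on the new host: attach_pool tank
```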

Further, it is recommended to use the same level of redundancy in all
vdev's.  Eg all vdevs should be mirrored, or the same nr of columns in the
stripe or raidz.  This is not a restriction, just a strong recommendation.

Never ever add multiple slices (partitions) from a single disk device to the
same pool - this will cause performance to go down to a crawl!

You can not (yet) "break up" a pool, though you can break off a mirror
copy.  And to stay in line with the above recommendations, you may want more
than one pool.  For best performance you should use whole-disks in pools,
but sometimes for practical reasons you may want to split a single disk up in
slices and add those to separate pools.

Hope that helps!

Re: [zfs-discuss] Comments on green-bytes

2008-10-07 Thread Johan Hartzenberg
Some people wrote:

> > covered code.   Since Sun owns that code they would need to rattle the
> > cage.  Sun? Anyone have any talks with these guys yet?
> Isn't CDDL file based so they could implement all the new functionality in
Wouldn't it be great if programmers could just focus on writing code rather
than having to worry about getting sued over whether someone else is able or
not to make a derivative program from their code?

Re: [zfs-discuss] [Fwd: Another ZFS question]

2008-09-27 Thread Johan Hartzenberg
On Sat, Sep 27, 2008 at 1:30 PM, jonathan sai <[EMAIL PROTECTED]> wrote:

> Would you mind helping me ask your tech guy whether there will be
> repercussions when I try to run this command in view of the situation below:
> #  *zpool add -f zhome raidz c6t6006016056AC1A00C8FB7A6346F8DB11d0
> c6t6006016056AC1A00D034FA5246F8DB11d0*

It can be done.  But with only two devices they should rather be mirrored
than made into a raidz.
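
A sketch of the mirrored variant, using the dry-run flag first.  The helper
name is hypothetical; pool and device names are whatever you pass in.  If the
existing vdevs in the pool are not mirrors, zpool will complain about a
mismatched replication level and would want -f, which is worth thinking about
before forcing it.

```shell
# Preview adding two devices to an existing pool as a mirrored vdev
# instead of a two-disk raidz.
add_mirror_pair() {
    pool="$1"; dev_a="$2"; dev_b="$3"
    # -n is a dry run: print the resulting layout without changing the pool.
    zpool add -n "$pool" mirror "$dev_a" "$dev_b" || return 1
    echo "If the layout above looks right, rerun without -n:"
    echo "zpool add $pool mirror $dev_a $dev_b"
}

# Usage: add_mirror_pair zhome <first-device> <second-device>
```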

Maybe wait for the scrub to continue and monitor the data errors before

Re: [zfs-discuss] zfs resilvering

2008-09-27 Thread Johan Hartzenberg
On Fri, Sep 26, 2008 at 7:03 PM, Richard Elling <[EMAIL PROTECTED]>wrote:

> Mikael Kjerrman wrote:
> > define a lot :-)
> >
> > We are doing about 7-8M per second which I don't think is a lot but
> perhaps it is enough to screw up the estimates? Anyhow the resilvering
> completed about 4386h earlier than expected so everything is ok now, but I
> still feel that the way it figures out the number is wrong.
> >
> Yes, the algorithm is conservative and very often wrong until you
> get close to the end.  In part this is because resilvering works in time
> order, not spatial distance. In ZFS, the oldest data is resilvered first.
> This is also why you will see a lot of "thinking" before you see a
> lot of I/O because ZFS is determining the order to resilver the data.
> Unfortunately, this makes time completion prediction somewhat
> difficult to get right.

Hi Richard,

Would it not make more sense then for the program to say something like "No
Estimate Yet" during the early part of the process, at least?

Re: [zfs-discuss] zfs resilvering

2008-09-26 Thread Johan Hartzenberg
On Fri, Sep 26, 2008 at 4:02 PM, <[EMAIL PROTECTED]> wrote:

> Note the progress so far "0.04%."  In my experience the time estimate has
> no basis in reality until it's about 1% do or so.  I think there is some
> bookkeeping or something ZFS does at the start of a scrub or resilver that
> throws off the time estimate for a while.  Thats just my experience with
> it but it's been like that pretty consistently for me.
> Jonathan Stewart

I agree here.

I've watched iostat -xnc 5 while I start scrubbing a few times, and the
first minute or so is spent doing very little IO.  Thereafter the transfers
shoot up to near what I think is the maximum the drive can do and stay there
until the scrub is completed.

Re: [zfs-discuss] Slow zpool import with b98

2008-09-25 Thread Johan Hartzenberg
On Mon, Sep 22, 2008 at 3:59 PM, Detlef [EMAIL PROTECTED] <

> With Nevada Build 98 I realize a slow zpool import of my pool which
> holds my user and archive data on my laptop.
> The first time it was realized during the boot if Solaris tells me to
> mount zfs filesystems (1/9) and then works for 1-2 minutes until it goes
> ahead. I hear the disk working but have no clue what happens here.
> So I checked to zpool export and import, and with this import it is also
> slow (takes around 90 seconds to import and with b97 it took 5 seconds).
> Has anyone an idea what the reason could be ?

You don't by any chance have lots of USB flash storage devices or any blank
media in the CD / DVD drive?

Re: [zfs-discuss] [install-discuss] Will OpenSolaris and Nevada co-exist in peace on the same root zpool

2008-09-06 Thread Johan Hartzenberg
On Fri, Sep 5, 2008 at 2:06 PM, James Carlson <[EMAIL PROTECTED]>wrote:

> Johan Hartzenberg writes:
> > I am guessing the answer is YMMV depending on the differences in versions
> > of, for example Firefox, Gnome, Thunderbird, etc, and based on how well
> > these cope with settings that was changed by another potentially newer
> > version of itself.
> The answers to your questions are basically all "no."  The new
> installer wants a primary partition or a whole disk.
> However, there are helpful blogs from folks who've made the
> transition.  Poor Ed seems to have a broken 'shift' key, but he gives
> great details here:

Hi James.

Thank you for the response.  I am going to try it the other way around then
- Install machine with OpenSolaris, then install NV into the pool as an
alternate boot environment.  I have a spare hard drive which I can use as a
sandpit environment.  Now I just need that other little thing called "time"
to experiment.

[zfs-discuss] [install-discuss] Will OpenSolaris and Nevada co-exist in peace on the same root zpool

2008-09-05 Thread Johan Hartzenberg
Well, I want to give OpenSolaris a try, but have not yet worked up the
confidence to just try it.  So a few questions:

When I start the OpenSolaris installer, will it install into my existing
root zpool?
Which is called RPOOL, not rpool?
Without destroying my existing Nevada installations?
Or killing my existing Grub menu?
And will it be intelligent about my existing Live Upgrade BEs?
And other existing Shareable ZFS datasets (eg /export and /var/shared)

Related to this:
Can I have the same directory used for my home directory under both Nevada
and OpenSolaris?

I am guessing the answer is YMMV depending on the differences in versions
of, for example Firefox, Gnome, Thunderbird, etc, and based on how well
these cope with settings that were changed by another potentially newer
version of itself.


ZFS snapshots are your friend.  ZFS + LiveUpgrade: A match made in heaven.

Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-08-31 Thread Johan Hartzenberg
On Thu, Aug 28, 2008 at 11:21 PM, Ian Collins <[EMAIL PROTECTED]> wrote:

> Miles Nordin writes:
> > suggested that unlike the SVM feature it should be automatic, because
> > by so being it becomes useful as an availability tool rather than just
> > performance optimisation.
> >
> So on a server with a read workload, how would you know if the remote
> volume was working?

Even reads induce writes (last access time, if nothing else).

My question: If a pool becomes non-redundant (eg due to a timeout, hotplug
removal, bad data returned from device, or for whatever reason), do we want
the affected pool/vdev/system to hang?  Generally speaking I would say that
this is what currently happens with other solutions.

Conversely:  Can the current situation be improved by allowing a device to
be taken out of the pool for writes - e.g. be placed in read-only mode?  I
would assume it is possible to modify the CoW system / functions which
allocate blocks for writes to ignore certain devices, at least

This would also lay the groundwork for allowing devices to be removed from a
pool - e.g.: Step 1: Make the device read-only.  Step 2: Touch every allocated
block on that device (causing it to be copied to some other disk).  Step 3:
Remove it from the pool for reads as well, and finally remove it from the
pool permanently.


Re: [zfs-discuss] Moving a ZFS root to another target

2008-08-19 Thread Johan Hartzenberg
You may also need to just boot to safe mode and manually import the root
pool to mount on /a, then reboot, as this updates the device path stored in
the pool's on-disk meta-data.  If you search for my posts you will find
plenty of discussions about my adventures with this.

On Tue, Aug 19, 2008 at 7:54 AM, Stephen Hahn <[EMAIL PROTECTED]> wrote:

> * andrew <[EMAIL PROTECTED]> [2008-08-16 00:38]:
> > Hmm... Just tried the same thing on SXCE build 95 and it works fine.
> > Strange. Anyone know what's up with OpenSolaris (the distro)? I'm
> > using the ISO of OpenSolaris 208.11 snv_93 image-updated to build 95
> > if that makes a difference. I've not tried this on 2008.05 .
>   For a while, the boot-archive on 2008.nn systems included a copy of
>  zpool.cache.  Recent versions do not make this mistake.  Delete and
>  regenerate your boot archive, and you should be able to make the
>  transfer.  See
>  and following.
>  - Stephen


[zfs-discuss] (no subject)

2008-08-10 Thread Johan Hartzenberg
I believe it would be handy to be able to examine properties of a ZFS pool
and all the data sets in it prior to importing the pool.  In particular I
would like to be able to run commands similar to "zfs list" and "zfs get",
for example to see where file systems will be mounted, whether they will be
shared, eg whether new services will be started because I am importing a
pool, etc.

In addition I think a "temporary" import in read-only mode would be handy,
possibly to facilitate the above.  This temporary import should NOT update,
for example, the last host to which the pool belonged.

Any input/suggestions before I open some RFEs?



Re: [zfs-discuss] How to zpool add a logical partition

2008-08-10 Thread Johan Hartzenberg
On Sun, Aug 10, 2008 at 9:18 PM, Yi <[EMAIL PROTECTED]> wrote:

> Hi,
> I see docs talking about how to add a fdisk partition (or primary
> partition) to a zfs pool. But I wonder if it's possible to add a logical
> partition, which is inside the extended partition, to a pool. I'm on an X86
> system and these are in /dev/rdsk/:
> c4t0d0p[0-4]
> c4t0d0s[0-15]
> I don't know which represents which device.
> Thanks for any help!

Solaris does not support Logical partitions in an extended partition.
p0 is the whole disk
p1 through p4 are the primary fdisk partitions
s0 through s15 are the Slices in the Solaris partition

There are no devices referring to the logical partitions.
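
To make the naming scheme above concrete, here is a small shell sketch that
classifies a device name by its suffix.  The function name describe_dev is
made up for illustration and is not a Solaris command; it only looks at the
text of the name and does not touch any device.

```shell
# Classify a Solaris x86 disk device name (e.g. c4t0d0p1 or c4t0d0s0)
# according to the p0 / p1-p4 / s0-s15 scheme described above.
# describe_dev is a hypothetical helper, not part of Solaris.
describe_dev() {
    case "$1" in
        *p0)
            echo "$1: whole disk" ;;
        *p[1-4])
            echo "$1: primary fdisk partition ${1##*p}" ;;
        *s[0-9]|*s1[0-5])
            echo "$1: slice ${1##*s} in the Solaris partition" ;;
        *)
            echo "$1: not a p0-p4 or s0-s15 device name" ;;
    esac
}

describe_dev c4t0d0p0
describe_dev c4t0d0p2
describe_dev c4t0d0s7
```

Note that nothing in this scheme maps to a logical partition inside an
extended partition, which is the point made above.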

I have detailed the meaning of the Solaris disk device aliases in my blog.



Re: [zfs-discuss] [install-discuss] lucreate into New ZFS pool

2008-08-10 Thread Johan Hartzenberg
My upgrade has been completed - Comments interleaved below.

On Fri, Aug 8, 2008 at 4:38 PM, Johan Hartzenberg <[EMAIL PROTECTED]> wrote:

> Hello,
> Since I've got my disk partitioning sorted out now, I want to move my BE
> from the old disk to the new disk.
> I created a new zpool, named RPOOL for distinction with the existing
> "rpool".
> I then did lucreate -p RPOOL -n new95
> This completed without error, the log is at the bottom of this mail.
> I have not yet dared to run luactivate. I also have not yet dared set the
> ACTIVE flag on any partitions on the new disk (I had some interesting times
> with that previously).  Before I complete these steps to set the active
> partition and run luactivate, I have a few questions:
> 1. I somehow doubt that the lucreate process installed a boot block on the
> new disk...  How can I confirm this?  Or is luactivate supposed to do this?

This was properly taken care of by luactivate.

> 2. There are a number of open issues still with ZFS root.  I saw some notes
> pertaining to leaving the first cylinder of the disk out from the root pool
> slice.  What is that all about?

I can't find the references to this.  I found this while reading up on ZFS
root mirroring, but can't find it again.  At any rate, whatever the issue
was, it seems not to affect me.

> 3. I have a remnant of the lucreate process in my mounts ... (which
> prevents, for example lumount and previously caused problems with
> luactivate)

I had to do the lucreate 3 times before it worked.  After the first time I
had the stuck mount points.  This caused some files from the zone to be
copied directly into /.alt.*, which caused the lumount and luactivate to
fail.  It took me two attempts to clean out everything manually because
ludelete also refuses to delete a BE which it can not mount.

> 4. I see the vdev for dump got created in the new pool, but not for swap?
> Is this to be expected?

On the second and third attempts lucreate did in fact create the SWAP vdev.

> 5. There were notes about errors which were recorded in /tmp/lucopy.errors
> ... I've rebooted my machine since, so I can't review those any more.  I
> guess I need to run the lucreate again to see if it happens again and to be
> able to read those logs before they get lost again.

These did not recur.  Note however that between the second and third
attempts I removed the zone, so I performed the "upgrade" without any zones.

> 6. Since SHARED is an entirely independent pool, and since the purpose of
> this lucreate is to move root from one disk to another, I don't see why
> lucreate needed to make snapshots of the zone!

And this became a non-issue as I completed the upgrade with no zones.

> 7. Despite the messages that the grub menu has been distributed and
> populated successfully, the new boot environment has not been added to the
> grub menu list.  My experience though is that this happens during
> luactivate, so I'm not concerned about this just yet.

This also became a non-issue on subsequent runs.


Re: [zfs-discuss] Shared ZFS in Multi-boot?

2008-08-09 Thread Johan Hartzenberg
On Thu, Aug 7, 2008 at 6:44 PM, Bob Netherton <[EMAIL PROTECTED]> wrote:

> On Thu, 2008-08-07 at 09:16 -0700, Daniel Templeton wrote:
> > Is there a way that I can add the disk to a ZFS pool and have
> > the ZFS pool accessible to all of the OS instances?  I poked through the
> > docs and searched around a bit, but I couldn't find anything on the
> topic.
> Yes.  I do that all of the time.   The trick here is to create the pool
> and filesystems with the oldest Solaris you will use.  ZFS has very
> good backward compatibility but not the reverse.
> Here's a trick that will come in handy.  Create quite a few empty
> ZFS filesystems in your oldest Solaris.  In my case the pool is
> called throatwarbler and I have misc1 misc2 misc3 misc4 misc5 .
> What happens is that I will be running a newer Solaris and want a
> filesystem.  Rather than reboot to the older Solaris, just rename
> misc[n] to the new name.

Bob, is there any specific reason why you suggest the creation of a bunch of
zfs datasets up front?

I have found that once you have created the ZFS pool on the oldest Solaris,
you should be able to create zfs datasets within it from any supported OS
(including Linux with zfs-fuse)

My experience with this: I had to create the zpool with zfs-fuse under
Linux, and after that I was able to manipulate, import, and export the pool
from all releases.


[zfs-discuss] [install-discuss] lucreate into New ZFS pool

2008-08-08 Thread Johan Hartzenberg

Since I've got my disk partitioning sorted out now, I want to move my BE
from the old disk to the new disk.

I created a new zpool, named RPOOL for distinction with the existing
"rpool".

I then did lucreate -p RPOOL -n new95

This completed without error, the log is at the bottom of this mail.

I have not yet dared to run luactivate. I also have not yet dared set the
ACTIVE flag on any partitions on the new disk (I had some interesting times
with that previously).  Before I complete these steps to set the active
partition and run luactivate, I have a few questions:

1. I somehow doubt that the lucreate process installed a boot block on the
new disk...  How can I confirm this?  Or is luactivate supposed to do this?
2. There are a number of open issues still with ZFS root.  I saw some notes
pertaining to leaving the first cylinder of the disk out from the root pool
slice.  What is that all about?
3. I have a remnant of the lucreate process in my mounts ... (which
prevents, for example lumount and previously caused problems with
luactivate)
4. I see the vdev for dump got created in the new pool, but not for swap?
Is this to be expected?
5. There were notes about errors which were recorded in /tmp/lucopy.errors
... I've rebooted my machine since, so I can't review those any more.  I
guess I need to run the lucreate again to see if it happens again and to be
able to read those logs before they get lost again.
6. Since SHARED is an entirely independent pool, and since the purpose of
this lucreate is to move root from one disk to another, I don't see why
lucreate needed to make snapshots of the zone!
7. Despite the messages that the grub menu has been distributed and
populated successfully, the new boot environment has not been added to the
grub menu list.  My experience though is that this happens during
luactivate, so I'm not concerned about this just yet.

Below is some bits showing the current status of the system:

$ zfs list -r RPOOL
RPOOL             7.97G  24.0G  26.5K  /RPOOL
RPOOL/ROOT/new95  6.47G  24.0G  6.47G  /.alt.new95
RPOOL/dump        1.50G  25.5G    16K  -
/RPOOL/boot/grub $
/RPOOL/boot/grub $
/RPOOL/boot/grub $ lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
snv_94                     yes      no     no        yes    -
snv_95                     yes      yes    yes       no     -
new95                      yes      no     no        yes    -
/RPOOL/boot/grub $ luumount new95
ERROR: boot environment  is not mounted

$ zfs list -r RPOOL
RPOOL             7.97G  24.0G  26.5K  /RPOOL
RPOOL/ROOT/new95  6.47G  24.0G  6.47G  /.alt.new95
RPOOL/dump        1.50G  25.5G    16K  -

$ lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
snv_94                     yes      no     no        yes    -
snv_95                     yes      yes    yes       no     -
new95                      yes      no     no        yes    -

Thank you,

For what it is worth, below is the log of the lucreate session.
/dev/dsk $ zpool create -f RPOOL c0d0s0
/dev/dsk $ timex lucreate -p RPOOL -n new95
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment  file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Creating file systems on boot environment .
Creating  file system for  in zone  on .
Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point .
WARNING: The file  contains a list of <2>
potential problems (issues) that were encountered while populating boot
environment .
INFORMATION: You must review the issues listed in
 and determine if any must be resolved. In
general, you can ignore warnings about files that were skipped because
they did not exist or could not be opened. You cannot ignore errors such
as directories or files that could not be created, or file systems running
out of disk space. You must manually resolve any such problems before you
activate boot environment .
Creating shared file system mount points.
Creating snapshot for  on .
Creating clone for  on .
Creating compare databases for boot environment .
Creating compare datab

[zfs-discuss] ZFS and disk partitioning

2008-08-05 Thread Johan Hartzenberg
I am trying to upgrade my laptop hard drive, and want to use Live-upgrade.

What I have done so far is:
1. Moved the old drive to an external enclosure

2. Made it bootable (At this point I had to overcome the first obstacle -
due to ZFS storing the disk device path in the ZFS structure it refused to
automatically mount the root file system.  The work-around involved booting
to safe mode and mounting the zfs file systems, then rebooting.  Note
previously I had to re-do this even when moving the disk from one USB port
to another.  The disk is now portable at least between USB ports, seemingly
after zpool upgrade to v11)

3. Installed the new drive into the laptop.

4. Partitioned it using Solaris/fdisk.  Oops.

At this point I had to overcome the second obstacle - the system failed to
find the root pool.  The eventual solution (workaround) was to boot from a
live CD and wipe the partition table from the internal disk.

5. Trying to create a partition table on the disk again resulted in format
telling me the disk type is unknown.  A partial workaround was to
temporarily put the whole disk under zfs control and then destroy the
pool.  This resulted in an EFI label being created on the disk.  From here
it was possible to delete the EFI partition and create new partitions, but
Solaris does not properly recognize the primary partitions created.

The desired outcome of the partitioning is:

fdisk P1 = Solaris2 (0xbf) to be used as ZFS root
fdisk P2 =  (to be used as ZFS data pool)
fdisk P3 = NTFS... Still debating whether I want to have a copy of Windows
consuming disk space... I still have to finish Far Cry some time.
fdisk P4 = Extended partition, will be sub-partitioned for Linux.

The best I've been able to do so far is to use Linux to create P1 and P2
above with neither made active.  If either is made active, I can no longer
boot from the external disk (grub fails to find the root).

But Linux did not properly create the partition table.

   0. c0d0 
   1. c2t0d0 
Specify disk (enter its number)[0]:
selecting c0d0
NO Alt slice
No defect list found
[disk formatted, no defect list found]

Entering the FDISK menu, I see
 Total disk size is 30401 cylinders
 Cylinder size is 16065 (512 byte) blocks

      Partition   Status    Type          Start    End   Length    %
      =========   ======    ============  =====  =====   ======   ===
          1                 Solaris2          0   4256     4257    14
          2                 EFI            4256  26140    21885    72

   1. Create a partition
   2. Specify the active partition
   3. Delete a partition
   4. Change between Solaris and Solaris2 Partition IDs
   5. Exit (update disk configuration and exit)
   6. Cancel (exit without updating disk configuration)
Enter Selection:
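
As a sanity check on the geometry fdisk prints above, the raw disk size can
be worked out from the cylinder count and cylinder size.  This is only an
arithmetic sketch; the helper names disk_bytes and bytes_to_gib are made up
for illustration, and it assumes a shell with 64-bit integer arithmetic.

```shell
# Raw size in bytes from fdisk geometry.
# $1 = cylinders, $2 = blocks per cylinder, $3 = bytes per block
disk_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# Convert a byte count to GiB with two decimals (truncated, not rounded),
# using integer arithmetic only.
bytes_to_gib() {
    hundredths=$(( $1 * 100 / 1073741824 ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

bytes=$(disk_bytes 30401 16065 512)
echo "$bytes bytes, about $(bytes_to_gib "$bytes") GiB"
```

This prints 250056737280 bytes, about 232.88 GiB, which lines up with the
232.88GB that the format partition menu reports for slice 0 once the whole
disk is given to it.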

Going to the partition menu, I try to create a Slice 0 of the entire disk:
partition> mod
Select partitioning base:
0. Current partition table (original)
1. All Free Hog
Choose base (enter number) [0]? 1

Part      Tag    Flag     First Sector        Size    Last Sector
  0 unassigned    wm                 0           0              0
  1 unassigned    wm                 0           0              0
  2 unassigned    wm                 0           0              0
  3 unassigned    wm                 0           0              0
  4 unassigned    wm                 0           0              0
  5 unassigned    wm                 0           0              0
  6 unassigned    wm                 0           0              0
  8   reserved    wm                 0           0              0

Do you wish to continue creating a new partition
table based on above table[yes]? 0
`0' is not expected.
Do you wish to continue creating a new partition
table based on above table[yes]? yes
Free Hog partition[6]? 0
Enter size of partition 1 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 2 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 3 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 4 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 5 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 6 [0b, 33e, 0mb, 0gb, 0tb]: 0
Enter size of partition 7 [0b, 33e, 0mb, 0gb, 0tb]: 0
Part      Tag    Flag     First Sector        Size    Last Sector
  0        usr    wm                34    232.88GB      488379741
  1 unassigned    wm                 0           0              0
  2 unassigned    wm                 0           0              0
  3 unassigned    wm                 0           0              0
  4 unassigned    wm                 0           0              0
  5 unassigned    wm                 0

Re: [zfs-discuss] are these errors dangerous

2008-08-03 Thread Johan Hartzenberg
On Sun, Aug 3, 2008 at 8:48 PM, Matt Harrison wrote:

> Miles Nordin wrote:
> >> "mh" == Matt Harrison <[EMAIL PROTECTED]> writes:
> >
> > mh>  I'm worried about is if the entire batch is failing slowly
> > mh> and will all die at the same time.
> >

Matt, can you please post the output from this command:

iostat -E

This will show counts of the types of errors for all disks since the last
reboot.  I am guessing sd0 is your CD / DVD drive.

Thank you,


Re: [zfs-discuss] ZFS vs FAT

2008-08-03 Thread Johan Hartzenberg
On Sun, Aug 3, 2008 at 9:31 AM, Rahul <[EMAIL PROTECTED]> wrote:

> Can u site the differences b/w ZFS and FAT filesystems??

Assuming you are serious, the technical bits can be found here:

But there is a bigger, fundamental difference between ZFS and all other file
systems.

Firstly, ZFS does away with traditional disk partitioning and space
allocation principles.  Many other volume managers claim to use storage
pools in some form or another, but ZFS truly realizes this.

To this effect, ZFS integrates volume management features into the POSIX
layer.  Basically, when a read from an application fails, the kernel is
aware of the underlying bits which might save the day.  In short, instead
of panicking because data is corrupted, it can possibly retry the operation
from a different disk, or even from a second copy on the same disk, etc.
What is more, it will FIX the problem there and then, in the background.

Example: Mirrored disks.  One side of the mirror somehow fails the checksum
on the data.  ZFS reads from the other side of the mirror, and returns good
data to the application.  But it goes further in that it fixes the bad data
on the copy that failed the checksum.

Secondly, ZFS incorporates an amazing set of features:  Online snapshots,
encryption, reservations, quotas, compression, turning on and off these and
several other features ONLINE.

Third, ZFS administration is easy.  No need to modify files to set
mountpoints, share file systems, etc.  The ZFS utilities will even turn on
the required services for you when you share a file system via SMB or NFS.

Lastly, ZFS's big claim to fame: Never get a corrupted file system.  All
operations are transactionally completed when they are committed.  This is
done by means of three things: Copy-on-write for all changes, a
tree structure for the underlying data, meta-data, and space allocation,
and the "ZIL", i.e. the ZFS Intent Log.  Going into these in depth is
something you can read up on in many posts on

Hope this helps,

Re: [zfs-discuss] Diagnosing problems after botched upgrade - grub busted

2008-08-01 Thread Johan Hartzenberg
On Fri, Aug 1, 2008 at 11:43 PM, Johan Hartzenberg <[EMAIL PROTECTED]> wrote:

> [snip]
> I could now just re-install and recover my data (I keep my data far away
> from OS disks/pools), or I can try to fix grub.  I hope to learn from this
> process so my questions are:
> 1. What is up with grub here?  I don't get a menu, but it does remember the
> old menu entry name for the default entry.  This happens even when I try to
> boot without the External drive plugged in.
> 2. How can I edit the grub commands?  What does "Error 15: File not found"
> mean?  Is it looking for the grub menu?  Or a program to boot?
> 3. Removing the internal disk from the machine may help... I am not sure to
> what extent grub uses the BIOS boot disk priority... Maybe that will get the
> external disk bootable again?
> 4. Should I try to get the grub menu back (from where I can try options to
> edit the boot entries), or should I try to get the grub> prompt back?  Or
> should I try to get one of the pools to import?  Where do I go from here?
> Note: I have been careful not to touch or break anything on the external
> disk.  However I never tried to reboot since partitioning the new disk with
> an ACTIVE partition, the way it is at present.  I think this could also
> affect grub's perception of what disks are what.
> Thank you,
>   _Johan

I physically removed the internal disk.  I am now able to boot again, at
least temporarily.

Re: [zfs-discuss] 200805 Grub problems

2008-08-01 Thread Johan Hartzenberg
Hello kugutsumen, Did you have any luck in resolving your problems?

On Sun, Jun 8, 2008 at 10:53 AM, Kugutsumen <[EMAIL PROTECTED]> wrote:

> I've just installed 2008.05 on a 500 gig disk... Install went fine...
> I attached an identically partitioned and labeled disk as soon as the
> rpool was created during the installation.:
>  zpool attach rpool c5t0d0s0 c6t0d0s0
> Resilver completed right away... and everything seemed to work fine.
> Boot on 1st disk and 2nd disk both worked fine...
> I created a zfs filesystem, enabled samba sharing which worked fine:
> pkg install SUNWsmbs
> pkg install SUNWsmbskr
> svcadm enable -r smb/server
> echo >>/etc/pam.conf other password required nowarn
> zfs create -o casesensitivity=mixed -o nbmand=on -o sharesmb=on rpool/p
> zfs set sharesmb=name=p rpool/p
> I copied a bunch of stuff to /rpool/p
> rebooted and problem started:
> Grub drops me to the command prompt without menu...
> Trying bootfs rpool/ROOT/opensolaris
> kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
> failed with an inconsistent file system structure...
> Rebooted into install environment and did a 'zpool import -R /mnt -f rpool'
> ... rpool seems
> to be okay and rebooted.
> Grub drops me again to the command prompt without menu...
> Trying bootfs rpool/ROOT/opensolaris
> kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
> fails with Error 17: Cannot mount selected partition
> Rebooted with the install CD in text mode... and tried
>zpool import -R /mnt -f rpool
>mkdir /mnt2
>mount -F zfs rpool/ROOT/opensolaris /mnt2
>bootadm update-archive -R /mnt2
>zpool set bootfs=rpool/ROOT/opensolaris rpool
>installgrub /mnt/boot/grub/stage1 /mnt/boot/grub/stage2
> /dev/rdsk/c5t0d0s0
>installgrub /mnt/boot/grub/stage1 /mnt/boot/grub/stage2
> /dev/rdsk/c6t0d0s0
> What am I doing wrong?


[zfs-discuss] Diagnosing problems after botched upgrade - grub busted

2008-08-01 Thread Johan Hartzenberg
I tried to be clever and botched my upgrade.  Now I don't get a grub menu,
only an error like this:

Booting 'BE3 Solaris xVM'

findroot (BE_BE3,1,a)

Error 15: File not found

Press any key to continue

I do not see a grub menu prior to this error, only the Stage1 Stage2 message
which goes past very fast.

Prior to this error I booted from a CD to single-user mode and ran
installgrub stage1 stage2 /dev/rdsk/Xs0

I did this because at that point grub just gave me a grub prompt and I don't
know grub well enough to boot from there.  I rather suspect that if I manage
to boot the system there will be a way to fix it permanently.  But now
rather let me give the sequence of events that led up to this in the order
they happened.

1.  I took the disk out of the laptop, and made it bootable in an external
enclosure.  This was a couple of days ago - I posted about the fun I had
with that previously, but essentially booting to safemode and importing the
rpool caused the on-disk device-path to be updated, making the disk once
more bootable.

2. I partitioned the new disk, creating a solaris2 partition and on that a
single hog-slice layout.  s0 is the whole partition, minus slice 8 and 9.

3. I created a new future root pool, like this:
zpool create RPOOL -f c0d0s0

Note:  -f required because s2 overlaps.

4. Ran lucreate, like this
lucreate -p RPOOL -n BE4

This finished fine.  I used upper-case RPOOL to distinguish it from the BE3

5. Mounted the new Nevada build ISO on /mnt and upgraded the live-upgrade

6. luupgrade -s /mnt -n BE4

7. lumount BE4 and peeked around in there a little.

After this I rebooted, and got no grub menu, just a grub> prompt.

I then booted from the CD and ran installgrub.  Not being able to get to man
pages, I have tried it two times with different options, with reboots in
between, like this:
> installgrub zfs_stage1_5 stage2 /dev/rds/s0
> installgrub -m stage1 stage2 /dev/rdsk/xxs2

This at least got me the error above (Am I now worse off or better off than
I was when I had the grub> prompt?).

I then booted from the CD again and tried /boot/solaris/bin/update_grub as I
found that in these forums, but it does not seem to have made any
difference.  I don't know if the command takes any options, I just ran it
and it finished very quickly and without errors.

Note: Due to past editing of the menu.lst file, the default item points to
the BE3 xVM entry.  I just tap the up-arrow and enter to load the "non-xVM"
entry.
Note: I never ran luactivate during the above procedure.

Note: When booting to single-user shell from the install CD, it tells me
that it finds both rpool (BE3) and RPOOL (BE4), allowing me to select one to
mount on /a, however they do not mount, I get an error but I forgot to write
that down.  I get the same error for both.

I could now just re-install and recover my data (I keep my data far away
from OS disks/pools), or I can try to fix grub.  I hope to learn from this
process so my questions are:

1. What is up with grub here?  I don't get a menu, but it does remember the
old menu entry name for the default entry.  This happens even when I try to
boot without the External drive plugged in.

2. How can I edit the grub commands?  What does "Error 15: File not found"
mean?  Is it looking for the grub menu?  Or a program to boot?

3. Removing the internal disk from the machine may help... I am not sure to
what extent grub uses the BIOS boot disk priority... Maybe that will get the
external disk bootable again?

4. Should I try to get the grub menu back (from where I can try options to
edit the boot entries), or should I try to get the grub> prompt back?  Or
should I try to get one of the pools to import?  Where do I go from here?

Note: I have been careful not to touch or break anything on the external
disk.  However I never tried to reboot since partitioning the new disk with
an ACTIVE partition, the way it is at present.  I think this could also
affect grub's perception of what disks are what.

Thank you,

Re: [zfs-discuss] ZFS Mirroring - Scenario

2008-07-11 Thread Johan Hartzenberg
Sorry, but I'm stuck at 6540.

There are so many options in how you would practically configure these that
there is no way to give a sensible answer to your question.  But the most
basic questions are: Do the racks have power from separate PDUs?  Are they
in physically remote locations?  Do your fabric switches have redundant
power from separate PDUs?

Do you want mirroring here purely for performance reasons?  Because these
systems have so much internal redundancy that I can not see why you would
want to mirror across them.

Striping would give you better performance.

On Thu, Jul 10, 2008 at 11:01 PM, Robb Snavely <[EMAIL PROTECTED]> wrote:

> I have a scenario (tray failure) that I am trying to "predict" how zfs
> will behave and am looking for some input .  Coming from the world of
> svm, ZFS is WAY different ;)
> If we have 2 racks, containing 4 trays each, 2 6540's that present 8D
> Raid5 luns to the OS/zfs and through zfs we setup a mirror config such
> that: I'm oversimplifying here but...
> Rack 1 - Tray 1 = lun 0    Rack 2 - Tray 1 = lun 4
> Rack 1 - Tray 2 = lun 1    Rack 2 - Tray 2 = lun 5
> Rack 1 - Tray 3 = lun 2    Rack 2 - Tray 3 = lun 6
> Rack 1 - Tray 4 = lun 3    Rack 2 - Tray 4 = lun 7
> so the zpool command would be:
> zpool create somepool mirror 0 4 mirror 1 5 mirror 2 6 mirror 3 7
> <---(just for ease of explanation using the "supposed" lun numbers)
> so a status output would look similar to:
> somepool
>  mirror
>  0
>  4
>  mirror
>  1
>  5
>  mirror
>  2
>  6
>  mirror
>  3
>  7
> Now in the VERY unlikely event that we lost the first tray in each rack
> which contain 0 and 4 respectively...
> somepool
>  mirror---
>  0   |
>  4   |   Bye Bye
> ---
>  mirror
>  1
>  5
>  mirror
>  2
>  6
>  mirror
>  3
>  7
> Would the entire "somepool" zpool die?  Would it affect ALL users in
> this pool or a portion of the users?  Is there a way in zfs to be able
> to tell what individual users are hosed (my group is a bunch of control
> freaks ;)?  How would zfs react to something like this?  Also any
> feedback on a better way to do this is more than welcome
> Please keep in mind I am a "ZFS noob" so detailed explanations would be
> awesome.
> Thanks in advance
> Robb
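
On the quoted question of whether losing luns 0 and 4 together takes down
the whole pool: ZFS stripes data across all top-level vdevs, so a pool of
mirrors stays available only while every mirror still has at least one
working side.  The shell sketch below models just that rule, using the
"supposed" lun numbers from the quoted mail.  The function pool_survives is
made up for illustration; it does not talk to ZFS at all.

```shell
# Model of the availability rule for a pool built from mirror vdevs.
# $1            = space-separated list of failed LUNs
# remaining args = one mirror vdev each, written as "a,b"
# pool_survives is a hypothetical helper, not a ZFS command.
pool_survives() {
    failed=" $1 "
    shift
    for vdev in "$@"; do
        alive=0
        for lun in $(echo "$vdev" | tr ',' ' '); do
            case "$failed" in
                *" $lun "*) ;;      # this side of the mirror is gone
                *) alive=1 ;;       # at least one side still works
            esac
        done
        if [ "$alive" -eq 0 ]; then
            echo "pool unavailable: mirror $vdev lost all devices"
            return 1
        fi
    done
    echo "pool available"
}

# All of rack 1 lost (luns 0-3): every mirror keeps its rack 2 side.
pool_survives "0 1 2 3" 0,4 1,5 2,6 3,7

# Tray 1 lost in both racks (luns 0 and 4): one whole mirror is gone.
pool_survives "0 4" 0,4 1,5 2,6 3,7 || true
```

In the first case the real pool would be running degraded rather than
healthy; the sketch only distinguishes available from unavailable.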


Re: [zfs-discuss] raid or mirror

2008-07-11 Thread Johan Hartzenberg
Hi Dick

You want Mirroring.  A Sun system with mirrored disks can be configured to
not go down due to one disk failing.  For this to be valid, you need to also
make sure that the device used for SWAP is mirrored - you won't believe how
many times I've seen this mistake being made.

To be even MORE safe, you want the two disks to be on separate controllers,
so that you can survive a controller failure too.

Note: Technically, mirroring is RAID; to be specific, it is RAID level 1.


On Fri, Jul 11, 2008 at 2:37 PM, dick hoogendijk <[EMAIL PROTECTED]> wrote:

> I'm still confused.
> What is a -SAFE- way with two drives if you prepare for hardware
> failure? That is: one drive fails and the system does not go down
> because the other drive takes over. Do I need raid or mirror?
> --
> Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
> ++ + SunOS sxce snv91 ++


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correc

2008-07-06 Thread Johan Hartzenberg
On Sun, Jul 6, 2008 at 3:46 PM, Ross <[EMAIL PROTECTED]> wrote:

> For your second one I'm less sure what's going on:
> # zpool create temparray raidz c1t2d0 c1t4d0 raidz c1t3d0 c1t5d0 raidz
> c1t6d0 c1t8d0
> This creates three two disk raid-z sets and stripes the data across them.
>  The problem is that a two disk raid-z makes no sense.  Traditionally this
> level of raid needs a minimum of three disks to work.  I suspect ZFS may be
> interpreting raid-z as requiring one parity drive, in which case this will
> effectively mirror the drives, but without the read performance boost that
> mirroring would give you.
> The way zpool create works is that you can specify raid or mirror sets, but
> that if you list a bunch of these one after the other, it simply stripes the
> data across them.

I read somewhere, a long time ago when ZFS documentation was still mostly
speculation, that raidz will use "mirroring" when the amount of data to be
written is less than what justifies 2+parity.  E.g., instead of 1+parity, you
get mirrored data for small writes, and essentially RAID-5 for big writes,
with intermediate-sized writes getting a RAID-5-like spread of blocks
across disks, but using fewer than the total number of disks in the set.

If that still holds true, then a raidz of 2 disks is probably just a mirror?
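
In capacity terms at least, the comparison can be worked out with simple arithmetic. The sketch below assumes equal-sized disks and single-parity raidz, and ignores metadata overhead; the function name usable_disks is made up for illustration. It says nothing about how the blocks are actually laid out on disk.

```shell
#!/bin/sh
# usable_disks KIND NDISKS -> approximate usable space, in whole disks.
# Assumes equal-sized disks; raidz here means single parity.
usable_disks() {
    kind=$1; n=$2
    case "$kind" in
        mirror) echo 1 ;;              # every disk holds the same data
        raidz)  echo $((n - 1)) ;;     # one disk's worth goes to parity
        *)      echo "unknown vdev type: $kind" >&2; return 1 ;;
    esac
}

# The pool from the quoted command: three 2-disk raidz vdevs, striped.
total=0
for vdev in "raidz 2" "raidz 2" "raidz 2"; do
    # $vdev is deliberately unquoted so it splits into KIND and NDISKS.
    total=$((total + $(usable_disks $vdev)))
done
echo "usable: $total of 6 disks"
```

A 2-disk raidz and a 2-disk mirror both come out at one disk of usable space, so the pool above offers about half of its raw capacity either way.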

Re: [zfs-discuss] is it possible to add a mirror device later?

2008-07-06 Thread Johan Hartzenberg
On Sun, Jul 6, 2008 at 10:27 AM, Jeff Bonwick <[EMAIL PROTECTED]> wrote:

> I would just swap the physical locations of the drives, so that the
> second half of the mirror is in the right location to be bootable.
> ZFS won't mind -- it tracks the disks by content, not by pathname.
> Note that SATA is not hotplug-happy, so you're probably best off
> doing this while the box is powered off.  Upon reboot, ZFS should
> figure out what happened, update the device paths, and... that's it.

Wishlist item nr 1: Ability to setup raid 1+z
Wishlist item nr 2: Remove disks from pools



Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Johan Hartzenberg
On Sat, Jul 5, 2008 at 9:34 PM, Robert Lawhead <

> About a month ago (Jun 2008), I received information indicating that a
> putback fixing this problem was in the works and might appear as soon as
> b92.  Apparently this estimate was overly optimistic; Does anyone know
> anything about progress on this issue or have a revised estimate for the
> putback?
> Thanks.

This page:

Says the putback will be in SNV 94

Re: [zfs-discuss] Some basic questions about getting the best performance for database usage

2008-07-01 Thread Johan Hartzenberg
On Mon, Jun 30, 2008 at 10:17 AM, Christiaan Willemsen <

> The question is: how can we maximize IO by using the best possible
> combination of hardware and ZFS RAID?

Here are some generic concepts that still hold true:

More disks can handle more IOs.

Larger disks can put more data on the outer edge, where performance is best.

If you use disks much larger than your required data set, head seek
movement will also be minimized.  (You can limit seeks further by forcing the
file system to live in a small slice of the disk, whose placement on the disk
you can control.)

Don't put all your disks on a single controller.  Just as more disks can
handle more IOs at a time, so can more controllers issue more instructions
at once.  On the other hand giving each disk a dedicated controller is a
waste because the controller will then be idle most of the time, waiting for
the disk to return results.

RAM, as mentioned before, is your friend.  ZFS will use it liberally.

You mentioned a 70 GB database, so: If you take, say, 10 x 146GB 15Krpm SAS
disks, set those up in a 4-disk stripe and add a mirror to each disk, you'll
get pretty decent performance.  I read somewhere that ZFS automatically
gives preference to the outer cylinders of a disk when selecting free
blocks, but you could also restrict the ZFS pool to using only the outer, say,
20 GB of each disk by creating slices and adding those to the pool.

Note: if you do use slices instead of whole disks, you need to manually turn
on disk write caching (format -e -> SCSI cache options).

If you don't care about tracking file access times, turn it off. (zfs set
atime=off datapool)
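
A minimal sketch of the 4-way stripe of mirrors described above, assuming whole disks rather than slices and made-up device names, with each mirror pair split across two controllers. It only prints the commands (bash or ksh needed for the parameter expansions). The two remaining disks of the ten are left out, since nothing above says what they are for.

```shell
#!/bin/sh
# Sketch only: prints commands instead of running them.
POOL=datapool
# Hypothetical disks; each pair has one disk on c1 and one on c2.
PAIRS="c1t0d0:c2t0d0 c1t1d0:c2t1d0 c1t2d0:c2t2d0 c1t3d0:c2t3d0"

# Builds one zpool create command with a "mirror A B" group per pair.
# ZFS stripes data across the mirror groups listed one after the other.
build_create_cmd() {
    cmd="zpool create $POOL"
    for pair in $PAIRS; do
        a=${pair%%:*}   # part before the colon
        b=${pair##*:}   # part after the colon
        cmd="$cmd mirror $a $b"
    done
    echo "$cmd"
}

build_create_cmd
# Stop recording file access times on the whole pool.
echo "zfs set atime=off $POOL"
```

To confine the pool to the outer part of each disk, the same command would take slice names (for example a hypothetical c1t0d0s0) in place of whole disks, in which case the write-cache note above applies.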

Have you decided on a server model yet?  Storage subsystems?  HBAs?  The
specifics in your configuration will undoubtedly get lots of responses from
this list about how to tune each component!  Everything from memory
interleaving to spreading your HBAs across schizo chips.

However, much more important to your actual end result is your application
and DB setup, config, and how it is developed.  If the application
developers or the DBAs get it wrong, the system will always be a dog.

Re: [zfs-discuss] ZFS Performance Issue

2008-02-10 Thread Johan Hartzenberg
On Feb 5, 2008 9:52 PM, William Fretts-Saxton <[EMAIL PROTECTED]>

> This may not be a ZFS issue, so please bear with me!
> I have 4 internal drives that I have striped/mirrored with ZFS and have an
> application server which is reading/writing to hundreds of thousands of
> files on it, thousands of files @ a time.
> If 1 client uses the app server, the transaction (reading/writing to ~80
> files) takes about 200 ms.  If I have about 80 clients attempting it @ once,
> it can sometimes take a minute or more.  I'm pretty sure it's a file I/O
> bottleneck so I want to make sure ZFS is tuned properly for this kind of
> usage.
> The only thing I could think of, so far, is to turn off ZFS compression.
>  Is there anything else I can do?  Here is my "zpool iostat" output:

Hi William

To improve performance, consider turning off atime, assuming you don't need it:

# zfs set atime=off POOL/filesystem


Re: [zfs-discuss] OpenSolaris, ZFS and Hardware RAID, a recipe for success?

2008-02-10 Thread Johan Hartzenberg
On Feb 10, 2008 9:06 AM, Jonathan Loran <[EMAIL PROTECTED]> wrote:

> Richard Elling wrote:
> Nick wrote:
>   Using the RAID card's capability for RAID6 sounds attractive?
>  Assuming the card works well with Solaris, this sounds like a
> reasonable solution.
>  Careful here.  If your workload is unpredictable, RAID 6 (and RAID 5, for
> that matter) will break down under highly randomized write loads.  There's a
> lot of trickery done with hardware RAID cards that can do some read-ahead
> caching magic, improving the read-paritycalc-paritycalc-write cycle, but you
> can't beat the laws of physics.  If you do *know* you'll be streaming
> more than writing random small numbers of blocks, RAID 6 hardware can work.
> But with transaction-like loads, performance will suck.
> Jon

I would like to echo Jon's sentiments and add the following:  If you are
going to have a mix of workload types or if your IO pattern is unknown, then
I would suggest that you configure the array as a JBOD and use raidz.  Raid
5 or Raid 6 works best for predictable IOs with well controlled IO unit sizes.

How you lay it out depends on whether you need (or want) hot spares.  What
are your objectives here?  Maximum throughput, lowest latencies, maximum
space, best redundancy, serviceability/portability, or  ?
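
To make the hot-spare trade-off concrete, here is a sketch for six disks presented individually (JBOD) by the card. The device names, the pool name "tank" and the helper function layout are all made up for illustration, and the script only prints the zpool commands.

```shell
#!/bin/sh
# Sketch only: prints commands instead of running them.
# Hypothetical names for six disks exposed individually by the RAID card.
DISKS="c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0"

# layout POOL NSPARES DISK... -> prints a zpool create command that keeps
# the last NSPARES disks as hot spares and puts the rest in one raidz vdev.
layout() {
    pool=$1; nspares=$2; shift 2
    ndata=$(($# - $nspares))
    cmd="zpool create $pool raidz"
    i=0
    for d in "$@"; do
        # Once the raidz members are listed, switch to the spare section.
        if [ "$i" -eq "$ndata" ]; then cmd="$cmd spare"; fi
        cmd="$cmd $d"
        i=$((i + 1))
    done
    echo "$cmd"
}

layout tank 0 $DISKS   # all six disks in the raidz vdev, no spare
layout tank 1 $DISKS   # five disks in the raidz vdev, one hot spare
```

With single-parity raidz the first layout gives roughly five disks of usable space and the second roughly four, so the spare costs one disk of capacity in exchange for automatic replacement of a failed drive.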
