Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Jeff Bonwick
> There is no substitute for cord-yank tests - many and often. The  
> weird part is, the ZFS design team simulated millions of them.
> So the full explanation remains to be uncovered?

We simulated power failure; we did not simulate disks that simply
blow off write ordering.  Any disk that you'd ever deploy in an
enterprise or storage appliance context gets this right.

The good news is that ZFS is getting popular enough on consumer-grade
hardware.  The bad news is that said hardware has a different set of
failure modes, so it takes a bit of work to become resilient to them.
This is pretty high on my short list.

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Toby Thain

On 9-Feb-09, at 6:17 PM, Miles Nordin wrote:

>> "ok" == Orvar Korvar  writes:
>
> ok> You are not using ZFS correctly.
> ok> You have misunderstood how it is used. If you don't follow the
> ok> manual (which you haven't) then any filesystem will cause
> ok> problems and corruption, even ZFS or ntfs or FAT32, etc. You
> ok> must use ZFS correctly. Start by reading the manual.
>
> Before writing a reply dripping with condescension, why don't you
> start by reading the part of the ``manual'' where it says ``always
> consistent on disk''?
>
> Please, lay off the kool-aid, or else drink more of it: Unclean
> dismounts are *SUPPORTED*.  This is a great supposed ZFS feature BUT
> cord-yanking is not supposed to cause loss of the entire filesystem,
> not on _any_ modern filesystem such as: UFS, FFS, ext3, xfs, hfs+.

> ... the write barrier problem is pervasive.  Linux LVM2
> throws them away, and many OS's that _do_ implement fdatasync() for
> the userland including Linux-without-LVM2 only sync part way down,
> don't propagate it all the way down the storage stack to the drive, so
> file-backed pools (as you might use for testing, backup, or virtual
> guests) are not completely safe.
>
> Aside from these examples, note that, AIUI, Sun's sun4v I/O
> virtualizer, VirtualBox software, and iSCSI initiator and target were
> all caught guilty of this write barrier problem, too,

YES! I recently discovered that VirtualBox apparently defaults to  
ignoring flushes, which would, if true, introduce a failure mode  
generally absent from real hardware (and eventually resulting in  
consistency problems quite unexpected to the user who carefully  
configured her journaled filesystem or transactional RDBMS!)

It seems as though I'll have to dive into the source code to prove  
it, though:
http://forums.virtualbox.org/viewtopic.php?p=59123#59123
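
If the report is accurate, the knob the VirtualBox documentation describes for
honouring guest flushes is an extradata key; a sketch for the first IDE disk of
a VM named "opensolaris" (the VM name and LUN index are placeholders, and
SATA-attached disks use "ahci" rather than "piix3ide"):

  VBoxManage setextradata "opensolaris" \
    "VBoxInternal/Devices/piix3ide/0/LUN#[0]/Config/IgnoreFlush" 0

Setting the value to 0 tells VirtualBox to pass flush requests through rather
than ignore them.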

There is no substitute for cord-yank tests - many and often. The  
weird part is, the ZFS design team simulated millions of them. So the  
full explanation remains to be uncovered?

--Toby


> so it's not
> only, or even mostly, a consumer-grade problem or an other-tent
> problem.
> ...
> The ubifs camp did an end-to-end test for their filesystem's integrity
> using a networked power strip to do automated cord-yanking.  I think
> ZFS needs an easier, faster test though, something everyone can do
> before loading data into a pool.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two pools on one slice

2009-02-09 Thread Scott Watanabe
Have you tried the procedure in the ZFS Troubleshooting Guide?
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Panic.2FReboot.2FPool_Import_Problems
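
As a quick check of what is actually written on a slice that appears to belong
to two pools, the ZFS labels on the device can be dumped directly; a sketch,
using the c3t2d0s0 device name from the original report:

  # zdb -l /dev/dsk/c3t2d0s0 | egrep 'name|guid'

Each of the labels records the pool name and GUID it belongs to.
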
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS corruption

2009-02-09 Thread Richard Elling
Leonid Roodnitsky wrote:
> Dear All,
>
> I am receiving DEGRADED from zpool status -v. 3 out of 14 disks are reported 
> as degraded with 'too many errors'. This is Build 99 running on an x4240 with 
> an STK SAS RAID controller. The version of the AAC driver is 2.2.5. I am not 
> even sure where to start. Any advice is very much appreciated. Trying to convince 
> management that ZFS is the way to go and then getting this problem. RAID 
> controller does not report any problems with drives. This is RAIDZ (RAID5) 
> zpool. Thank you everybody.
>
>   

The zpool man page says:
 The health of the top-level vdev, such as  mirror  or  raidz
 device,  is potentially impacted by the state of its associ-
 ated vdevs, or component devices. A top-level vdev  or  com-
 ponent device is in one of the following states:

 DEGRADEDOne or more top-level vdevs is in  the  degraded
 state  because one or more component devices are
 offline. Sufficient replicas exist  to  continue
 functioning.

 One or more component devices is in the degraded
 or  faulted state, but sufficient replicas exist
 to continue functioning. The  underlying  condi-
 tions are as follows:

 oThe number of checksum  errors  exceeds
  acceptable  levels  and  the  device is
  degraded as an  indication  that  some-
  thing  may  be  wrong. ZFS continues to
  use the device as necessary.

 oThe  number  of  I/O   errors   exceeds
  acceptable levels. The device could not
  be marked as faulted because there  are
  insufficient replicas to continue func-
  tioning.

You should take this into consideration as you decide whether
to replace disks or not.
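
As a sketch of where to start digging (the pool and device names here are
placeholders):

  # zpool status -v tank          (which vdevs are degraded, any files with errors)
  # fmdump -eV | more             (the underlying FMA ereports: I/O vs. checksum, which device)
  # zpool clear tank              (reset the error counters once the cause is understood)
  # zpool replace tank c1t3d0     (if a drive really is bad)

The ereports usually show whether the errors come from the drives themselves or
from the controller/driver layer in between.
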
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Miles Nordin
> "ok" == Orvar Korvar  writes:

ok> You are not using ZFS correctly. 
ok> You have misunderstood how it is used. If you don't follow the
ok> manual (which you haven't) then any filesystem will cause
ok> problems and corruption, even ZFS or ntfs or FAT32, etc. You
ok> must use ZFS correctly. Start by reading the manual.

Before writing a reply dripping with condescension, why don't you
start by reading the part of the ``manual'' where it says ``always
consistent on disk''?

Please, lay off the kool-aid, or else drink more of it: Unclean
dismounts are *SUPPORTED*.  This is a great supposed ZFS feature BUT
cord-yanking is not supposed to cause loss of the entire filesystem,
not on _any_ modern filesystem such as: UFS, FFS, ext3, xfs, hfs+.

There is a real problem here.  Maybe not all of the problem is in ZFS,
but some of it is.  If ZFS is going to be vastly more sensitive to
discarded SYNCHRONIZE CACHE commands than competing filesystems to the
point that it trashes entire pools on an unclean dismount, then it
will have to include a storage stack qualification tool, not just a
row of defensive pundits ready to point their fingers at hard drives
which are guilty until proven innocent, and lack an innocence-proving
tool.  And I'm not convinced that's the only problem.

Even if it is, the write barrier problem is pervasive.  Linux LVM2
throws them away, and many OS's that _do_ implement fdatasync() for
the userland including Linux-without-LVM2 only sync part way down,
don't propagate it all the way down the storage stack to the drive, so
file-backed pools (as you might use for testing, backup, or virtual
guests) are not completely safe.  

Aside from these examples, note that, AIUI, Sun's sun4v I/O
virtualizer, VirtualBox software, and iSCSI initiator and target were
all caught guilty of this write barrier problem, too, so it's not
only, or even mostly, a consumer-grade problem or an other-tent
problem.

If this is really the problem trashing everyone's pools, it doesn't
make me feel better because the problem is pretty hard to escape once
you do the slightest meagerly-creative thing with your storage. Even
if the ultimate problem turns out not to be in ZFS, the ZFS camp will
probably have to pursue the many fixes, since they're the ones so
unusually vulnerable to it.

also there are worse problems with some USB NAND FLASH sticks
according to Linux MTD/UBI folks:

  http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl

  We have heard reports that MMC and SD cards corrupt and lose data
  if power is cut during writing. Even the data which was there long
  time before may corrupt or disappear. This means that they have bad
  FTL which does not do things properly. But again, this does not have
  to be true for all MMCs and SDs - there are many different
  vendors. But again, you should be careful.

Of course this doesn't apply to any spinning hard drives nor to all
sticks, only to some sticks.

The ubifs camp did an end-to-end test for their filesystem's integrity
using a networked power strip to do automated cord-yanking.  I think
ZFS needs an easier, faster test though, something everyone can do
before loading data into a pool.
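
Short of a full power-cut harness, one crude pre-flight check (a sketch only;
the menus below are driver-dependent, and this proves nothing about whether
SYNCHRONIZE CACHE is honoured end to end) is to see whether the drive's
volatile write cache is even enabled, and optionally disable it while
qualifying the stack, using format(1M) in expert mode:

  # format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display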


pgpj6bBD3y8Mh.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Glenn Lagasse
* Orvar Korvar (knatte_fnatte_tja...@yahoo.com) wrote:
> Seagate7,
> 
> You are not using ZFS correctly. You have misunderstood how it is
> used. If you don't follow the manual (which you haven't) then any
> filesystem will cause problems and corruption, even ZFS or ntfs or
> FAT32, etc. You must use ZFS correctly. Start by reading the manual.
> 
> For ZFS to be able to repair errors, you must use two drives or more.
> This is clearly written in the manual. If you only use one drive then
> ZFS can not repair errors. If you use one drive, then ZFS can only
> detect errors, but not repair errors. This is also clearly written in
> the manual.

Or, you can set copies > 1 on your zfs filesystems.  This at least
protects you in cases of data corruption on a single drive but not if
the entire drive goes belly up.
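
For example (a sketch, assuming an existing dataset named tank/home):

  # zfs set copies=2 tank/home
  # zfs get copies tank/home

Note that the extra copies are only made for blocks written after the property
is set, and they double the space those blocks consume.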

Cheers,

-- 
Glenn
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS corruption

2009-02-09 Thread Leonid Roodnitsky
Dear All,

I am receiving DEGRADED from zpool status -v. 3 out of 14 disks are reported as 
degraded with 'too many errors'. This is Build 99 running on an x4240 with an STK 
SAS RAID controller. The version of the AAC driver is 2.2.5. I am not even sure where to 
start. Any advice is very much appreciated. Trying to convince management that 
ZFS is the way to go and then getting this problem. RAID controller does not 
report any problems with drives. This is RAIDZ (RAID5) zpool. Thank you 
everybody.

Regards,
Leonid
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Orvar Korvar
Seagate7,

You are not using ZFS correctly. You have misunderstood how it is used. If you 
don't follow the manual (which you haven't) then any filesystem will cause 
problems and corruption, even ZFS or ntfs or FAT32, etc. You must use ZFS 
correctly. Start by reading the manual.

For ZFS to be able to repair errors, you must use two drives or more. This is 
clearly written in the manual. If you only use one drive then ZFS can not 
repair errors. If you use one drive, then ZFS can only detect errors, but not 
repair errors. This is also clearly written in the manual. And, when you pull 
out a disk, you must use "zpool export" command. This is also clearly written 
in the manual. If you pull out a drive without issuing a warning that you will 
do so (by zpool export) then ZFS will not work.
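
A minimal sketch of that sequence, with a hypothetical pool name:

  # zpool export usbpool
  (unplug or move the drive)
  # zpool import usbpool          (on the same or another machine)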

If you are not following the manual, then any software will cause problems. Even 
Windows. You are not using ZFS as it is intended to be used. I suggest, in the 
future, you stay with Windows, which you know. If you use Unix without knowing 
it or without reading the manual, then you will have problems. You know 
Windows, stay with Windows.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backup/Restore root pool : SPARC and x86/x64

2009-02-09 Thread Cindy . Swearingen
Hi Gordon,

We are working toward making the root pool recovery process easier
in the future, for everyone. In the meantime, this is awesome work.

After I run through these steps myself, I would like to add this
procedure to the ZFS troubleshooting wiki.

Thanks,

Cindy


Gordon Johnson wrote:
> I hope this thread catches someone's attention. I've reviewed the root pool 
> recovery guide as posted.  It presupposes a certain level of network support, 
> for backup and restore, that many opensolaris users may not have.  
> 
> For an administrator who is working in the context of a data center or a 
> laboratory, where there are multiple systems, including some kind of network 
> attached storage, it works fine.  However, my understanding was that 
> opensolaris was attempting to broaden its user-base by adopting capabilities 
> from the many flavors of linux, and (dare I say it?) even Windows (detailed 
> CIFS integration).  
> 
> The desire to make in-roads into the alternative workstation OS segment 
> implies that Sun and OpenSolaris anticipate more individual, desktop and 
> laptop-based users, in contrast to Solaris' previous sole focus on the server 
> segment.  But the desktop and laptop-based crowd often do not have an NFS or 
> other storage server on their network.  Many are simply home-users who want 
> an alternative to Microsoft or who hope to enhance their education in 
> commercial versions of UNIX via home-implementation.  While they likely won't 
> have network storage available, most such users do have resources such as 
> additional hard drives, USB drives, eSATA drives, or USB solid state devices 
> available.
> 
> For their use I threw together a root pool recovery procedure that uses an 
> additional disk attached to the machine requiring backup/recovery.  In 
> developing this procedure the additional disk was attached to a free IDE/SATA 
> port on the motherboard.  However, I believe it will work equally well for an 
> attached USB-disk or solid state memory device.
> 
> PROCEDURE FOLLOWS:
> __
> 
> This instruction assumes that the system BIOS is capable of choosing a 
> specific boot device and that the system is X86/64 rather than SPARC.
> It also assumes that there are at least three disk devices available in the 
> system: 
> 1) A rootpool from which to take a backup. In this instruction this disk is 
> c0d1s0.
> 2) An alternate disk which should become the new rootpool. In this 
> instruction this disk is c0d0s0. And to choose this method rather than 
> mirroring the original rootpool implies the the new rootpool disk may be 
> smaller than the old.
> 3) A 'storage' zfs pool that is not the rootpool, nor the new rootpool, but 
> can serve as a storage cache for the system backup in place of a shared 
> network drive, NFS or other.
> 
> These tasks were all accomplished from the first CD in a Solaris 10 
> installation set because all efforts to send the recursive snapshot of the 
> rootpool while the system was in operation resulted in the system hanging.
> 
> Your mileage may vary. However, I believe these instructions to be a safe 
> backup method for average users who work primarily with a single system, and 
> who may not have extensive network storage or other system support.
> 
> These instructions were developed on Solaris 10 U6, October 2008, for an 
> X86-X64 system using an AMD64 processor.
> _
> 1. While booted interactively on your system, as root, perform the following:
> 
>   -Create a zfs storage pool to contain your zfs snapshots (if you don't 
> already have one).
>   EXAMPLE:
>   # zfs create storage/snaps
>   
>   -Create a recursive zfs snapshot of your rootpool using the command, 
> "zfs snapshot -r @
>   EXAMPLE:
>   # zfs snapshot -r rootp...@16012009
>   
>   -Shutdown the system
>   EXAMPLE:
>   # init 0
> 
> 2. Insert or connect the new rootpool disk.
> 3. Insert the Solaris bootable CD or DVD.
> 4. boot cdrom
> 5. Press 3 when prompted for interactive install.
> 6. Press F_2 when prompted.
> 7. Press 'Enter' when prompted.
> 8. When interactive shell has started place mouse cursor in window as 
> indicated and press 'Enter'.
> 9. When prompted, place mouse cursor in window, per screen instructions, 
> press '0', then press 'Enter'.
> 10. When interactive install console window appears, minimize it.
> 11. Right click on the desktop and select 'Programs' and the sub-menu 
> 'Terminal...'
> 
> All further instructions should be completed in the terminal.
> _
> 
> # zpool import
>   pool: storage
> id: 2698595696121940384
>  state: ONLINE
> status: The pool was last accessed by another system.
> action: The pool can be imported using its name or numeric identifier and
> the '-f' flag.
>see: ht

[zfs-discuss] Two pools on one slice

2009-02-09 Thread Bernd Schemmer
Hi,

I've a somewhat strange configuration here:

[r...@sol9 Mon Feb 09 21:40:26 ~]
 $ uname -a
SunOS sol9 5.11 snv_107 sun4u sparc SUNW,Sun-Blade-1000

 
[r...@sol9 Mon Feb 09 21:30:50 ~]
 $ zfs list
NAMEUSED  AVAIL  REFER  MOUNTPOINT
rootpool   14.0G   116G63K  /rpool
rootpool/ROOT  7.32G   116G18K  legacy
rootpool/ROOT/snv_107  7.32G   116G  7.15G  /
rootpool/dump  2.00G   116G  2.00G  -
rootpool/export 133K   116G21K  /export
rootpool/export/home 78K   116G42K  /export/home
rootpool/swap 4G   120G  5.34M  -
rootpool/zones  639M   116G20K  /zones
rootpool/zones/dnsserver638M   116G   638M  /zones/dnsserver
rpool  14.0G   116G63K  /rpool
rpool/ROOT 7.32G   116G18K  legacy
rpool/ROOT/snv_107 7.32G   116G  7.15G  /
rpool/dump 2.00G   116G  2.00G  -
rpool/export133K   116G21K  /export
rpool/export/home78K   116G42K  /export/home
rpool/swap4G   120G  5.34M  -
rpool/zones 639M   116G20K  /zones
rpool/zones/dnsserver   638M   116G   638M  /zones/dnsserver


and more in other pools on other disks.

The problem here is that Solaris thinks both pools are on the same disk:

[r...@sol9 Mon Feb 09 21:31:06 ~]
 $  zpool list
NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
rootpool132G  9.96G   122G 7%  ONLINE  -
rpool   132G  9.96G   122G 7%  ONLINE  -
usbbox001  5.44T   413G  5.03T 7%  ONLINE  -

[r...@sol9 Mon Feb 09 21:37:11 ~]
 $ zpool status rpool
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c3t2d0s0  ONLINE   0 0 0

errors: No known data errors

[r...@sol9 Mon Feb 09 21:37:17 ~]
 $ zpool status rootpool
  pool: rootpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rootpoolONLINE   0 0 0
  c3t2d0s0  ONLINE   0 0 0

How can I fix this situation?

Solaris only boots to maintenance because obviously the mount of the local 
filesystems fails:

$ svcadm clear svc:/system/filesystem/local:default
[r...@sol9 Mon Feb 09 21:30:35 ~]
 $ Reading ZFS config: done.
Mounting ZFS filesystems: (1/50)cannot mount '/export': directory is not empty
cannot mount '/export/home': directory is not empty
cannot mount '/rpool': directory is not empty
cannot mount '/zones': directory is not empty
cannot mount '/zones/dnsserver': directory is not empty 
   (50/50)
svc:/system/filesystem/local:default: WARNING: /usr/sbin/zfs mount -a failed: 
exit status 1
Feb  9 21:30:36 svc.startd[7]: svc:/system/filesystem/local:default: Method 
"/lib/svc/method/fs-local" failed with exit status 95.
Feb  9 21:30:36 svc.startd[7]: system/filesystem/local:default failed fatally: 
transitioned to maintenance (see 'svcs -xv' for details)

Argghh ... I really do not want to lose my customizations for this 
installation.

$ format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c1t1d0 
  /p...@8,70/s...@2,1/s...@1,0
   1. c2t0d0 
  /p...@8,70/p...@3/u...@8,2/stor...@4/d...@0,0
   2. c2t0d1 
  /p...@8,70/p...@3/u...@8,2/stor...@4/d...@0,1
   3. c2t0d2 
  /p...@8,70/p...@3/u...@8,2/stor...@4/d...@0,2
   4. c2t0d3 
  /p...@8,70/p...@3/u...@8,2/stor...@4/d...@0,3
   5. c3t1d0 
  /p...@8,60/SUNW,q...@4/f...@0,0/s...@w2104cf9ff1fa,0
   6. c3t2d0 
  /p...@8,60/SUNW,q...@4/f...@0,0/s...@w2114c3cf7ae3,0

What I've done until now:

Before snv_107 I had snv_89 running on the 36 GB internal disk (using SVM). 
Then I tried to upgrade to snv_107 using liveupgrade several times but it did 
not work. So I decided to do a new installation on the 146 GB disk and that 
worked (via liveupgrade into an empty boot environment). I could boot the 
snv_107 and use it (including reboots). Today I booted back to snv_89 to copy 
some files from an SVM metadevice and after that booting back into snv_107 
fails.

Any hints are welcome...

regards

Bernd
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk

2009-02-09 Thread Bob Friesenhahn
On Mon, 9 Feb 2009, David Dyer-Bennet wrote:

> Most people run most of their lives with no redundancy in their data,
> though.  If you make sure the backups are up-to-date I don't see any
> serious risk in using the swap-one-disk-at-a-time approach for upgrading a
> home server, where you can have it out of service more easily (or at least
> tell people not to count on anything they write being safe) than in a
> commercial environment.

The risk of some level of failure is pretty high, particularly when 
using large SATA drives.

The typical home user does not do true backups.  Even with true 
backups, the pool needs to be configured from scratch to reproduce 
what was there before.  If they at least copy their files somewhere, 
it can still take a day or two to put the pool back together.

These are good arguments for home users to use raidz2 rather than 
raidz1 so that the risk is dramatically diminished.

> And as I recall, many of the discussions about this technique involve
> people who do not, in fact, have the ability to replicate the entire vdev.
> Often people with 4 hot-swap bays running a 4-disk raidz.

There is almost always a way to add a disk to a system, even if via 
slow USB.  Some people use a USB chassis which will accept the new 
drive, zpool replace the array drive with this new drive, and then 
physically install the new drive in the array.  Zfs export and import 
is required.
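
A sketch of that sequence, with hypothetical device names (c1t3d0 is the old
array disk, c5t0d0 the new drive sitting temporarily in the USB chassis):

  # zpool replace tank c1t3d0 c5t0d0
  # zpool status tank             (wait for the resilver to complete)
  # zpool export tank
  (move the new drive from the USB chassis into the array bay)
  # zpool import tank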

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backup/Restore root pool : SPARC and x86/x64

2009-02-09 Thread Gordon Johnson
I hope this thread catches someone's attention. I've reviewed the root pool 
recovery guide as posted.  It presupposes a certain level of network support, 
for backup and restore, that many opensolaris users may not have.  

For an administrator who is working in the context of a data center or a 
laboratory, where there are multiple systems, including some kind of network 
attached storage, it works fine.  However, my understanding was that 
opensolaris was attempting to broaden its user-base by adopting capabilities 
from the many flavors of linux, and (dare I say it?) even Windows (detailed 
CIFS integration).  

The desire to make in-roads into the alternative workstation OS segment implies 
that Sun and OpenSolaris anticipate more individual, desktop and laptop-based 
users, in contrast to Solaris' previous sole focus on the server segment.  But the 
desktop and laptop-based crowd often do not have an NFS or other storage 
server on their network.  Many are simply home-users who want an alternative to 
Microsoft or who hope to enhance their education in commercial versions of UNIX 
via home-implementation.  While they likely won't have network storage 
available, most such users do have resources such as additional hard drives, 
USB drives, eSATA drives, or USB solid state devices available.

For their use I threw together a root pool recovery procedure that uses an 
additional disk attached to the machine requiring backup/recovery.  In 
developing this procedure the additional disk was attached to a free IDE/SATA 
port on the motherboard.  However, I believe it will work equally well for an 
attached USB-disk or solid state memory device.

PROCEDURE FOLLOWS:
__

This instruction assumes that the system BIOS is capable of choosing a 
specific boot device and that the system is X86/64 rather than SPARC.
It also assumes that there are at least three disk devices available in the 
system: 
1) A rootpool from which to take a backup. In this instruction this disk is 
c0d1s0.
2) An alternate disk which should become the new rootpool. In this instruction 
this disk is c0d0s0. And to choose this method rather than mirroring the 
original rootpool implies that the new rootpool disk may be smaller than the old.
3) A 'storage' zfs pool that is not the rootpool, nor the new rootpool, but can 
serve as a storage cache for the system backup in place of a shared network 
drive, NFS or other.

These tasks were all accomplished from the first CD in a Solaris 10 
installation set because all efforts to send the recursive snapshot of the 
rootpool while the system was in operation resulted in the system hanging.

Your mileage may vary. However, I believe these instructions to be a safe 
backup method for average users who work primarily with a single system, and 
who may not have extensive network storage or other system support.

These instructions were developed on Solaris 10 U6, October 2008, for an 
X86-X64 system using an AMD64 processor.
_
1. While booted interactively on your system, as root, perform the following:

-Create a zfs storage pool to contain your zfs snapshots (if you don't 
already have one).
EXAMPLE:
# zfs create storage/snaps

-Create a recursive zfs snapshot of your rootpool using the command, 
"zfs snapshot -r @
EXAMPLE:
# zfs snapshot -r rootp...@16012009

-Shutdown the system
EXAMPLE:
# init 0

2. Insert or connect the new rootpool disk.
3. Insert the Solaris bootable CD or DVD.
4. boot cdrom
5. Press 3 when prompted for interactive install.
6. Press F_2 when prompted.
7. Press 'Enter' when prompted.
8. When interactive shell has started place mouse cursor in window as indicated 
and press 'Enter'.
9. When prompted, place mouse cursor in window, per screen instructions, press 
'0', then press 'Enter'.
10. When interactive install console window appears, minimize it.
11. Right click on the desktop and select 'Programs' and the sub-menu 
'Terminal...'

All further instructions should be completed in the terminal.
_

# zpool import
  pool: storage
id: 2698595696121940384
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

storage ONLINE
  mirrorONLINE
c2d0ONLINE
c3d0ONLINE

  pool: rootpool
id: 8060131098876360047
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

rootpoolONLINE
  c0d1s0 

Re: [zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk

2009-02-09 Thread Andrew Gabriel




David Dyer-Bennet wrote:

  On Mon, February 9, 2009 11:48, Bob Friesenhahn wrote:
  
  
On Tue, 10 Feb 2009, Steven Sim wrote:


  I had almost used up all the available space and sought a way to
expand the space without attaching any additional drives.
  

It is good that you succeeded, but the approach you used seems really
risky.  If possible, it is far safer to temporarily add the extra
drive without disturbing the redundancy of your raidz1 configuration.
The most likely time to encounter data loss is while resilvering since
so much data needs to be successfully read.  Since you removed the
redundancy (by intentionally causing a disk "failure"), any read
failure could have lost files, or even the entire pool!

  
  
Most people run most of their lives with no redundancy in their data,
though.  If you make sure the backups are up-to-date I don't see any
serious risk in using the swap-one-disk-at-a-time approach for upgrading a
home server, where you can have it out of service more easily (or at least
tell people not to count on anything they write being safe) than in a
commercial environment.

And as I recall, many of the discussions about this technique involve
people who do not, in fact, have the ability to replicate the entire vdev.
 Often people with 4 hot-swap bays running a 4-disk raidz.
  


If you're going to do this, at least get a clean zpool scrub run
completed (with no checksum errors) before you start. Otherwise you may
well find you have some corrupt blocks in files you hardly ever access
(and so haven't seen), but of course they will be needed to reconstruct
the raidz on the new disk, at which point you are stuffed. Actually,
it's probably a good idea to get a clean zpool scrub between each disk
swap too, in case one of the new disks turns out to be giving errors.
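
A sketch, assuming a pool named tank:

  # zpool scrub tank
  # zpool status -v tank          (repeat until the scrub completes with 0 errors)

and only then swap the next disk.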

-- 
Andrew


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Max size of log device?

2009-02-09 Thread Robert Milkowski
Hello Andrew,

Sunday, February 8, 2009, 8:46:24 PM, you wrote:

AG> Neil Perrin wrote:
>> On 02/08/09 11:50, Vincent Fox wrote:
>>   
>>> So I have read in the ZFS Wiki:
>>>
>>> #  The minimum size of a log device is the same as the minimum size of 
>>> device in
>>> pool, which is 64 Mbytes. The amount of in-play data that might be stored 
>>> on a log
>>> device is relatively small. Log blocks are freed when the log transaction 
>>> (system call)
>>> is committed.
>>> # The maximum size of a log device should be approximately 1/2 the size of 
>>> physical
>>> memory because that is the maximum amount of potential in-play data that 
>>> can be stored.
>>> For example, if a system has 16 Gbytes of physical memory, consider a 
>>> maximum log device
>>> size of 8 Gbytes. 
>>>
>>> What is the downside of over-large log device?
>>> 
>>
>> - Wasted disk space.
>>
>>   
>>> Let's say I have  a 3310 with 10 older 72-gig 10K RPM drives and RAIDZ2 
>>> them.
>>> Then I throw an entire 72-gig 15K RPM drive in as slog.
>>>
>>> What is behind this maximum size recommendation?
>>> 
>>
>> - Just guidance on what might be used in the most stressed environment.
>> Personally I've never seen anything like the maximum used but it's
>> theoretically possible. 
>>   

AG> Just thinking out loud here, but given such a disk (i.e. one which is 
AG> bigger than required), I might be inclined to slice it up, creating a 
AG> slice for the log at the outer edge of the disk. The outer edge of the
AG> disk has the highest data rate, and by effectively constraining the head
AG> movement to only a portion of the whole disk, average seek times should
AG> be significantly improved (not to mention fewer seeks due to more 
AG> data/cylinder at the outer edge). The log can't be using the write 
AG> cache, so the normal penalty for not using the write cache when not 
AG> giving the whole disk to ZFS is irrelevant in this case. By allocating,
AG> say, a 32GB slice from the outer edge of a 72GB disk, you should get 
AG> really good performance. If you turn out not to need anything like 32GB,
AG> then making it smaller will make it even faster (depending how ZFS 
AG> allocates space on a log device, which I don't know). Obviously, don't
AG> use the rest of the disk in order to achieve this performance.


1. ZFS by default will end up utilizing the outer regions of a disk drive, so
there is no point slicing a LUN in this case.

2. the log definitely can use the cache if it is a non-volatile one.

  of course, in such a case there is a good question whether one 15k disk
  behind a 3510 for several 10k disks makes sense at all?

btw: IIRC on the 3510 you need to disable cache flushes in ZFS and make
sure that the disk array will switch to write-through (WT) mode if one of the
controllers or batteries fails.
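
For reference, the usual way to disable those flushes on Solaris (a sketch of
the common tuning advice of the time; only appropriate when the array cache is
genuinely non-volatile) is an /etc/system entry followed by a reboot:

  set zfs:zfs_nocacheflush = 1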



-- 
Best regards,
 Robertmailto:mi...@task.gda.pl
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Veritas DMP?

2009-02-09 Thread Robert Milkowski
Hello Andras,

Sunday, February 8, 2009, 12:55:20 PM, you wrote:

AS> Hi,

AS> I'm aware that if we talking about DMP on Solaris the preferred
AS> way is to use MPxIO, still I have a question if any of you got any
AS> experience with ZFS on top of Veritas DMP?

AS> Does it work? Is it supported? Any real life experience/tests in this 
subject?

I haven't tried it but in theory it should just work.

-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk

2009-02-09 Thread David Dyer-Bennet

On Mon, February 9, 2009 11:48, Bob Friesenhahn wrote:
> On Tue, 10 Feb 2009, Steven Sim wrote:
>>
>> I had almost used up all the available space and sought a way to
>> expand the space without attaching any additional drives.
>
> It is good that you succeeded, but the approach you used seems really
> risky.  If possible, it is far safer to temporarily add the extra
> drive without disturbing the redundancy of your raidz1 configuration.
> The most likely time to encounter data loss is while resilvering since
> so much data needs to be successfully read.  Since you removed the
> redundancy (by intentionally causing a disk "failure"), any read
> failure could have lost files, or even the entire pool!

Most people run most of their lives with no redundancy in their data,
though.  If you make sure the backups are up-to-date I don't see any
serious risk in using the swap-one-disk-at-a-time approach for upgrading a
home server, where you can have it out of service more easily (or at least
tell people not to count on anything they write being safe) than in a
commercial environment.

And as I recall, many of the discussions about this technique involve
people who do not, in fact, have the ability to replicate the entire vdev.
 Often people with 4 hot-swap bays running a 4-disk raidz.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for NFS back end

2009-02-09 Thread John Welter
Sorry I wasn't clear that the clients that hit this NFS back end are all
Centos 5.2.  FreeBSD is only used for the current NFS servers (a legacy
deal) but that would go away with the new Solaris/ZFS back end.

Dell will sell their boxes with SAS/5e controllers which are just an LSI
1068 board - these work with the MD1000 as a JBOD (we are doing some
testing as we speak and it seems to work).  

The rest of the infrastructure is Dell so we are trying to stick with
them ... the devil we know ;^)

Homework was easy with excellent resources like this list ... just lurked
awhile and picked up a lot from the traffic.

Thanks again.

John


-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Monday, February 09, 2009 11:28 AM
To: John Welter
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS for NFS back end

On Mon, 9 Feb 2009, John Welter wrote:
> A bit about the workload:
>
> - 99.999% large reads, very small write requirement.
> - Reads average from ~1MB to 60MB.
> - Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
> during peak hours.

This is something that ZFS is particularly good at.

> Proposed hardware:
>
> - Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
> - LSI 1068 based SAS cards * 2 per server
> - 4 MD1000 with 1TB ES2's * 15
> - Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis
> - Intel 10 gig-e to the switching infrastructure

The only concern might be with the MD1000.  Make sure that you can 
obtain it as a JBOD SAS configuration without the advertised PERC RAID 
controller. The PERC RAID controller is likely to get in the way when 
using ZFS. There has been mention here about unpleasant behavior when 
hot-swapping a failed drive in a Dell drive array with their RAID 
controller (does not come back automatically).  Typically such 
simplified hardware is offered as "expansion" enclosures.

Sun, IBM, and Adaptec, also offer good JBOD SAS enclosures.

It seems that you have done your homework well.

> 1) Solaris, OpenSolaris, etc??  What's the best for production?

Choose Solaris 10U6 if OS stability and incremental patches are 
important for you.  ZFS boot from mirrored drives in the PowerEdge 
2970 should help make things very reliable, and the OS becomes easier 
to live-upgrade.

> 3) any other words of wisdom - we are just starting out with ZFS but
do
> have some Solaris background.

You didn't say if you will continue using FreeBSD.  While FreeBSD is a 
fine OS, my experience is that its client NFS read performance is 
considerably less than Solaris.  With Solaris clients and a Solaris 
server, the NFS read is close to "wire speed".  FreeBSD's NFS client 
is not so good for bulk reads, presumably due to its 
read-ahead/caching strategy.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us,
http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot splitting & joinin

2009-02-09 Thread Blake
I believe Tim Foster's zfs backup service (very beta atm) has support
for splitting zfs send backups.  Might want to check that out and see
about modifying it for your needs.
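
In the meantime, a sketch of the split-and-rejoin approach being discussed
below, with hypothetical dataset and path names and a chunk size chosen to fit
the media:

  # zfs snapshot tank/home@20090209
  # zfs send tank/home@20090209 | split -b 4000m - /backup/home-20090209.zsend.
  # digest -v -a md5 /backup/home-20090209.zsend.* > /backup/home-20090209.md5

and, after copying the pieces back from the discs:

  # digest -v -a md5 /backup/home-20090209.zsend.* | diff - /backup/home-20090209.md5
  # cat /backup/home-20090209.zsend.* | zfs receive tank/home_restored

The digest step at least tells you which piece went bad before zfs receive
rejects the whole stream.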

On Thu, Feb 5, 2009 at 3:15 PM, Michael McKnight
 wrote:
> Hi everyone,
>
> I appreciate the discussion on the practicality of archiving ZFS sends, but 
> right now I don't know of any other options.  I'm a home user, so 
> Enterprise-level solutions aren't available and as far as I know, tar, cpio, 
> etc. don't capture ACL's and other low-level filesystem attributes.   Plus, 
> they are all susceptible to corruption while in storage, making recovery no 
> more likely than with a zfs send.
>
> The checksumming capability is a key factor to me.  I would rather not be 
> able to restore the data than to unknowingly restore bad data.  This is the 
> biggest reason I started using ZFS to start with.  Too many cases of 
> "invisible" file corruption.  Admittedly, it would be nicer if "zfs recv" 
> would flag individual files with checksum problems rather than completely 
> failing the restore.
>
> What I need is a complete snapshot of the filesystem (ie. ufsdump) and, 
> correct me if I'm wrong, but zfs send/recv is the closest (only) thing we 
> have.  And I need to be able to break up this complete snapshot into pieces 
> small enough to fit onto a DVD-DL.
>
> So far, using ZFS send/recv works great as long as the files aren't split.  I 
> have seen suggestions on using something like 7z (?) instead of "split" as an 
> option.  Does anyone else have any other ideas on how to successfully break 
> up a send file and join it back together?
>
> Thanks again,
> Michael
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for NFS back end

2009-02-09 Thread Bob Friesenhahn
On Mon, 9 Feb 2009, John Welter wrote:
> A bit about the workload:
>
> - 99.999% large reads, very small write requirement.
> - Reads average from ~1MB to 60MB.
> - Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
> during peak hours.

This is something that ZFS is particularly good at.

> Proposed hardware:
>
> - Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
> - LSI 1068 based SAS cards * 2 per server
> - 4 MD1000 with 1TB ES2's * 15
> - Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis
> - Intel 10 gig-e to the switching infrastructure

The only concern might be with the MD1000.  Make sure that you can 
obtain it as a JBOD SAS configuration without the advertised PERC RAID 
controller. The PERC RAID controller is likely to get in the way when 
using ZFS. There has been mention here about unpleasant behavior when 
hot-swapping a failed drive in a Dell drive array with their RAID 
controller (does not come back automatically).  Typically such 
simplified hardware is offered as "expansion" enclosures.

Sun, IBM, and Adaptec, also offer good JBOD SAS enclosures.

It seems that you have done your homework well.

> 1) Solaris, OpenSolaris, etc??  What's the best for production?

Choose Solaris 10U6 if OS stability and incremental patches are 
important for you.  ZFS boot from mirrored drives in the PowerEdge 
2970 should help make things very reliable, and the OS becomes easier 
to live-upgrade.

> 3) any other words of wisdom - we are just starting out with ZFS but do
> have some Solaris background.

You didn't say if you will continue using FreeBSD.  While FreeBSD is a 
fine OS, my experience is that its client NFS read performance is 
considerably less than Solaris.  With Solaris clients and a Solaris 
server, the NFS read is close to "wire speed".  FreeBSD's NFS client 
is not so good for bulk reads, presumably due to its 
read-ahead/caching strategy.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk

2009-02-09 Thread Bob Friesenhahn
On Tue, 10 Feb 2009, Steven Sim wrote:
> 
> I had almost used up all the available space and sought a way to 
> expand the space without attaching any additional drives.

It is good that you succeeded, but the approach you used seems really 
risky.  If possible, it is far safer to temporarily add the extra 
drive without disturbing the redundancy of your raidz1 configuration. 
The most likely time to encounter data loss is while resilvering since 
so much data needs to be successfully read.  Since you removed the 
redundancy (by intentionally causing a disk "failure"), any read 
failure could have lost files, or even the entire pool!

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] [Fwd: Re: [Fwd: RE: Disk Pool overhead]]

2009-02-09 Thread Eric Frank

Hi There,

One of my partners asked the question w.r.t. Disk Pool overhead for the 
7000 series.


Adam Leventhal said that it was very small (1/64); see below.

Do we have any further info regarding this?

Thanks,
-eric :)



 Original Message 
Subject:Re: [Fwd: RE: Disk Pool overhead]
Date:   Mon, 09 Feb 2009 09:34:28 -0800
From:   Adam Leventhal 
To: Eric Frank 
References: <49905beb.1000...@sun.com>



Hey Eric,

I don't think so. You should ask on zfs-disc...@opensolaris.org.

Adam

On Mon, Feb 09, 2009 at 11:38:03AM -0500, Eric Frank wrote:

Hi Adam,

Is the 1/64th number for ZFS published anywhere else besides your blog?

thanks,
-eric :)

 Original Message 
Subject:RE: Disk Pool overhead
Date:   Fri, 06 Feb 2009 12:58:43 -0500
From:   Tim Brown 
To: eric.fr...@sun.com
CC: ward.eld...@sun.com, Gary Losito 
References: <6e49bcbb8a5deb43bd9174e6a880159003a80...@20mail.cri_corp.com> 
<498c2d63.2020...@sun.com>




Eric,

is that a true number 1/64^th ?



As - is it in a published Sun manual rather than a blog?



Thanks



Tim







*From:* eric.fr...@sun.com [mailto:eric.fr...@sun.com]
*Sent:* Friday, February 06, 2009 7:30 AM
*To:* Tim Brown
*Cc:* ward.eld...@sun.com; Gary Losito
*Subject:* Re: Disk Pool overhead



Hi Tim,

Take a look at this blog entry
http://blogs.sun.com/ahl/entry/sun_storage_7410_space_calculator

at the bottom of the blog there is a note which mentions that ZFS only 
takes up a sliver of space:
*"Update February 4, 2009:* Ryan Matthews  and I 
collaborated  
on a new version of the size calculator that now lists the raw space 
available in TB (decimal as quoted by drive manufacturers for example) as 
well as the usable space in TiB  
(binary as reported by many system tools). The latter also takes account of 
the sliver (1/64th) reserved by ZFS:"
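
As a rough worked example of that sliver: on a 72 GB (decimal) drive, 1/64th
comes to a little over 1 GB (72/64 ≈ 1.1 GB), so roughly 71 GB of raw capacity
remains before any mirroring or RAID-Z parity is subtracted.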


-eric :)

Tim Brown wrote:

Hahaha
The devil's in the statistics.

Not having read the article you referenced, 40 percent could be eaten up, 
depending on the snap reserve space. The default is 20 percent I think. I 
know it could be even higher if configured that way. 
So whether it's 20, 30, or 40, it's got to be more than ZFS, since ZFS doesn't 
pre-allocate. Different people have asked me, and I just don't know, and 
no one in the class I just took knew either, which has led me to you, the Oz of 
Sun.  
Thanks


Tim


- Original Message -
From: ward.eld...@sun.com
To: Tim Brown
Cc: eric.fr...@sun.com; Gary Losito
Sent: Thu Feb 05 20:18:37 2009
Subject: Re: Disk Pool overhead

Hi Tim,

From my notes, NetApp shaves off at least 40% of usable space depending on 
what you are doing.


Check out this blog on HP:

http://www.communities.hp.com/online/blogs/datastorage/archive/2008/11/24/netapp-usable-capacity-going-going-gone.aspx

If you're really bored, Google "netapp usable space" and you can read all  
night...


Ward



Tim Brown wrote:

  On my bberry, but will give it a look when I get to a pc.
  I know that netapp shaves off about 10-20 percent, just
  wondering what the 7000s do.

  I'll let you know.
  Thanks again,
  Tim
  - Original Message -
  From: ward.eld...@sun.com
  To: Tim Brown
  Cc: eric.fr...@sun.com; Gary Losito
  Sent: Thu Feb 05 18:49:17 2009
  Subject: Re: Disk Pool overhead
  Hi Tim,

  Check out the following on the Sun Storage 7000.  It describes the
  "usable space" based on the different RAID configurations.  Is this more
  in line with what you're looking for?

  Ward
 Tim Brown wrote:
   >
   > Eric/Ward,
   >
   >
   >
   > With commonly used filesystems such as UFS/VXFS/NTFS, a chunk of disk
   > space is reserved off the top for inodes/metadata.
   >
   >
   >
   > With ZFS and the 7000 pools, what is the approximate overhead space?

   >
   >
   >
   >
   >
   > Another way to put it, using an example,
   >
   >
   >
   > If I have a single 72GB disk,
   >
   >
   >
   > What will be available in the BUI for available space?
   >
   >
   >
   > Thanks
   >
   >
   >
   > Tim
   >
   >
   >
   --
   Ward Eldred
   Systems Engineer
   Sun Microsystems, Inc.
   400 Atr

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Bob Friesenhahn
On Mon, 9 Feb 2009, D. Eckert wrote:
>
> A good practice would be to care first for proper documentation. 
> There's nothing stated in the man pages saying that, if USB zpools are 
> used, zfs mount/unmount is NOT recommended and zpool export should be 
> used instead.

I have been using USB mirrored disks for backup purposes for about 
eight months now.  No data loss, or even any reported uncorrectable 
read failures.  These disks have been shared between two different 
systems (x86 and SPARC).  The documentation said that I should use zfs 
export/import and so that is what I have done, with no problems.

While these USB disks seem to be working reliably, it is 
certainly possible to construct a USB arrangement which does not work 
reliably since most USB hardware is cheap junk.  My USB disks are 
direct attached and don't go through a USB bridge.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk

2009-02-09 Thread Steven Sim




All;

There have been some negative posts about ZFS recently.

I've been using ZFS for more than 13 months now, my system's gone
through 3 major upgrades, one critical failure and the data's still
fully intact.

I am thoroughly impressed with ZFS; in particular, its sheer
reliability.

As for flexibility, I am also impressed but am of the opinion that it
could do with some improvement here (like allowing LUNs to be
removed).

Recently, I had at my place a ZFS pool consisting of 4 x 320 Gbyte SATA
II drives in a RAID-Z configuration.

I had almost used up all the available space and sought a way to expand
the space without attaching any additional drives.

r...@sunlight:/root# zfs list myplace
NAME  USED  AVAIL  REFER  MOUNTPOINT
myplace   874G  4.05G  28.4K  /myplace

I bought 4 x 1 TB SATA II drives and deliberately replaced one of the
older 320 Gbyte drives with a 1 TB drive.

r...@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas
exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

    NAME    STATE READ WRITE CKSUM
    myplace DEGRADED 0 0 0
      raidz1    DEGRADED 0 0 0
        c4d0    ONLINE   0 0 0
        c5d0    ONLINE   0 0 0
        c6d0    ONLINE   0 0 0
        c7d0    UNAVAIL  0   220 0  cannot open

Yup! As expected, ZFS reported c7d0 as faulted so ..

r...@sunlight:/root# zpool replace myplace c7d0

r...@sunlight:/root# zpool status -v myplace   
  pool: myplace
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool
will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 328h41m to go
config:

    NAME    STATE READ WRITE CKSUM
    myplace DEGRADED 0 0 0
      raidz1    DEGRADED 0 0 0
        c4d0    ONLINE   0 0 0  260K resilvered
        c5d0    ONLINE   0 0 0  252K resilvered
        c6d0    ONLINE   0 0 0  260K resilvered
        replacing   DEGRADED 0 0 0
      c7d0s0/o  FAULTED  0   972 0  corrupted data
      c7d0  ONLINE   0 0 0  418K resilvered

errors: No known data errors

Almost three hours later..

r...@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool
can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
 scrub: resilver completed after 2h53m with 0 errors on Fri Feb  6
19:32:35 2009
config:

    NAME    STATE READ WRITE CKSUM
    myplace ONLINE   0 0 0
      raidz1    ONLINE   0 0 0
        c4d0    ONLINE   0 0 0  171M resilvered
        c5d0    ONLINE   0 0 0  171M resilvered
        c6d0    ONLINE   0 0 0  168M resilvered
        c7d0    ONLINE   0 0 0  292G resilvered

errors: No known data errors

Subsequently I did a ZFS upgrade..

r...@sunlight:/root# zpool upgrade
This system is currently running ZFS pool version 13.

The following pools are out of date, and can be upgraded.  After being
upgraded, these pools will no longer be accessible by older software
versions.

VER  POOL
---  
10   myplace

Use 'zpool upgrade -v' for a list of available versions and their
associated
features.

r...@sunlight:/root# zpool upgrade myplace
This system is currently running ZFS pool version 13.

Successfully upgraded 'myplace' from version 10 to version 13

Ok, upgrade successful.

Subsequently I replaced the next drive, waited for ZFS to completely
resilver the replaced 1 TB drive, and repeated the sequence until I had
completely replaced all the 320 Gbyte drives with 1 TByte drives.

And after replacing and completely resilvering the 4th drive

(output shows the 4th drive being resilvered..)

r...@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool
will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h17m, 5.15% done, 5h16m to go
config:

    NAME    STATE READ WRITE CKSUM
    myplace DEGRADED 0 0 0
      raidz1    DEGRADED 0 0 0
        replacing   DEGRADED 0 0 0
      c4d0s0/o  FAULTED  0 6.35K 0  corrupted data
      c4d0  ONLINE   0 0 0  15.0G resilvered
        c5d0    ONLINE   0 0  

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Andrew Gabriel
Kyle McDonald wrote:
> D. Eckert wrote:
>   
>> too many words wasted, but not a single word, how to restore the data.
>>
>> I have read the man pages carefully. But again: there's nothing said, that 
>> on USB drives zfs umount pool is not allowed.
>>   
>> 
> It is allowed. But it's not enough. You need to read both the 'zpool' 
> and 'zfs' manpages. The 'zpool' manpage will tell you that the way to 
> move the 'whole pool' to another machine is to run 'zpool export 
> <poolname>'. The 'zpool export' will actually run the 'zfs umount' for 
> you, though it's not a problem if it's already been done.
>
> Note, this isn't USB specific, you won't see anything in the docs about 
> USB. This condition applies to SCSI and others too. You need to export 
> the pool to move it to another machine. If the machine crashed before 
> you could export it, 'zpool import -f' on the new machine can help 
> import it anyway.
>
> With USB, there are probably other commands you'll also need to use to 
> notify Solaris that you are going to unplug the drive, Just like the 
> 'Safely remove hardware' tool on windows. Or you need to remove it only 
> when the system is shut down. These commands will be documented 
> somewhere else, not in the ZFS docs because they don't apply to just ZFS.
>   

That would be cfgadm(1M).
It's also used for hot-swappable SATA drives (and probably other things).
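
A sketch of how that fits together for a removable USB pool (the attachment
point name here is hypothetical; cfgadm -al shows the real ones):

  # zpool export usbpool
  # cfgadm -al | grep usb          (find the disk's attachment point, e.g. usb0/4)
  # cfgadm -c unconfigure usb0/4
  (now unplug the drive)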

-- 
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread David Champion
> too many words wasted, but not a single word, how to restore the data.
>
> I have read the man pages carefully. But again: there's nothing said,
> that on USB drives zfs umount pool is not allowed.

You misunderstand.  This particular point has nothing to do with USB;
it's the same for any ZFS environment.  You're allowed to do a zfs
umount on a filesystem, there's no problem with that.  But remember
that ZFS is not just a filesystem, in the way that reiserfs and UFS are
filesystems.  It's an integrated storage pooling system and filesystem.
When you umount a filesystem, you're not taking any storage offline,
you're just removing the filesystem's presence on the VFS hierarchy.

You umounted a zfs filesystem, not touching the pool, then removed
the device.  This is analogous to preparing an external hardware RAID
and creating one or more filesystems, using them a while, umounting
one of them, and powering down the RAID.  You did nothing to protect
other filesystems or the RAID's r/w cache.  Everything on the RAID
is now inconsistent and suspect.  But since your "RAID" was a single
striped volume, there's no mirror or parity information with which to
reconstruct the data.

ZFS is capable of detecting these problems, where other filesystems are
often not.  But no filesystem can tell what the data should have been
when the only copy of the data is damaged.

This is documented in ZFS.  It's not about USB, it's just that USB
devices can be more vulnerable to this kind of treatment than other
kinds of storage are.

> And again: Why should a two-week-old Seagate HDD suddenly be damaged,
> if there was no shock, hit or any other event like that?

It happens all the time.  We just don't always know about it.

-- 
 -D.  d...@uchicago.edu  NSIT  University of Chicago
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Kyle McDonald
D. Eckert wrote:
> too many words wasted, but not a single word, how to restore the data.
>
> I have read the man pages carefully. But again: there's nothing said, that on 
> USB drives zfs umount pool is not allowed.
>   
It is allowed. But it's not enough. You need to read both the 'zpool' 
and 'zfs' manpages. The 'zpool' manpage will tell you that the way to 
move the 'whole pool' to another machine is to run 'zpool export 
<poolname>'. The 'zpool export' will actually run the 'zfs umount' for 
you, though it's not a problem if it's already been done.

Note, this isn't USB specific, you won't see anything in the docs about 
USB. This condition applies to SCSI and others too. You need to export 
the pool to move it to another machine. If the machine crashed before 
you could export it, 'zpool import -f' on the new machine can help 
import it anyway.

With USB, there are probably other commands you'll also need to use to 
notify Solaris that you are going to unplug the drive, just like the 
'Safely remove hardware' tool on Windows. Or you need to remove it only 
when the system is shut down. These commands will be documented 
somewhere else, not in the ZFS docs, because they don't apply to just ZFS.
> So how on earth should a simple user know that, if he knows that filesystems 
> are properly unmounted using the umount cmd?
>   
You need to understand that the filesystems are all contained in a 
'pool' (more than one filesystem can share the disk space in the same 
pool). Unmounting the filesystem *does not* prepare the *pool* to be 
moved from one machine to another.
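
A minimal sketch of moving a pool, using the pool name from earlier in this 
thread (export unmounts the filesystems for you):

    # on the machine the disk is currently attached to:
    zpool export usbhdd1

    # after attaching the disk to the other machine:
    zpool import usbhdd1
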
> And again: Why should a two-week-old Seagate HDD suddenly be damaged, if there 
> was no shock, hit or any other event like that?
>   
Who knows? Some hard drives are manufactured with problems. Remember that 
ZFS is designed to catch problems that even the ECC on the drive doesn't 
catch. So it's not impossible for it to catch errors even the 
manufacturer's QA tests missed.
> It is of course easier to blame the stupid user instead of having proper 
> documentation and emergency tools to handle that.
>
>   
I believe that between the man pages, the administration docs on the 
web, the best practices pages, and all the other blogs and web pages, 
ZFS is documented well enough. It's not like other filesystems, so 
there is more to learn, and you need to review all the docs, not just 
the ones that cover the operations (like unmount) that you're familiar 
with. Understanding pools (and the commands that manage pools) is also 
important. Man pages and command references are good when you understand 
the architecture and need to learn about the details of a command you 
know you need to use. It's the other documentation that will fill you in 
on how the system parts work together, and advise you on the best 
way to set up or do what you want.

As I said in my other email, ZFS can't repair errors without a way to 
reconstruct the data. It needs mirroring, parity (or the copies=x 
setting) to be able to repair the data, and you set up a pool with no 
redundancy. So your email subject line is a little backwards, since any 
'professional' usage would incorporate redundancy (mirroring, parity, etc.). 
What you're trying to do is more 'home/hobbyist' usage, though most 
home/hobbyist users incorporate redundancy for any data they 
really care about.
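
As a rough sketch of what redundancy can look like (the device names are only 
placeholders, and note that copies=2 guards against isolated bad blocks, not a 
whole failed disk):

    # a two-way mirror across two disks:
    zpool create mypool mirror c3t0d0 c4t0d0

    # or, on an existing single-disk pool, store two copies of every block:
    zfs set copies=2 usbhdd1
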
> The list of malfunctions of SNV builds gets longer and longer with every 
> version released.
>
>   
I'm sure new things are added every release, but many are also fixed. 
sNV is pre-release software, after all. Overall the problems found aren't 
around long, and I believe the list gets shorter as often as it gets 
longer. If you want production-level Solaris, ZFS is available in 
Solaris 10.
> e. g. on SNV 107
>
> - installation script is unable to write properly the boot blocks for grub
> - you choose German locale, but have an American Keyboard style in the gnome 
> (since SNV 103) 
> - in SNV 107 adding these lines to xorg.conf:
>
> Option "XkbRules" "xorg"
> Option "XkbModel" "pc105"
> Option "XkbLayout" "de"
>
> (was working in SNV 103)
>
> lets crash the Xserver.
>
> - latest Nvidia Driver (Vers. 180) for GeForce 8400M doesn't work with 
> OpenSolaris SNV 107
> - nwam and iwk0: not solved, no DHCP responses
>
>   
Yes there was a major update of the X server sources to catch up to the 
latest(?) X.org release. Workarounds are known, and I bet this will be 
working again in b108 (or not long after.)
> it seems better, to stay focused on having a colourfull gui with hundreds of 
> functions no one needs instead providing a stable core.
>
>   
The core of Solaris is much more stable than anything else I've used. 
The windowing system is not a part of the core of an operating system 
in my book.
> I am looking forward the day booting OpenSolaris and see a greeting Windows 
> XP Logo surrounded by the blue bubbles of OpenSolaris.
>
>   


Note that sNV (aka SXC

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Uwe Dippel
Full of sympathy, I still feel you might as well relax a bit.
It is the XkbVariant that starts X without any chance to return.
But look at the many "boot stops after the third line" reports, and, from my side, 
the non-working network settings, even without nwam. 
The worst part was a so-called engineer stating that one simply can't expect a 
host to connect to the same gateway through 2 different paths properly.

But it would be wrong to admonish the individuals, and my apologies to those I 
treated with contempt. 
The problem cannot be solved in this forum. The issue needs to be addressed 
elsewhere. When adoption (migration) is the objective, in the first place the 
kernel needs to boot, whatever the hardware, even if graceful degradation is 
unavoidable. Second, a network setup must be possible, and not simply do 
nothing, or require a dead NIC to be added just to boot. As much as I was 
grateful to be helped, of course an X server needs to fall back to sane 
behaviour at all times. And sendmail loses mail. All this is sick. But 
priorities need to come from managers, or the community, not from the coders. 
In OpenSolaris Sun insists on calling the shots, so it will be managers in this 
case. I myself am very unhappy with ZFS; not because it has failed me, but 
because, on a cold-eyed third-party review, the man pages, the concepts and the 
(arcane) commands by now far surpass the sequence of logical steps needed to 
partition (fdisk) and format (newfs) a drive. Pools, tanks, scrubs, imports, 
exports and whatnot; I don't think this was the original intention. And just as 
bad as the network engineer further up is the statement that 'USB hard disks 
are not suitable for ZFS', or similar. 
Do not get me wrong, OpenSolaris is still my preferred desktop. I love its 
stability, and - laugh at me - it is the only one that always allows me to kill 
an application gone sour (Ubuntu usually fails here). I consider it elegant and 
helpful in my daily work. *If* it is up, *if* it boots. Alas, that is by far 
the more difficult part. And here I agree with you: USB hard disks need a 
proper, clear way to be attached and removed, one no more demanding than the old 
way of mount and umount. 
Try running a hard disk test. Let us also compare here: I never lost an 
ext3 drive that would pass the hardware test. On the contrary, at times I could 
recover data from one that failed. So let us take that as the measure: as long as 
the drive is not flagged 'corrupt' by the disk test utility, it 
surely must not lose any data (aside from 'rm'). My honest and curious 
question: does ZFS pass this test?

Uwe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Casper . Dik

>too many words wasted, but not a single word, how to restore the data.
>
>I have read the man pages carefully. But again: there's nothing said, that on 
>USB drives zfs umount pool is not allowed.

You cannot unmount a pool.

You can only unmount a filesystem.

That the default name of the pool's filesystem is the same as the name of
the pool is an artifact of the implementation.

Surely, you can unmount the filesystem.

That is not illegal.

But you've removed a live pool WITHOUT exporting it.

I can understand that you make that mistake because you take what you know
from other filesystems and you apply that to ZFS.


>So how on earth should a simple user know that, if he knows that filesystems
>are properly unmounted using the umount cmd?

Reading the documentation.  The zpool and zfs commands are easy to use and 
perhaps this stops you and others from reading, e.g.,

http://docs.sun.com/app/docs/doc/819-5461/gavwn?l=en&a=view

And before you use ZFS you must understand some of the basic concepts; 
rather than having a device which you can mount with "mount", you have a
"pool" and that "pool" is owned by the system.


>
>And again: Why should a two-week-old Seagate HDD suddenly be damaged, if there
>was no shock, hit or any other event like that?

If you removed the device from a live pool, moved it to another system, 
and then moved it back, then, yes, you could have problems.

Though I'd suppose that the system shouldn't bring the pool online without 
requiring an import (-f).
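
Roughly (pool name from this thread; -f forces the import of a pool that was 
never exported, so only use it when the pool is not active on another machine):

    zpool import              # list pools that are visible but not yet imported
    zpool import -f usbhdd1   # force the import if the pool was never exported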

>
>It is of course easier to blame the stupid user instead of having proper
>documentation and emergency tools to handle that.

The documentation explains that you must use export in order to move pools from 
one system to another.

I'm not sure how the system can prevent that; there's no "lock" on your 
USB slots.



As for the other problems with nv107, each time we change a lot of 
software; and sometimes we change important parts of the software; e.g.,
in nv107 we changed to a newer version of Xorg.

The cutting edge builds vary in quality.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Christian Wolff
First: It sucks to lose data. That's very uncool...BUT

I don't know how ZFS should be able to recover data with no mirror to copy 
from. If you have some kind of RAID level you're easily able to recover your 
data. I've seen that several times, without any problems and with nearly no 
performance impact on a production machine.

No offense. But you must admit that you're flaming a filesystem without even 
knowing the right commands, and blaming us for not recovering your data?! C'mon!

Regards,
Chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS for NFS back end

2009-02-09 Thread John Welter
Hi everyone,

We are looking at ZFS to use as the back end to a pool of java servers
doing image processing and serving this content over the internet.

Our current solution is working well but the cost to scale and ability
to scale is becoming a problem.

Currently:

- 20TB NFS servers running FreeBSD
- Load balancer in front of them

A bit about the workload:

- 99.999% large reads, very small write requirement.
- Reads average from ~1MB to 60MB.
- Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
during peak hours.

Proposed hardware:

- Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
- LSI 1068 based SAS cards * 2 per server
- 4 MD1000 with 1TB ES2's * 15
- Configured as 2 * 7-disk raidz2 with 1 hot spare (HS) per chassis (see the sketch below)
- Intel 10 gig-e to the switching infrastructure
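
Purely as a sketch of what that per-chassis layout could look like at pool 
creation time (the device names are placeholders; check your own 
controller/target numbering):

    zpool create tank \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
        raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 \
        spare  c2t7d0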

Questions:

1) Solaris, OpenSolaris, etc??  What's the best for production?

2) Anything wrong with the hardware we selected?

3) any other words of wisdom - we are just starting out with ZFS but do
have some Solaris background.

Thanks!

John
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Kyle McDonald
Hi Dave,

Having read through the whole thread, I think there are several things 
that could all be adding to your problems, at least some of which are 
not related to ZFS at all.

You mentioned the ZFS docs not warning you about this, and yet I know 
the docs explicitly tell you that:

1. While a ZFS pool that has no redundancy (mirroring or parity), like 
yours, can still *detect* errors in the data read from the 
drive, it can't *repair* those errors. Repairing errors requires that 
ZFS be performing (at least the top-most level of) the mirroring or parity 
functions. Since you have no mirroring or parity, ZFS cannot 
automatically recover this data.

2. As others have said, a zpool can contain many filesystems. 'zfs 
umount' only unmounts a single filesystem. Removing a full pool from a 
machine requires a 'zpool export' no matter what disk technology is 
being used (USB, SCSI, SATA, FC, etc.)  On the new system you would use 
'zpool import' to bring the pool into the new system.

I'm sure this next one is documented by Sun also, though not in the ZFS 
docs, probably in some other part of the system dealing with removable 
devices:

3. In addition, according to Casper's message, you need to 'off-line' USB 
(and probably other types too) storage in Solaris (just like in 
Windows) before pulling the plug. This has nothing to do with ZFS. This 
would have corrupted (possibly even past the point of repair) most other 
filesystems also.

Still, I had an idea on something you might try. I don't know how long 
it's been since you pulled the drive, or what else you've done since.

Which machine is reporting the errors you've shown us? The machine you 
pulled the drives from, or the machine you moved them to? Were you 
successful in running 'zpool import' for the pool on the other machine? This idea 
might work either way, but if you haven't successfully imported it into 
another machine there's probably more of a chance.

If the output is from the machine you pulled them out of, then basically 
that machine still thinks the pool is connected to it, and it thinks the 
one and only disk in the pool is now not responding. In this case the 
errors you see in the tables are the errors from trying to contact a 
drive that no longer exists.

Have you reconnected the disk to the original machine yet? If not, I'd 
attempt a 'zpool export' now (though that may not work), then shut 
the machine down fully, and connect the disk. Then boot it all up. 
Depending on what you've tried to do with this disk to fix the problem 
since it happened, I have no idea exactly how the machine will come up.

If you couldn't do the 'zpool export', then the machine will try to 
mount the FSes in the pool on boot. This may or may not work.
If you were successful in doing the export with the disks disconnected, 
then it won't try, and you'll need to 'zpool import' them after the 
machine is booted.

Depending on how the import goes, you might still see errors in the 
'zpool status' output. If so, I know a 'zpool clear' will clear those 
errors, and I doubt it can make the situation any worse than it is now. 
You'd have to give us info about what the machine tells you after this 
before I can advise you more. But (and the experts can correct me if I'm 
wrong) this might 'just work(tm)'.
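
A minimal sketch of that last step, once the pool is visible again (pool name as 
used in this thread):

    zpool clear usbhdd1       # reset the read/write/checksum error counters
    zpool status -v usbhdd1   # re-check the pool state and any files still listed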

My theory here is that ZFS may have been successful in keeping the 
state of the (meta)data on the disk consistent after all. The checksum 
and I/O errors listed may be from ZFS trying to access the non-existent 
drive after you removed it, which (in theory) would all be bogus errors that 
don't really point to errors in the data on the drive.

Of course there are many things that all have to be true for this theory 
to turn out to be true. Depending on what has happened to the machines 
and the disks since they were originally unplugged from each other, all 
bets might be off. And then there's the possibility that my idea 
could never work at all. People much more expert than I can chime in on 
that.

  -Kyle




D. Eckert wrote:
> Hi,
>
> after working for 1 month with ZFS on 2 external USB drives I have 
> experienced, that the all new zfs filesystem is the most unreliable FS I have 
> ever seen.
>
> Since working with the zfs, I have lost datas from:
>
> 1 80 GB external Drive
> 1 1 Terrabyte external Drive
>
> It is a shame, that zfs has no filesystem management tools for repairing e. 
> g. being able to repair those errors:
>
>         NAME        STATE     READ WRITE CKSUM
>         usbhdd1     ONLINE       0     0     8
>           c3t0d0s0  ONLINE       0     0     8
> 
> errors: Permanent errors have been detected in the following files:
> 
>         usbhdd1:<0x0>
>
>
> It is indeed very disappointing that moving USB zpools between computers ends 
> in 90 % with a massive loss of data.
>
> This is to the not reliable working command zfs umount , even if 
> the output of mount shows you, that the pool is no longer mounted and ist 
> removed from mntab

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread D. Eckert
Too many words wasted, but not a single word on how to restore the data.

I have read the man pages carefully. But again: there's nothing that says that 
'zfs umount <pool>' is not allowed on USB drives.

So how on earth should a simple user know that, if he knows that filesystems 
are properly unmounted using the umount cmd?

And again: Why should a two-week-old Seagate HDD suddenly be damaged, if there 
was no shock, hit or any other event like that?

It is of course easier to blame the stupid user than to have proper 
documentation and emergency tools to handle that.

The list of malfunctions of SNV builds gets longer and longer with every 
version released.

e. g. on SNV 107

- installation script is unable to properly write the boot blocks for GRUB
- you choose the German locale, but get an American keyboard layout in GNOME 
(since SNV 103) 
- in SNV 107 adding these lines to xorg.conf:

Option "XkbRules" "xorg"
Option "XkbModel" "pc105"
Option "XkbLayout" "de"

(was working in SNV 103)

lets crash the Xserver.

- latest Nvidia Driver (Vers. 180) for GeForce 8400M doesn't work with 
OpenSolaris SNV 107
- nwam and iwk0: not solved, no DHCP responses

It seems better to stay focused on having a colourful GUI with hundreds of 
functions no one needs instead of providing a stable core.

I am looking forward to the day I boot OpenSolaris and see a greeting Windows XP 
logo surrounded by the blue bubbles of OpenSolaris.

Cheers,

D.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Casper . Dik

>James,
>
>on a UFS ore reiserfs such errors could be corrected.

That's not true.  That depends on the nature of the error.

I've seen quite a few problems on UFS with corrupted file contents;
such filesystems always are "clean".  Yet the filesystems are corrupted.
And no tool can fix those filesystems.

>It is grossly negligent to develop a file system without proper repairing 
>tools.

Repairing to what state?

One of the reasons why there's a "ufs fsck" is because its disk state is
nearly always "corrupted".  The log only allows you to repair the metadata,
NEVER the data.  And I've seen the corrupted files many times.
(Specifically, when you upgrade a driver and it's buggy, you would 
typically have a broken driver_alias, name_to_major, etc.  Though I added
a few fsyncs in update_drv and ilk and it is better, fsck does not
"fix" UFS filesystems.

Fsck can only repair known faults: known discrepancies in the metadata.
Since ZFS doesn't have such known discrepancies, there's nothing to repair.
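
For comparison, ZFS's closest analogue to a periodic fsck run is a scrub, which 
verifies every block against its checksum rather than repairing metadata 
structures (pool name from this thread):

    zpool scrub usbhdd1
    zpool status -v usbhdd1   # shows scrub progress and lists any files with errors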

>More and more becomes clear, that it just was a marketing slogan by Sun to 
>state,
>that ZFS does not use any repairing tools due to healing itself.

If it can repair, then it does.  But if you only have one copy of the data,
then you cannot repair the missing data.

>In this particular case we are talking about a loss of at least 35 GB
>of data.


>A good practice would be to care first for a proper documentation. There's
>nothing stated in the man pages, if USB zpools are used, that the zfs
>mount/unmount is NOT recommended and zpool export should be used instead.

You have a live pool and you yank it out of the system?  Where does it say
that you can do that? 

>Having facilities to override checksumming to get even an as corrupted tagged 
>pool mounted to rescue the data shouldn't be just a dream. it should be a MUST 
>TO HAVE.

Depends on how much of the data is corrupted and which parts are affected.

>I agree, it is always - regardless the used FSType - to have a proper backup
>facility in place. But based on the issues ZFS was designed for - for very big
>pools - it becomes as well a cost aspect.
>
>And as well it would be a good practice by Sun due to having internet boards
>full of complaining people loosing data just because of using zfs, to CARE FOR 
>THAT.

I've not seen a lot of people who complained; or perhaps I don't look 
carefully (I'm not in ZFS development).

What I have seen are some issues with weird BIOSes (taking part of a 
disk), and with connecting a zpool to different systems at the same time, 
including what you may have done by having the zpool "imported" on both systems.

Casper



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Jürgen Keil
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': I/O error
> bash-3.00#

Why is there an I/O error?

Is there any information logged to /var/adm/messages when this
I/O error is reported?  E.g. timeout errors for the USB storage device?
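
Something along these lines, purely as a sketch (adjust the patterns to your own 
device names):

    tail -200 /var/adm/messages | egrep -i 'usb|scsi|timeout|retry'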
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread MC
> on a UFS ore reiserfs such errors could be corrected.

I think some of these people are assuming your hard drive is broken.  I'm not 
sure what you're assuming, but if the hard drive is broken, I don't think ANY 
file system can do anything about that.

At best, if the disk was in a RAID 5 array, and the other disks worked, then 
the parity from the working disks could correct the broken data on the broken 
drive... but you only have a single disk, not a mirror or a raid 5, so this fix 
can't be done...

I think this might be a case of zfs reporting errors that other file systems 
don't notice.  Your hard drive might have been broken for months without you 
knowing it until now.  In that case the errors aren't the fault of zfs.  It is 
the fault of the broken drive, and the fault of the other file systems for not 
knowing when data is corrupted.  See what I mean?
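
One quick, non-destructive check on Solaris is the per-device error counters 
(device name taken from the zpool status output earlier in the thread; 
smartmontools, if installed, gives more detail):

    iostat -En c3t0d0   # shows Soft, Hard and Transport error counts and media errors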
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread D. Eckert
James,

On UFS or reiserfs such errors could be corrected.

It is grossly negligent to develop a file system without proper repair tools.

More and more it becomes clear that it was just a marketing slogan by Sun to 
state that ZFS does not need any repair tools because it heals itself.

In this particular case we are talking about a loss of at least 35 GB 
of data.

And as long as the ZFS developers are more focused on marketing claims that have 
been proven wrong, I can't recommend ZFS at all in a professional setting, and I 
am thinking about making this issue clear at the Sun conference we have in 
Germany in March this year.

It is not good practice just to tell someone who has lost that mass of data 
"I am sorry, but I can't help you", especially if you don't understand why it 
happened.

A good practice would be to care first for proper documentation. There's 
nothing stated in the man pages that, if USB zpools are used, zfs 
mount/unmount is NOT recommended and zpool export should be used instead.

Having facilities to override checksumming, to get even a pool tagged as 
corrupted mounted in order to rescue the data, shouldn't be just a dream. It 
should be a MUST HAVE.

I agree that it is always good practice - regardless of the FS type used - to 
have a proper backup facility in place. But given the scale ZFS was designed 
for - very big pools - that also becomes a cost aspect.

And it would also be good practice for Sun, given internet boards full of people 
complaining about losing data just because they use ZFS, to CARE FOR 
THAT.

Regards,

DE
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread James C. McPherson
On Mon, 09 Feb 2009 03:10:21 -0800 (PST)
"D. Eckert"  wrote:

> ok, so far so good.
> 
> but how can I get my pool up and running

I can't help you with this bit


> bash-3.00# zpool status -xv usbhdd1
>   pool: usbhdd1
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         usbhdd1     ONLINE       0     0    16
>           c3t0d0s0  ONLINE       0     0    16
> 
> errors: Permanent errors have been detected in the following files:
> 
>         usbhdd1:<0x0>
> 
> bash-3.00# zpool list
> NAME      SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
> storage  48,8G  16,3G  32,4G   33%  ONLINE  -
> usbdrv1   484M  2,79M   481M    0%  ONLINE  -
> usbhdd1  74,5G  34,3G  40,2G   46%  ONLINE  -
> 
> I don't understand that I get status information about the pool, e.g.
> cap, size, and health, but I cannot mount it on the system:
> 
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': I/O error
> bash-3.00#

You have checksum errors on a non-replicated pool. This
is not something that can be ignored. 


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread D. Eckert
OK, so far so good.

But how can I get my pool up and running?

Following output:

bash-3.00# zfs get all usbhdd1
NAME     PROPERTY         VALUE                  SOURCE
usbhdd1  type             filesystem             -
usbhdd1  creation         Thu Dec 25 23:36 2008  -
usbhdd1  used             34,3G                  -
usbhdd1  available        39,0G                  -
usbhdd1  referenced       34,3G                  -
usbhdd1  compressratio    1.00x                  -
usbhdd1  mounted          no                     -
usbhdd1  quota            none                   default
usbhdd1  reservation      none                   default
usbhdd1  recordsize       128K                   default
usbhdd1  mountpoint       /usbhdd1               default
usbhdd1  sharenfs         off                    default
usbhdd1  checksum         on                     local
usbhdd1  compression      off                    default
usbhdd1  atime            on                     default
usbhdd1  devices          on                     default
usbhdd1  exec             on                     default
usbhdd1  setuid           on                     default
usbhdd1  readonly         off                    default
usbhdd1  zoned            off                    default
usbhdd1  snapdir          hidden                 default
usbhdd1  aclmode          groupmask              default
usbhdd1  aclinherit       restricted             default
usbhdd1  canmount         on                     default
usbhdd1  shareiscsi       off                    default
usbhdd1  xattr            on                     default
usbhdd1  copies           1                      default
internal error: unable to get version property
internal error: unable to get utf8only property
internal error: unable to get normalization property
internal error: unable to get casesensitivity property
usbhdd1  vscan            off                    default
usbhdd1  nbmand           off                    default
usbhdd1  sharesmb         off                    default
usbhdd1  refquota         none                   default
usbhdd1  refreservation   none                   default

bash-3.00# zpool status -xv usbhdd1
  pool: usbhdd1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        usbhdd1     ONLINE       0     0    16
          c3t0d0s0  ONLINE       0     0    16

errors: Permanent errors have been detected in the following files:

        usbhdd1:<0x0>

bash-3.00# zpool list
NAME      SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
storage  48,8G  16,3G  32,4G   33%  ONLINE  -
usbdrv1   484M  2,79M   481M    0%  ONLINE  -
usbhdd1  74,5G  34,3G  40,2G   46%  ONLINE  -

I don't understand that I get status information about the pool, e.g. cap, 
size, and health, but I cannot mount it on the system:

bash-3.00# zfs mount usbhdd1
cannot mount 'usbhdd1': I/O error
bash-3.00#

Any suggestions for help?

thanks and regards,

dave.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Casper . Dik


>Well, umount is not the "right" way to do it, so he'd be simulating a
>power-loss/system-crash. That still doesn't explain why massive data loss
>would occur ? I would understand the last txg being lost, but 90% according
>to OP ?!


On USB or? I think he was trying to properly unmount the USB devices.

One of the known issues with USB devices is that they may not work properly; 
a typical bare disk will properly "flush write cache" when it
is instructed to do so.

However, when you connect the device through a USB controller and a USB
enclosure, we're less certain that "flush write cache" will make it
to the drive, because:
- was the command sent to the enclosure at all (e.g., if you needed to configure
  the device with "reduced-cmd-support=true", then all bets are 
  off)
- when the enclosure responds, did it send a "flush write cache"
  to the disk?
- and when it responds, did it wait until the disk completed the
  command?

It is one of the reasons why I'd recommend against USB for disks.  Too 
many variables.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Ahmed Kamal
>
> "Unmount" is not sufficient.
>

Well, umount is not the "right" way to do it, so he'd be simulating a
power-loss/system-crash. That still doesn't explain why massive data loss
would occur? I would understand the last txg being lost, but 90% according
to the OP?!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Casper . Dik

>However, the hardware used is:
>
>1 Sun Fire 280R Solaris 10 generic 10-08 latest updates
>1 Lenovo T61 Notebook running Solaris 10 genric 10-08 latest updates
>1 Sony VGN-NR38Z
>
>Harddrives in use: Trekstore 1 TB, Seagate momentus 7.200 rpm 2.5" 80 GB.

(Is that the Trekstore with 2x500GB?)

>The harddrives used are brand new, as well the Sony notebook.
>
>Even if I did zfs umount poolname I waited for 30 sec. and then unplugged, 
>data corruption occurs.

Did you EXPORT the pool?

"Unmount" is not sufficient.

You need to use:
zpool export poolname

How exactly did you remove the disk from the 280R?

And what exact problem did you get?

You need to "off-line" the disk before actually removing it, physically.


Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Ian Collins
D. Eckert wrote:
> Hi Caspar,
>
> thanks for you reply.
>
> I completely disagreed to your opinion, that is USB. And seems as well, that 
> I am not the only one having this opinion regarding ZFS.
>
> However, the hardware used is:
>
> 1 Sun Fire 280R Solaris 10 generic 10-08 latest updates
> 1 Lenovo T61 Notebook running Solaris 10 genric 10-08 latest updates
> 1 Sony VGN-NR38Z
>
> Harddrives in use: Trekstore 1 TB, Seagate momentus 7.200 rpm 2.5" 80 GB.
>
> The harddrives used are brand new, as well the Sony notebook.
>
> Even if I did zfs umount poolname I waited for 30 sec. and then unplugged, 
> data corruption occurs.
>
>   
You don't zfs umount poolname, you zpool export it.

-- 
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread D. Eckert
Hi Casper,

thanks for your reply.

I completely disagree with your opinion that it is USB. And it seems as well that 
I am not the only one holding this opinion regarding ZFS.

However, the hardware used is:

1 Sun Fire 280R Solaris 10 generic 10-08 latest updates
1 Lenovo T61 Notebook running Solaris 10 genric 10-08 latest updates
1 Sony VGN-NR38Z

Harddrives in use: Trekstore 1 TB, Seagate momentus 7.200 rpm 2.5" 80 GB.

The harddrives used are brand new, as well the Sony notebook.

Even if I did 'zfs umount poolname', waited for 30 sec. and then unplugged, data 
corruption occurred.

For testing purposes, on a Sun Fire 280R completely set up with ZFS I tried 
hot-swapping a HDD.
The same thing happens there.

It is a big administrative burden to get such a ZFS drive back to life.

So how can I get my zpools back to life?

Regards,

Dave.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Casper . Dik


>However, I just want to state a warning, that ZFS is far from being that what 
>it
>is promising, and so far from my sum of experience I can't recommend at all to
>use zfs on a professional system.


Or, perhaps, you've given ZFS disks which are so broken that they are 
really unusable; it is USB, after all.

And certainly, on Solaris you'd get the same errors with UFS or PCFS; but 
you would not be able to detect any corruption.

You may have seen Al's post about moving a spinning 1TB hard disk.

Before we can judge what goes wrong, we would need a bit more information
such as:

- motherboard and the USB controller
- the USB enclosure which holds the disk(s)
- the type of the disks themselves.
- any messages recorded in /var/adm/messages (for the time you used
  the database)

- and how did you remove the disks from the system?

Unfortunately, you cannot be sure that when the USB enclosure says that 
all the data is safe, it is actually written to the disk.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Tomas Ögren
On 09 February, 2009 - D. Eckert sent me these 1,5K bytes:

> Hi,
> 
> after working for 1 month with ZFS on 2 external USB drives I have 
> experienced, that the all new zfs filesystem is the most unreliable FS I have 
> ever seen.
> 
> Since working with the zfs, I have lost datas from:
> 
> 1 80 GB external Drive
> 1 1 Terrabyte external Drive
> 
> It is a shame, that zfs has no filesystem management tools for repairing e. 
> g. being able to repair those errors:
> 
>         NAME        STATE     READ WRITE CKSUM
>         usbhdd1     ONLINE       0     0     8
>           c3t0d0s0  ONLINE       0     0     8
> 
> errors: Permanent errors have been detected in the following files:
> 
>         usbhdd1:<0x0>
> 
> 
> It is indeed very disappointing that moving USB zpools between
> computers ends in 90 % with a massive loss of data.
> 
> This is to the not reliable working command zfs umount ,
> even if the output of mount shows you, that the pool is no longer
> mounted and ist removed from mntab.

You don't move a pool with 'zfs umount'; that only unmounts a single ZFS
filesystem within a pool, and the pool is still active. 'zpool export'
releases the pool from the OS; then 'zpool import' on the other machine.

> It works only 1 or 2 times, but removing the device back to the other
> machine, the pool won't be either recognized at all or the error
> mentioned above occurs.
> 
> Or suddenly you'll find that message inside your messages: "Fault
> tolerance of the pool may be compromised."
> 
> However, I just want to state a warning, that ZFS is far from being
> that what it is promising, and so far from my sum of experience I
> can't recommend at all to use zfs on a professional system.

You're basically yanking disks from a live filesystem. If you don't do
that, filesystems are happier.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread D. Eckert
Hi,

After working for 1 month with ZFS on 2 external USB drives I have found 
that the all-new ZFS filesystem is the most unreliable FS I have ever seen.

Since working with ZFS, I have lost data from:

1 80 GB external drive
1 1 Terabyte external drive

It is a shame that ZFS has no filesystem management tools for repairing, e.g. 
being able to repair these errors:

        NAME        STATE     READ WRITE CKSUM
        usbhdd1     ONLINE       0     0     8
          c3t0d0s0  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        usbhdd1:<0x0>


It is indeed very disappointing that moving USB zpools between computers ends 
in 90% of cases with a massive loss of data.

This is due to the not reliably working command 'zfs umount <poolname>', even if 
the output of mount shows you that the pool is no longer mounted and is removed 
from mntab.

It works only 1 or 2 times, but when moving the device back to the other machine, 
either the pool won't be recognized at all or the error mentioned above occurs.

Or suddenly you'll find this message in your logs: "Fault tolerance of 
the pool may be compromised."

However, I just want to state a warning that ZFS is far from what it 
promises, and so, from my experience, I can't recommend using ZFS on a 
professional system at all.

Regards,

Dave.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss