Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ian Collins

Allen Eastwood wrote:

On Jan 19, 2010, at 22:54 , Ian Collins wrote:

  

Allen Eastwood wrote:


On Jan 19, 2010, at 18:48 , Richard Elling wrote:

 
  

Many people use send/recv or AVS for disaster recovery on the inexpensive
side. Obviously, enterprise backup systems also provide DR capabilities.
Since ZFS has snapshots that actually work, and you can use send/receive
or other backup solutions on snapshots, I assert the problem is low priority.

   

What I have issue with is the idea that no one uses/should use tape any more.  There are places for tape and it still has value as a backup device.  In many cases in the past, ufsdump, despite its many issues, was able to restore working OSes, or individual files.  Perfect? Not by a long shot.  But it did get the job done. 
  
As was pointed out earlier, all I needed was a Solaris CD (or network boot) and I could restore.  Entire OS gone, boot and ufsrestore.  Critical files deleted, same thing…and I can restore just the file(s) I need.  And while it's been a few years since I've read the man page on ufsdump, ufsrestore and fssnap, those tools have proven useful when dealing with a downed system.
 
  

For a full recovery, you can archive a send stream and receive it back.  With 
ZFS snapshots, the need for individual file recovery from tape is much reduced. 
 The backup server I manage for a large client has 60 days of snaps and I can't 
remember when they had to go to tape to recover a file.
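
Something along these lines does the job (pool and snapshot names are just placeholders; the stream file could equally go to tape or a removable drive):

  # archive
  zfs snapshot -r tank@weekly
  zfs send -R tank@weekly > /backup/tank-weekly.zfs
  # full recovery later, into a freshly created pool
  zfs receive -d -F tank < /backup/tank-weekly.zfs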

--
Ian.




Let's see…

For full recovery, I have to zfs send to something, preferably something that understands tape (yes, I know I can send to tape directly, but how well does zfs send handle the end of the tape? Auto-changers?).  
I keep a stream (as a file) of my root pool on a USB stick.  It could be 
on tape, but root pools are small.



Then for individual file recovery, I have snapshots…which I also have to get onto 
tape…if I want to have them available on something other than the boot 
devices.

  
No, just keep the snapshots in place.  If a file is lost, just grab it 
from the snapshot directory.  If the root filesystem is munted, roll 
back to the last snapshot.
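
For example (dataset and snapshot names are only illustrative):

  # copy a single lost file back out of the hidden snapshot directory
  cp /export/home/.zfs/snapshot/snap-20100119/lost.file /export/home/
  # or roll the whole dataset back to the last good snapshot
  zfs rollback rpool/export/home@snap-20100119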



Now…to recover the entire OS, perhaps not so bad…but that's one tool.  And to 
recover the one file, say a messed up /etc/system, that's preventing my OS from 
booting?  Have to get that snapshot where I can use it first…oh and restoring 
individual files and not the entire snapshot?

  
As I said, roll back.  Boot from install media, import the root pool, 
get the file from a snapshot, or roll back to the last good snapshot, 
export and reboot.
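
Roughly like this (a sketch only - the boot environment and snapshot names are placeholders):

  # from a shell on the install media
  zpool import -f -R /a rpool
  zfs rollback rpool/ROOT/mybe@last-good   # or copy files out of .zfs/snapshot instead
  zpool export rpool
  reboot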



At best, it's an unwieldy process.  But does it offer the simplicity that 
ufsdump/ufsrestore (or dump/restore on how many Unix variants…) did?  No way.

  
It certainly does for file recovery.  Do you run incremental dumps every 
hour, or every 15 minutes?  Periodic snapshots are quick and cheap.  As 
I said before, careful use of snapshots all but removes the need to 
recover files from tape.  We have 60 days of 4-hourly and daily 
snapshots in place, so the odds of finding a recent copy of a lost file 
are way better than they would be with daily incrementals.  I certainly 
don't miss the pain of loading a sequence of incrementals to recover 
lost data.
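
A single cron entry is enough to keep such a schedule running (a minimal sketch - the dataset name is a placeholder and retention/pruning is left out):

  # snapshot tank/home every 4 hours, named by timestamp
  0 0,4,8,12,16,20 * * * /usr/sbin/zfs snapshot tank/home@auto-`date +\%Y\%m\%d-\%H\%M`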


So ZFS solves the problems in a different way.  For fat-finger recovery, 
it's way better than ufsdump/ufsrestore.

A simple, effective dump/restore that deals with all the supported file 
systems, can deal with tape or disk, allows for complete OS restore or 
individual file restore, and can be run from an install CD/DVD.  As much as I 
love ZFS and as many problems as it does solve, leaving this out was a mistake, 
IMO.
  

It possibly was, but it has encouraged us to find better solutions.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ragnar Sundblad

On 19 jan 2010, at 20.11, Ian Collins wrote:

 Julian Regel wrote:
 
 Based on what I've seen in other comments, you might be right. 
 Unfortunately, I don't feel comfortable backing up ZFS filesystems because 
 the tools aren't there to do it (built into the operating system or using 
 Zmanda/Amanda).
 
 Commercial backup solutions are available for ZFS.
 I know tape backup isn't sexy, but it's a reality for many of us and it's 
 not going away anytime soon.
 
 True, but I wonder how viable its future is.  One of my clients requires 17 
 LTO4 tapes for a full backup, which cost more and take up more space than 
 the equivalent in removable hard drives.
 
 In the past few years growth in hard drive capacities has outstripped tapes 
 to the extent that removable hard drives and ZFS snapshots have become a more 
 cost effective and convenient backup media.

LTO media is still cheaper than equivalent-sized disks, maybe by a factor of 5 or so. 
LTO drives cost a little, but so do disk shelves. So, now that there is no big 
price issue, there is choice instead. Use it!

Hard drives are good for random access - both restore of individual files and 
partial rewrite. 

Hard drives aren't faster than tape for data transfer, but they might be 
cheaper to run in parallel and therefore you could potentially gain speed. Hard 
drives have shorter seek times, which may be important.

Hard drives are probably bad for longer-term storage - in particular, you 
will never know how long one can be stored before it fails. A month? 
Probably. A year? Maybe. Five years? Well... Ten years? Probably not. LTO tapes 
are supposed to be able to keep their data for at least 30 years if stored 
properly. Hard drives are probably best when used online, or at least very often.

So - it is wrong to say that one is better or cheaper than the other. They have 
different properties, and could be used to solve different problems.

/ragge s

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Richard Elling richard.ell...@gmail.com wrote:

  
  ufsdump/restore was perfect in that regard.  The lack of equivalent 
  functionality is a big problem for the situations where this functionality 
  is a business requirement.

 How quickly we forget ufsdump's limitations :-).  For example, it is not 
 supported
 for use on an active file system (known data corruption possibility) and 
 UFS snapshots are, well, a poor hack and often not usable for backups.
 As the ufsdump(1m) manpage says,

It seems you forgot that zfs also needs snapshots. There is nothing wrong with 
snapshots.

When I was talking with Jeff Bonwick in September 2004 (before ZFS became 
public), the only feature that was missing in Solaris for a 100% correct
backup based on star was an interface for holey files, so we designed it.

I believe the only mistake in ufsdump is that it does not use standard
OS interfaces and that it does not use a standard archive format. You get
both with star, and star is even faster than ufsdump.


Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-01-20 Thread Lutz Schumann
Hello, 

we tested clustering with ZFS and the setup looks like this: 

- 2 head nodes (nodea, nodeb)
- head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)
- two external jbods
- two mirror zpools (pool1,pool2)
   - each mirror is a mirror of one disk from each jbod
- no ZIL (does anyone know of a well-priced SAS SSD?)

We want active/active and added the l2arc to the pools. 

- pool1 has nodea_l2arc as cache
- pool2 has nodeb_l2arc as cache

Everything is great so far. 

One thing to note is that nodea_l2arc and nodeb_l2arc have the same device name 
(c0t2d0 on both nodes).

What we found is that during the tests, pool1 just picked up the device 
nodeb_l2arc automatically, although it was never explicitly added to that 
pool.

We had a setup stage where pool1 was configured on nodea with nodea_l2arc and 
pool2 was configured on nodeb without an l2arc. Then we did a failover, and 
pool1 picked up the (until then) unconfigured nodeb_l2arc. 

Is this intended? Why is an L2ARC device automatically picked up if the device 
name is the same? 

At a later stage we had both pools configured with the corresponding l2arc 
device (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). Then 
we also did a failover. The l2arc device of the pool failing over was marked as 
having too many corruptions instead of missing. 

So from these tests it looks like ZFS just picks up the device with the same 
name and replaces the l2arc, without checking the device signatures to make 
sure the device is actually part of the pool.
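
For what it's worth, the on-disk labels can be checked before a failover instead of trusting the device name (the device path below is only an example):

  zpool status pool1              # shows which c0t2d0 is currently attached as cache
  zdb -l /dev/rdsk/c0t2d0s0       # dumps the device's labels (GUIDs etc.), if present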

We have not tested with a data disk as c0t2d0 but if the same behaviour 
occurs - god save us all.

Can someone clarify the logic behind this? 

Can someone also give a hint on how to rename SAS disk devices in OpenSolaris? 
(As a workaround I would like to rename c0t2d0 on nodea (nodea_l2arc) to c0t24d0 
and c0t2d0 on nodeb (nodeb_l2arc) to c0t48d0.) 

P.s. Release is build 104 (NexentaCore 2). 

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Ian Collins i...@ianshome.com wrote:

  The correct way to archive ACLs would be to put them into extended POSIX tar
  attributes as star does.
 
  See http://cdrecord.berlios.de/private/man/star/star.4.html for a format 
  documentation or have a look at ftp://ftp.berlios.de/pub/star/alpha, e.g.
  ftp://ftp.berlios.de/pub/star/alpha/acl-test.tar.gz
 
  The ACL format used by Sun is undocumented.
 

 man acltotext

We are talking about TAR and I did give a pointer to the star archive format 
documentation, so it is obvious that I was talking about the ACL format from
Sun tar. This format is not documented.



Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Edward Ned Harvey sola...@nedharvey.com wrote:

  Star implements this in a very effective way (by using libfind) that is
  even
  faster than the find(1) implementation from Sun.

 Even if I just run find over my filesystem, it will run for 7 hours.  But zfs can
 create my whole incremental snapshot in a minute or two.  There is no way
 star or any other user-space utility that walks the filesystem can come
 remotely close to this performance.  Such performance can only be
 implemented at the filesystem level, or lower.

You claim that it is fast for you, but this is because it is block oriented and 
because you probably changed only a small amount of data. 

If you want a backup that allows access to individual files, you need a file-based 
backup, and I am sure that even a filesystem-level scan for recently changed 
files will not be much faster than what you can achieve with e.g. star.

Note that ufsdump directly accesses the raw disk device and thus _is_ at the 
filesystem level, but it is still slower than star on UFS.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror of SAN Boxes with ZFS ? (split site mirror)

2010-01-20 Thread Lutz Schumann
Actually I found some time (and reason) to test this. 

Environment: 
- 1 osol server 
- one SLES10 iSCSI Target
- two LUN's exported via iSCSi to the OSol server 

I did some resilver tests to see how ZFS resilvers devices.

Prep:

osol: create a pool (myiscsi) with one mirror pair made from the two iSCSI 
backend disks of SLES10

Test: 

osol: both disks ok, 
osol: txn in ueberblock of pool = 86 
sles10: remove one disk (lun=1) 
osol: disk is detected failed, pool degraded
osol: write with oflag=direct; sync multiple times to the pool
osol: create fs myiscsi/test
osol: txn in ueberblock = 107 

osol: power off (hard) 
sles10: add lun 1 again (the one with txn 86)
sles10: remove lun 0 (the one with txn 107)
osol: power on 

osol: txn in ueberblock = 92
osol: zfs myiscsi/test does not exist
osol: create fs myiscsi/mytest_old
osol: txn in ueberblock = 96

osol: power off (hard)
sles10: add lun 0 again (with txn 107)
sles10: both luns are there

osol: Resilvering happens automatically 

osol: txn in ueberblock = 112 
osol: filesystem myiscsi/test exists 

... same thing the other way around, to see if the resilver direction is consistent ...

osol: both disks ok, 
osol: txn in ueberblock = 120 
sles10: remove one disk (lun=0)
osol: write with oflag=sync; sync multiple times 
osol: create fs myiscsi/test
osol: txn in ueberblock = 142 

osol: power off (hard)
sles10: add lun 0 again (the one with txn 120)
sles10: remove lun 1 (the one with txn 142)
osol: boot 

osol: txn in ueberblock = 127
osol: filesystem myiscsi/test does not exist
osol: create fs myiscsi/mytest_old
osol: txn in ueberblock = 133
osol: power off

sles10: add lun 1 again (with txn 142)
sles10: both luns are there

osol: boot 

osol: Resilvering happens automatically 

osol: txn in ueberblock = 148
osol: filesystem myiscsi/test exists 

---

From these tests it seems that the latest txn always wins. 

This practically means that the JBOD with the most changes (in terms of 
transactions) will always sync over the one with the fewest modifications. 

Could someone confirm this assumption? 

Could someone explain how the resilvering direction is selected? 
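
(For reference, the txn numbers above were read roughly along these lines - the device path is only an example:)

  zdb -u myiscsi              # active uberblock of the imported pool, including its txg
  zdb -l /dev/rdsk/c2t1d0s0   # the vdev labels also record the txg they were last written at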

Regards, 
Robert 


P.S. I did not test split brain, but this is next. (The planned setup is 
clustered with SAS, not iSCSI, so split brain is more academic in this case.)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Julian Regel
While I can appreciate that ZFS snapshots are very useful in being able to 
recover files that users might have deleted, they do not do much to help when 
the entire disk array experiences a crash/corruption or catches fire. Backing 
up to a second array helps if a) the array is off-site, though for many of us the 
cost of remote links with sufficient bandwidth is still prohibitive, or b) it is on 
the local network but sufficiently far away from the original array that 
the fire does not damage the backup as well.

This leaves some form of removable storage. I'm not sure I'm aware of any 
enterprise-level removable disk solution, primarily because disk isn't really 
designed to be used for offsite backup whereas tape is.

The biggest problem with tape was finding a sufficiently large window in which 
to perform the backup. ZFS snapshots completely solve this issue, but Sun have 
failed to provide the mechanism to protect the data off-site.



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HT

2010-01-20 Thread Simon Breden
Yes, this model looks to be interesting.

SuperMicro seem to have produced two new models that satisfy the SATA 
III requirement of 6Gbps per channel:

1. AOC-USAS2-L8e: 
http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=E
2. AOC-USAS2-L8i: 
http://www.supermicro.com/products/accessories/addon/AOC-USAS2-L8i.cfm?TYP=I

The main difference appears to be that the L8i model has RAID capabilities, 
whereas the L8e model does not.

As ZFS does its own RAID calculations in software it needs JBOD, and doesn't 
need the adapter to have RAID capabilities, so the AOC-USAS2-L8e model looks to 
be ideal. If we're lucky maybe it's also a little cheaper too.

Sorry I can't help you with your questions though. Hopefully someone else will 
be able to help. I will also be interested to hear any further info on this 
card.

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Julian Regel
 If you want a backup that allows access to individual files, you need a 
 file-based backup, and I am sure that even a filesystem-level scan for 
 recently changed files will not be much faster than what you can achieve 
 with e.g. star.

 Note that ufsdump directly accesses the raw disk device and thus _is_ at the 
 filesystem level but still is slower than star on UFS.

While I am sure that star is technically a fine utility, the problem is that it 
is effectively an unsupported product.

If our customers find a bug in their backup that is caused by a failure in a 
Sun supplied utility, then they have a legal course of action. The customer's 
system administrators are covered because they were using tools provided by the 
vendor. The wrath of the customer would be upon Sun, not the supplier (us) or 
the supplier's technical lead (me).

If the system administrator has chosen star (or if the supplier recommends 
star), then the conversation becomes a lot more awkward. From the perspective 
of the business, the system administrator will have acted irresponsibly by 
choosing a tool that has no vendor support. Alternatively, the supplier will be 
held responsible for recommending a product that has broken the customer's 
ability to restore, and with no legal recourse, I wouldn't dare touch it. Sorry.

This is why Sun need to provide the solution themselves (or adopt and provide 
support for star or similar third party products).

JR



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-20 Thread Simon Breden
I see also that Samsung have very recently released the HD203WI 2TB 4-platter 
model.

It seems to have good customer ratings so far at newegg.com, but currently 
there are only 13 reviews so it's a bit early to tell if it's reliable.

Has anyone tried this model with ZFS?

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Julian Regel jrmailgate-zfsdisc...@yahoo.co.uk wrote:

  If you want a backup that allows access to individual files, you need a 
  file-based backup, and I am sure that even a filesystem-level scan for 
  recently changed files will not be much faster than what you can achieve 
  with e.g. star.
 
  Note that ufsdump directly accesses the raw disk device and thus _is_ at the 
  filesystem level but still is slower than star on UFS.

 While I am sure that star is technically a fine utility, the problem is that 
 it is effectively an unsupported product.

From this viewpoint, you may call most of Solaris unsupported.

 If our customers find a bug in their backup that is caused by a failure in a 
 Sun supplied utility, then they have a legal course of action. The customer's 
 system administrators are covered because they were using tools provided by 
 the vendor. The wrath of the customer would be upon Sun, not the supplier 
 (us) or the supplier's technical lead (me).

Do you really believe that Sun will help such a customer?
There are many bugs in Solaris (I remember e.g. some showstopper
bugs in the multimedia area) that are not fixed although they have been known
for a very long time (more than a year).

There is a bug in ACL handling in Sun's tar (reported by me in 2004 or even 
before) that is not fixed. As a result, in many cases ACLs are not restored.

Note that bugs in star are fixed much faster: looking back at the 28 years
of history with star, I know of not a single bug that took more than 3 months
to get a fix. Typically, bugs are fixed within less than a week - many bugs
even within a few hours. This is a support quality that Sun does not offer.

So please explain to us where you see a problem with star.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Julian Regel
 While I am sure that star is technically a fine utility, the problem is that 
 it is effectively an unsupported product.

From this viewpoint, you may call most of Solaris unsupported.

From the perspective of the business, the contract with Sun provides that 
support.

 If our customers find a bug in their backup that is caused by a failure in a 
 Sun supplied utility, then they have a legal course of action. The 
 customer's system administrators are covered because they were using tools 
 provided by the vendor. The wrath of the customer would be upon Sun, not 
 the supplier (us) or the supplier's technical lead (me).

 Do you really believe that Sun will help such a customer?
 There are many bugs in Solaris (I remember e.g. some showstopper
 bugs in the multimedia area) that are not fixed although they have been known
 for a very long time (more than a year).
 There is a bug in ACL handling in Sun's tar (reported by me in 2004 or even 
 before) that is not fixed. As a result, in many cases ACLs are not restored.

If Sun don't fix a critical bug that is affecting the availability of a
server that is under support, then it becomes a problem for the legal
department. In the ACL example, it's possible the affected users didn't have a 
support contract.

Note that bugs in star are fixed much faster and looking back at the 28 years
of history with star, I know of not a single bug that took more than 3 months
to get a fix. Typically, bugs are fixed within less than a week - many bugs
even within a few hours. This is a support quality that Sun does not offer.

Possibly, but there is no guarantee that it will be fixed, no-one to call when 
there is a problem, no-one to escalate the problem to if it is ignored, and no 
company to sue if it all goes wrong.

So please explain to us where you see a problem with star.

Hopefully my above comments explain sufficiently. It's not a technical issue 
with star, it's a business issue. The rules there are very different and not 
based on merit (this is also why many companies prefer running their mission 
critical apps on Red Hat Enterprise Linux instead of CentOS, even though 
technically they are almost identical).

JR



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Julian Regel jrmailgate-zfsdisc...@yahoo.co.uk wrote:

  While I am sure that star is technically a fine utility, the problem is 
  that it is effectively an unsupported product.

 From this viewpoint, you may call most of Solaris unsupported.

 From the perspective of the business, the contract with Sun provides that 
 support.

From a perspective of reality, such a contract will not help.

 Do you really believe that Sun will help such a customer?
 There are many bugs in Solaris (I remember e.g. some showstopper
 bugs in the multimedia area) that are not fixed although they are known
 since a very long time (more than a year).
 There is a bug in ACL handling in Sun's tar (reported by me in 2004 or even 
 before) that is not fixed. As a result in many cases ACLs are not restored.

 If Sun don't fix a critical bug that is affecting the availability of
 server that is under support, then it becomes a problem for the legal
 department. In the ACL example, it's possible the affected users didn't have 
 a support contract.

What you seem to be pointing out is that in case of a problem for a customer with a 
contract, the legal department gets involved. Unfortunately, lawyers do not fix 
bugs.

 Note that bugs in star are fixed much faster and looking back at the 28 years
 of history with star, I know of not a single bug that took more than 3 months
 to get a fix. Typically, bugs are fixed within less than a week - many bugs
 even within a few hours. This is a support quality that Sun does not offer.

 Possibly, but there is no guarantee that it will be fixed, no-one to call 
 when there is a problem, no-one to escalate the problem to if it is ignored, 
 and no company to sue if it all goes wrong.

Escalating a problem does not fix it. 

 So please explain to us where you see a problem with star.

 Hopefully my above comments explain sufficiently. It's not a technical issue 
 with star, it's a business issue. The rules there are very different and not 
 based on merit (this is also why many companies prefer running their mission 
 critical apps on Red Hat Enterprise Linux instead of CentOS, even though 
 technically they are almost identical).

Now we are back to reality.

A person who is interested in a solution will usually check what has happened
in similar cases before. If you compare star with Sun-supplied tools against
this background, Sun cannot outperform star.

Red Hat Enterprise Linux may offer something you cannot get with CentOS.
But I don't see that Sun can offer something you don't get with star.

Let me make another reality check:

Many people use GNU tar for backup purposes, but my first automated test case
with incremental backups using GNU tar failed so miserably that I was unable
to use GNU tar as a test reference at all.

On the other hand, I have been doing incremental backup _and_ restore tests with 
gigabytes of real delta data on a daily basis since 2004, and I have not seen any 
problem since April 2005.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS default compression and file size limit?

2010-01-20 Thread Wajih Ahmed

I have a 13GB text file.  I turned ZFS compression on with zfs set
compression=on mypool.  When I copy the 13GB file into another file, it
does not get compressed (checking via du -sh).  However if I set
compression=gzip, then the file gets compressed.

Is there a limit on file size with the default compression algorithm?  I
did experiment with a much smaller file of 0.5GB with the default
compression and it did get compressed.

I am using S10 U8 x86/64.

Regards,


--
Wajih Ahmed
Principal Field Technologist
877.274.6589 / x40572
Skype: wajih_ahmed



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-20 Thread Constantin Gonzalez

Hi,

I'm using 2 x 1.5 TB drives from Samsung (EcoGreen, I believe) in my current
home server. One reported 14 Read errors a few weeks ago, roughly 6 months after
install, which went away during the next scrub/resilver.

This reminded me to order a 3rd drive, a 2.0 TB WD20EADS from Western Digital
and I now have a 3-way mirror, which is effectively a 2-way mirror with its
hot-spare already synced in.

The idea behind notching up the capacity is threefold:

- No "sorry, this disk happens to have 1 block too few" problems on attach.

- When the 1.5 TB disks _really_ break, I'll just order another 2 TB one and
  use the opportunity to upgrade pool capacity. Since at least one of the 1.5TB
  drives will still be attached, there won't be any "slightly smaller drive"
  problems either when attaching the second 2TB drive.

- After building in 2 bigger drives, it becomes easy to figure out which of the
  drives to phase out. Just go for the smaller drives. This solves the headache
  of trying to figure out the right drive to build out when you replace drives
  that aren't hot spares and don't have blinking lights.

Frankly, I don't care whether the Samsung or the WD drives are better or worse,
they're both consumer drives and they're both dirt cheap. Just assume that
they'll break soon (since you're probably using them more intensely than their
designed purpose) and make sure their replacements are already there.

It also helps mixing vendors, so one glitch that affects multiple disks in the
same batch won't affect your setup too much. (And yes, I broke that rule with
my initial 2 Samsung drives but I'm now glad I have both vendors :)).

Hope this helps,
   Constantin


Simon Breden wrote:

I see also that Samsung have very recently released the HD203WI 2TB 4-platter 
model.

It seems to have good customer ratings so far at newegg.com, but currently 
there are only 13 reviews so it's a bit early to tell if it's reliable.

Has anyone tried this model with ZFS?

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Michael Schuster

Joerg Schilling wrote:

Julian Regel jrmailgate-zfsdisc...@yahoo.co.uk wrote:

If you want a backup that allows access to individual files, you need a file-based 
backup and I am sure that even a filesystem-level scan for recently changed 
files will not be much faster than what you can achieve with e.g. star.


Note that ufsdump directly accesses the raw disk device and thus _is_ at the 
filesystem level but still is slower than star on UFS.

While I am sure that star is technically a fine utility, the problem is that it 
is effectively an unsupported product.


From this viewpoint, you may call most of Solaris unsupported.


what is that supposed to mean?

Michael
--
Michael Schusterhttp://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-20 Thread Simon Breden
Hi Constantin,

It's good to hear your setup with the Samsung drives is working well. Which 
model/revision are they?

My personal preference is to use drives of the same model and revision.

However, in order to help ensure that the drives will perform reliably, I 
prefer to do a fair amount of research first, in order to find drives that are 
reported by many users to be working reliably in their systems. I did this for 
my current WD7500AAKS drives and have never seen even one read/write or 
checksum error in 2 years - they have worked flawlessly.

As a crude method of checking reliability of any particular drive, I take a 
look at newegg.com and see the percentage of users rating the drives with 4 or 
5 stars, and read the problems listed to see what kind of problems the drives 
may have.

If you read the WDC links I list in the first post above, there does appear to 
be some problem that many users are experiencing with the most recent revisions 
of the WD Green 'EADS' drives and also the new Green models in the 'EARS' 
range. I don't know the cause of the problem though.

I did wonder if the problems people are experiencing might be caused by 
spindown/power-saving features of the drives, which might cause a long delay 
before data is accessible again after spin-up, but this is just a guess.

For now, I am looking at the 1.5TB Samsung HD154UI (revision 1AG01118 ?), or 
possibly the 2TB Samsung HD203WI when more user ratings are available.

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Robert Milkowski

On 19/01/2010 19:11, Ian Collins wrote:

Julian Regel wrote:


Based on what I've seen in other comments, you might be right. 
Unfortunately, I don't feel comfortable backing up ZFS filesystems 
because the tools aren't there to do it (built into the operating 
system or using Zmanda/Amanda).



Commercial backup solutions are available for ZFS.
I know tape backup isn't sexy, but it's a reality for many of us and 
it's not going away anytime soon.


True, but I wonder how viable its future is.  One of my clients 
requires 17 LTO4 tapes for a full backup, which cost more and take up 
more space than the equivalent in removable hard drives.


In the past few years growth in hard drive capacities has outstripped 
tapes to the extent that removable hard drives and ZFS snapshots have 
become a more cost effective and convenient backup media.


What do people with many tens of TB use for backup these days?


http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Robert Milkowski

On 20/01/2010 10:48, Ragnar Sundblad wrote:

On 19 jan 2010, at 20.11, Ian Collins wrote:

   

Julian Regel wrote:
 

Based on what I've seen in other comments, you might be right. Unfortunately, I 
don't feel comfortable backing up ZFS filesystems because the tools aren't 
there to do it (built into the operating system or using Zmanda/Amanda).

   

Commercial backup solutions are available for ZFS.
 

I know tape backup isn't sexy, but it's a reality for many of us and it's not 
going away anytime soon.

   

True, but I wonder how viable its future is.  One of my clients requires 17 
LTO4 tapes for a full backup, which cost more and take up more space than the 
equivalent in removable hard drives.

In the past few years growth in hard drive capacities has outstripped tapes to 
the extent that removable hard drives and ZFS snapshots have become a more cost 
effective and convenient backup media.
 

LTO media is still cheaper than equivalent-sized disks, maybe by a factor of 5 or so. 
LTO drives cost a little, but so do disk shelves. So, now that there is no big 
price issue, there is choice instead. Use it!

Hard drives are good for random access - both restore of individual files and 
partial rewrite.

Hard drives aren't faster than tape for data transfer, but they might be 
cheaper to run in parallel and therefore you could potentially gain speed. Hard 
drives have shorter seek times, which may be important.

Hard drives are probably bad for longer-term storage - in particular, you 
will never know how long one can be stored before it fails. A month? 
Probably. A year? Maybe. Five years? Well... Ten years? Probably not. LTO tapes 
are supposed to be able to keep their data for at least 30 years if stored 
properly. Hard drives are probably best when used online, or at least very often.

So - it is wrong to say that one is better or cheaper than the other. They have 
different properties, and could be used to solve different problems.


   


It is actually not that easy.

Compare the cost of two x4540s with 1TB disks to an equivalent solution on LTO.

Each x4540 could be configured as: 4x 11 disks in raidz2 + 2x hot spares 
+ 2x OS disks.
The four raidz2 groups form a single pool. This would provide well over 
30TB of logical storage per box.


Now you rsync all the data from your clients to a dedicated filesystem 
per client, then create a snapshot.
All snapshots are replicated to a 2nd x4540, so even if you were to lose an 
entire box (or its data) for some reason you would still have a spare copy.
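
In rough outline, per client (host, pool and dataset names are placeholders):

  rsync -aH --delete client1:/data/ /pool/backup/client1/
  zfs snapshot pool/backup/client1@2010-01-20
  zfs send -i @2010-01-19 pool/backup/client1@2010-01-20 | \
      ssh backup2 zfs receive -F pool/backup/client1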


Now compare that to the cost of a library, LTO drives, tapes, software + 
licenses, support costs, ...


See more details at 
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS default compression and file size limit?

2010-01-20 Thread Robert Milkowski

On 20/01/2010 13:39, Wajih Ahmed wrote:

I have a 13GB text file.  I turned ZFS compression on with zfs set
compression=on mypool.  When I copy the 13GB file into another file, it
does not get compressed (checking via du -sh).  However if I set
compression=gzip, then the file gets compressed.

Is there a limit on file size with the default compression algorithm?  I
did experiment with a much smaller file of 0.5GB with the default
compression and it did get compressed.



If a given block does not gain more than 12.5% from compression, then 
it will not be stored as compressed.
It might be that with the default compression algorithm (lzjb) you are 
gaining less than 12.5%, while with gzip you are getting more, so those 
blocks end up being compressed.
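
A quick way to see whether compression is actually taking effect (dataset and file names are placeholders):

  zfs get compressratio mypool/fs      # overall ratio for the dataset
  ls -lh /mypool/fs/bigfile.txt        # logical size
  du -h  /mypool/fs/bigfile.txt        # space actually allocated after compression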


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread David Dyer-Bennet

On Wed, January 20, 2010 09:23, Robert Milkowski wrote:

 Now you rsync all the data from your clients to a dedicated filesystem
 per client, then create a snapshot.


Is there an rsync out there that can reliably replicate all file
characteristics between two ZFS/Solaris systems?  I haven't found one. 
The ZFS ACLs seem to be beyond all of them, in particular.

(Losing just that, and preserving the data, is clearly far, far better
than losing everything!  And a system built *knowing* it was losing the
protections could preserve them some other way.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread David Dyer-Bennet

On Wed, January 20, 2010 04:48, Ragnar Sundblad wrote:

 LTO media is still cheaper than equivalent sized disks, maybe a factor 5
 or so. LTO drives cost a little, but so do disk shelves. So, now that
 there is no big price issue, there is choice instead. Use it!

Depends on the scale you're operating at.

Backing up my 800GB home data pool onto a couple of external 1TB USB
drives is *immensely* cheaper than buying tape equipment.

At sufficiently larger scales, I accept that tape is still cheaper.  Makes
sense, since the tapes are relatively simple compared to drives, and you
only need a small number of drives to use a large number of tapes.

I think hard drives are still cheaper at small-enterprise levels, actually.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unavailable device

2010-01-20 Thread Cindy Swearingen

Hi John,

In general, ZFS will warn you when you attempt to add a device that
is already part of an existing pool. One exception is when the system
is being re-installed.

I'd like to see the set of steps that led to the notification failure.

Thanks,

Cindy

On 01/19/10 20:58, John wrote:

I was able to solve it, but it actually worried me more than anything.

Basically, I had created the second pool using the mirror as a primary device. 
So three disks but two full disk root mirrors.

Shouldn't zpool have detected an active pool and prevented this? The other LDOM 
was claiming a corrupted device, which I was able to replace and clear easily. 
But the one pool I originally posted about looks to be permanently gone, since 
it believes there is another device, but doesn't know where the device is or 
what it was ever called. If I could import it and re-do the mirror somehow, or 
something similar, it'd be great. Is there anyway to force it to realize it's 
wrong?

Obviously, I should've kept better track of the WWN's - But I've made the 
mistake before and zpool always prevented it.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Bob Friesenhahn

On Wed, 20 Jan 2010, Julian Regel wrote:


If our customers find a bug in their backup that is caused by a 
failure in a Sun supplied utility, then they have a legal course of 
action. The customer's system administrators are covered because 
they were using tools provided by the vendor. The wrath of the 
customer would be upon Sun, not the supplier (us) or the supplier's 
technical lead (me).


I would love to try whatever you are smoking because it must be really 
good stuff.  It would be a bold new step for me, but the benefits are 
clear.


While your notions of the transitive protection offered by vendor 
support are interesting, I will be glad to meet you in the 
unemployment line then we can share some coffee and discuss the good 
old days.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Julian Regel
It is actually not that easy.

Compare a cost of 2x x4540 with 1TB disks to equivalent solution on LTO.

Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare 
+ 2x OS disks.
The four raidz2 group form a single pool. This would provide well over 
30TB of logical storage per each box.

Now you rsync all the data from your clients to a dedicated filesystem 
per client, then create a snapshot.
All snapshots are replicated to a 2nd x4540 so even if you would loose 
entire box/data for some reason you would still have a spare copy.

Now compare it to a cost of a library, lto drives, tapes, software + 
licenses, support costs, ...

See more details at 
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

I've just read your presentation Robert. Interesting stuff.

I've also just done a pen and paper exercise to see how much 30TB of tape would 
cost as a comparison to your disk based solution.

Using list prices from Sun's website (and who pays list..?), an SL48 with 2 x 
LTO3 drives would cost £14000. I couldn't see a price on an LTO4 equipped SL48 
despite the Sun website saying it's a supported option. Each LTO3 has a native 
capacity of 300GB and the SL48 can hold up to 48 tapes in the library (14.4TB 
native per library). To match the 30TB in your solution, we'd need two 
libraries totalling £28000.

You would also need 100 LTO3 tapes to provide 30TB of native storage. I 
recently bought a pack of 20 tapes for £340, so five packs would be £1700.


So you could provision a tape backup for just under £30,000 (~$49,000). In 
comparison, the cost of one X4540 with ~36TB usable storage is UK list price 
£30,900. I've not factored in backup software since you could use an open source 
solution such as Amanda or Bacula.

Which isn't to say tape would be a better solution since it's going to be 
slower to restore etc. But it does show that tape can work out cheaper, 
especially since the cost of a high speed WAN link isn't required.

JR



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS default compression and file size limit?

2010-01-20 Thread Wajih Ahmed

I have a 13GB text file.  I turned ZFS compression on with zfs set
compression=on mypool.  When I copy the 13GB file into another file, it
does not get compressed (checking via du -sh).  However if I set
compression=gzip, then the file gets compressed.

Is there a limit on file size with the default compression algorithm?  I
did experiment with a much smaller file of 0.5GB with the default
compression and it did get compressed.

I am using S10 U8.

Regards,


--
Wajih Ahmed
Principal Field Technologist
877.274.6589 / x40572
Skype: wajih_ahmed


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Miles Nordin
 ae == Allen Eastwood mi...@paconet.us writes:
 ic == Ian Collins i...@ianshome.com writes:

  If people are really still backing up to tapes or DVD's, just
  use file vdev's, export the pool, and then copy the unmounted
  vdev onto the tape or DVD.

ae And some of those enterprises require backup mechanism that
ae can be easily used in a DR situation.

ae ufsdump/restore was perfect in that regard.  The lack of
ae equivalent functionality is a big problem for the situations
ae where this functionality is a business requirement.

ae For example, one customer, local government, requires a backup
ae that can be taken offsite and used in a DR situation.

Were you confused by some part of:

 Use file vdevs, export the pool, and then copy the unmounted vdev
 onto the tape.

or do you find that this doesn't do what you want?  because it seems
fine to me.  And the fact that it doesn't need any extra tools means
it's unlikely to break (1) far into the future or (2) for a few
unlucky builds, and (3) that the restore environment is simple and
doesn't involve prepopulating some Legato Database with the TOC of
every tape in the library or some such nonsense, which ought to all be
among your ``requirements'', but if you're substituting for those
``works in exactly the way we were used to it working before'' then
you may as well use 'zfs send', since you're more concerned with
identical-feeling invocation syntax than the problems I mentioned.
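
To spell the workaround out a little (sizes, paths and the tape device below are placeholders):

  mkfile 100g /backup/vdev-file                        # a plain file used as the vdev
  zpool create backuppool /backup/vdev-file
  zfs send -R tank@weekly | zfs receive -d backuppool  # put the data into the file pool
  zpool export backuppool                              # quiesce it
  tar cf /dev/rmt/0 /backup/vdev-file                  # copy the unmounted vdev onto tape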

ic For a full recovery, you can archive a send stream and receive
ic it back.

You can send the stream to the tape, transport the tape to the DR
site, and receive it.  You can do this weekly as part of your offsite
backup plan provided that you receive each tape you transport
immediately.  Then the data should be permanently stored on disk at
the DR site, and the tapes used only for transport.

If you store the backup permanently on tape then it's a step backwards
from tar/cpio/ufsrestore because the 'zfs send' format is more fragile
and has to be restored entire.  If you receive the tape immediately
this is an improvement because under the old convention tapes could be
damaged in transit, or over the years by heat/dust/sunlight, without
your knowledge, while on disks it's simple to scrub periodically.

I am not trying to take away your tapes, Allen, so please quote Ian
instead if that's the thing you object to.  I've instead suggested a
different way to use them if you really do need them archivally: store
file vdev's on them.  If you're just using them to replicate data to
the DR site then you needn't even go as far as my workaround.

I do agree that there's a missing tool: it's not possible to copy one
subdirectory to another while preserving holes, forkey extended
attributes, and ACL's.  Also if Windows ACL's are going to be stored
right in the filesystem, then Windows ACL's probably ought to be
preserved over an rsync pipe between Solaris and EnTee, or a
futuristic tarball written on one and extracted on the other.  I don't
agree that the missing tool is designed primarily for the narrow
use-case of writing to ancient backup tapes: it's a more general tool.
or, really, it's just a matter of documenting and committing the
extra-OOB-gunk APIs and then fixing rsync and GNUtar.


pgpk5wk2koJcZ.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Robert Milkowski

On 20/01/2010 16:22, Julian Regel wrote:

It is actually not that easy.

Compare a cost of 2x x4540 with 1TB disks to equivalent solution on LTO.

Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare
+ 2x OS disks.
The four raidz2 group form a single pool. This would provide well over
30TB of logical storage per each box.

Now you rsync all the data from your clients to a dedicated filesystem
per client, then create a snapshot.
All snapshots are replicated to a 2nd x4540 so even if you would loose
entire box/data for some reason you would still have a spare copy.

Now compare it to a cost of a library, lto drives, tapes, software +
licenses, support costs, ...

See more details at
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

I've just read your presentation Robert. Interesting stuff.

I've also just done a pen and paper exercise to see how much 30TB of 
tape would cost as a comparison to your disk based solution.


Using list prices from Sun's website (and who pays list..?), an SL48 
with 2 x LTO3 drives would cost £14000. I couldn't see a price on an 
LTO4 equipped SL48 despite the Sun website saying it's a supported 
option. Each LTO3 has a native capacity of 300GB and the SL48 can hold 
up to 48 tapes in the library (14.4TB native per library). To match 
the 30TB in your solution, we'd need two libraries totalling £28000.


You would also need 100 LTO3 tapes to provide 30TB of native storage. 
I recently bought a pack of 20 tapes for £340, so five packs would be 
£1700.


So you could provision a tape backup for just under £30,000 (~$49,000). 
In comparison, the cost of one X4540 with ~ 36TB usable storage is UK 
list price £30900. I've not factored in backup software since you 
could use an open source solution such as Amanda or Bacula.


Which isn't to say tape would be a better solution since it's going 
to be slower to restore etc. But it does show that tape can work out 
cheaper, especially since the cost of a high speed WAN link isn't 
required.


JR

You would also need to add at least one server with FC cards to drive the 
library.
Then with most software you would need more tapes due to data 
fragmentation and the need to do regular full backups (with zfs+rsync you 
only do a full backup once).


So in the best case a library will cost about the same as a disk-based 
solution but will generally be less flexible, etc. If you were to add any 
enterprise software on top of it (Legato, NetBackup, ...) then the price 
would change dramatically. Additionally, with ZFS one could start using 
deduplication (in testing already).



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Robert Milkowski

On 20/01/2010 17:21, Robert Milkowski wrote:

On 20/01/2010 16:22, Julian Regel wrote:

It is actually not that easy.

Compare a cost of 2x x4540 with 1TB disks to equivalent solution on LTO.

Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot 
spare

+ 2x OS disks.
The four raidz2 group form a single pool. This would provide well over
30TB of logical storage per each box.

Now you rsync all the data from your clients to a dedicated filesystem
per client, then create a snapshot.
All snapshots are replicated to a 2nd x4540 so even if you would loose
entire box/data for some reason you would still have a spare copy.

Now compare it to a cost of a library, lto drives, tapes, software +
licenses, support costs, ...

See more details at
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

I've just read your presentation Robert. Interesting stuff.

I've also just done a pen and paper exercise to see how much 30TB of 
tape would cost as a comparison to your disk based solution.


Using list prices from Sun's website (and who pays list..?), an SL48 
with 2 x LTO3 drives would cost £14000. I couldn't see a price on an 
LTO4 equipped SL48 despite the Sun website saying it's a supported 
option. Each LTO3 has a native capacity of 300GB and the SL48 can 
hold up to 48 tapes in the library (14.4TB native per library). To 
match the 30TB in your solution, we'd need two libraries totalling 
£28000.


You would also need 100 LTO3 tapes to provide 30TB of native storage. 
I recently bought a pack of 20 tapes for £340, so five packs would be 
£1700.


So you could provision a tape backup for just under £30,000 (~$49,000). 
In comparison, the cost of one X4540 with ~ 36TB usable storage is UK 
list price £30900. I've not factored in backup software since you 
could use an open source solution such as Amanda or Bacula.


Which isn't to say tape would be a better solution since it's going 
to be slower to restore etc. But it does show that tape can work out 
cheaper, especially since the cost of a high speed WAN link isn't 
required.


JR

You would also need to add at least one server to your library with fc 
cards.
Then with most software you would need more tapes due to data 
fragmentation and a need to do regular full backups (with zfs+rsync 
you only do a full backup once).


So in best case a library will cost about the same as disk based 
solution but generally will be less flexible, etc. If you would add 
any enterprise software on top of it (Legato, NetBackup, ...) then the 
price would change dramatically. Additionally with ZFS one could 
start using deduplication (in testing already).





What I really mean is that a disk-based solution used to be much more 
expensive than tape, but currently they are comparable in cost, while 
the disk-based solution is often more flexible.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS default compression and file size limit?

2010-01-20 Thread Wajih Ahmed

Mike,

Thank you for your quick response...

Is there a way for me to test the compression from the command line to 
see if lzjb is giving me more or less than the 12.5% mark?  I guess it 
will depend on whether there is an lzjb command-line utility.


I am just a little surprised because gzip-6 is able to compress it to 
4.4GB from 14GB (and gzip-1 to 4.8GB), and from what I read lzjb should be 
giving me better than 12.5% compression.  For example the *compress* 
command (which I think uses LZO, a slightly different variant of 
Lempel-Ziv) manages to reduce it to 8.0GB. That is a 57% ratio. 
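
The closest thing I can think of is copying the file into scratch datasets that differ only in their compression setting and comparing the ratios (a rough sketch, names are placeholders):

  zfs create -o compression=on   mypool/lzjbtest
  zfs create -o compression=gzip mypool/gziptest
  cp bigfile.txt /mypool/lzjbtest/; cp bigfile.txt /mypool/gziptest/
  sync
  zfs get compressratio mypool/lzjbtest mypool/gziptest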


Regards,

--
Wajih Ahmed
Principal Field Technologist
877.274.6589 / x40572
Skype: wajih_ahmed



Robert Milkowski wrote:

On 20/01/2010 13:39, Wajih Ahmed wrote:

I have a 13GB text file.  I turned ZFS compression on with zfs set
compression=on mypool.  When i copy the 13GB file into another file, it
does not get compressed (checking via du -sh).  However if i set
compression=gzip, then the file gets compressed.

Is there a limit on file size with the default compression algorithm?  I
did experiment with a much smaller file of 0.5GB with the default
compression and it did get compressed.



If a given block does not gain more than 12.5% from compression, 
then it will not be stored compressed.
It might be that with the default compression algorithm (lzjb) you are 
gaining less than 12.5%, while with gzip you are gaining more, and 
therefore those blocks end up being stored compressed.
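
One way to check what lzjb achieves on this particular file, without a 
standalone lzjb utility, is to copy it into a scratch dataset with 
compression=on and look at the compressratio property (dataset and path 
names below are hypothetical):

  zfs create -o compression=on mypool/lzjbtest
  cp /path/to/13GB-file /mypool/lzjbtest/
  sync
  zfs get compressratio mypool/lzjbtest   # ratio lzjb achieved
  du -sh /mypool/lzjbtest                 # space actually used on disk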



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unavailable device

2010-01-20 Thread Victor Latushkin

John wrote:

I was able to solve it, but it actually worried me more than anything.

Basically, I had created the second pool using the mirror as a primary device. 
So three disks but two full disk root mirrors.

Shouldn't zpool have detected an active pool and prevented this? The other LDOM 
was claiming a corrupted device, which I was able to replace and clear easily. 
But the one pool I originally posted about looks to be permanently gone, since 
it believes there is another device, but doesn't know where the device is or 
what it was ever called. If I could import it and re-do the mirror somehow, or 
something similar, it'd be great. Is there any way to force it to realize it's 
wrong?


You can try limiting access to one device at a time, either by removing one 
device from the LDOM configuration, or by creating a separate directory like 
/tmp/dsk, copying a symlink for the device you want to try into it, and then 
running:


zpool import              (if the device is removed at the LDOM level)
zpool import -d /tmp/dsk  (if you prefer the trick with symlinks)
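
For example, the symlink variant could look like this (the device name 
below is hypothetical):

  mkdir /tmp/dsk
  ln -s /dev/dsk/c0t1d0s0 /tmp/dsk/c0t1d0s0
  zpool import -d /tmp/dsk            # shows pools visible via that device only
  zpool import -d /tmp/dsk poolname   # then import by name if it looks right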

Posting label 0 (from zdb -l /dev/rdsk/... output) of both involved disks may 
provide more clues.


regards,
victor




Obviously, I should've kept better track of the WWNs - but I've made the 
mistake before and zpool always prevented it.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Clearing a directory with more than 60 million files

2010-01-20 Thread Miles Nordin
 ml == Mikko Lammi mikko.la...@lmmz.net writes:

ml rm -rf to problematic directory from parent level. Running
ml this command shows directory size decreasing by 10,000
ml files/hour, but this would still mean close to ten months
ml (over 250 days) to delete everything!

interesting.

does 'zpool scrub' take unusually long, too?  or is it pretty close to
normal speed?


pgpzuvM8WeXmu.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL

2010-01-20 Thread Willy
To those concerned about this issue, there is a patched version of 
smartmontools that enables the querying and setting of TLER/ERC/CCTL values 
(well, except for recent desktop drives from Western Digital).  It's available 
here: http://www.csc.liv.ac.uk/~greg/projects/erc/

Unfortunately, smartmontools has limited SATA drive support in opensolaris, and 
you cannot query or set the values.  I'm looking into booting into linux, 
setting the values, and then rebooting into opensolaris since the settings will 
survive a warm reboot (but not a powercycle).
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Richard Elling
On Jan 20, 2010, at 3:15 AM, Joerg Schilling wrote:

 Richard Elling richard.ell...@gmail.com wrote:
 
 
 ufsdump/restore was perfect in that regard.  The lack of equivalent 
 functionality is a big problem for the situations where this functionality 
 is a business requirement.
 
 How quickly we forget ufsdump's limitations :-).  For example, it is not 
 supported
 for use on an active file system (known data corruption possibility) and 
 UFS snapshots are, well, a poor hack and often not usable for backups.
 As the ufsdump(1m) manpage says,
 
 It seems you forgot that zfs also needs snapshots. There is nothing bad with 
 snapshots.

Yes, snapshots are a good thing. But most people who try fssnap 
on the UFS root file system will discover that it doesn't work, for 
reasons mentioned in the NOTES section of fssnap_ufs(1m). 
fssnap_ufs is simply a butt-ugly hack. So if you believe you can
reliably use ufsdump to store a DR copy of root for a 7x24x365 
production environment, then you probably believe the Backup
Fairy will leave a coin under your pillow when your restore fails :-)

Fortunately, ZFS snapshots do the right thing.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ian Collins

Julian Regel wrote:

It is actually not that easy.

Compare a cost of 2x x4540 with 1TB disks to equivalent solution on LTO.

Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare
+ 2x OS disks.
The four raidz2 group form a single pool. This would provide well over
30TB of logical storage per each box.

Now you rsync all the data from your clients to a dedicated filesystem
per client, then create a snapshot.
All snapshots are replicated to a 2nd x4540 so even if you would loose
entire box/data for some reason you would still have a spare copy.

Now compare it to a cost of a library, lto drives, tapes, software +
licenses, support costs, ...

See more details at
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

I've just read your presentation Robert. Interesting stuff.

I've also just done a pen and paper exercise to see how much 30TB of 
tape would cost as a comparison to your disk based solution.


Using list prices from Sun's website (and who pays list..?), an SL48 
with 2 x LTO3 drives would cost £14000. I couldn't see a price on an 
LTO4 equipped SL48 despite the Sun website saying it's a supported 
option. Each LTO3 has a native capacity of 300GB and the SL48 can hold 
up to 48 tapes in the library (14.4TB native per library). To match 
the 30TB in your solution, we'd need two libraries totalling £28000.


You would also need 100 LTO3 tapes to provide 30TB of native storage. 
I recently bought a pack of 20 tapes for £340, so five packs would be 
£1700.


So you could provision a tape backup for just under £3 (~$49000). 
In comparison, the cost of one X4540 with ~ 36TB usable storage is UK 
list price £30900. I've not factored in backup software since you 
could use an open source solution such as Amanda or Bacula.


A more apples to apples comparison would be to compare the storage 
only.  Both removable drive and tape options require a server with FC or 
SCSI ports, so that can be excluded from the comparison.


So for 30TB, assuming 2TB drives @ ~£100 each and a pool built of 6-drive 
raidz vdevs, 18 drives would be required, plus two 16-drive shelves.  Each 
backup set would therefore cost about £1800 in drives, so there's not a 
great deal of difference.  With drives you also get the added benefit of 
keeping all your incrementals (as snapshots) on the archive set.


HDD price per GB will continue to drop faster than tape, so it will be 
interesting to do the same comparison in 12 months.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ian Collins

Joerg Schilling wrote:

Ian Collins i...@ianshome.com wrote:

  

The correct way to archive ACLs would be to put them into extended POSIX tar
attributes as star does.

See http://cdrecord.berlios.de/private/man/star/star.4.html for the format 
documentation or have a look at ftp://ftp.berlios.de/pub/star/alpha, e.g.

ftp://ftp.berlios.de/pub/star/alpha/acl-test.tar.gz

The ACL format used by Sun is undocumented.

  
  

man acltotext



We are talking about TAR and I did give a pointer to the star archive format 
documentation, so it is obvious that I was talking about the ACL format from

Sun tar. This format is not documented.

  

It is, Sun's ZFS ACL aware tools use acltotext() to format ACLs.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Miles Nordin
 jr == Julian Regel jrmailgate-zfsdisc...@yahoo.co.uk writes:

jr While I am sure that star is technically a fine utility, the
jr problem is that it is effectively an unsupported product.

I have no problems with this whatsoever.

jr If our customers find a bug in their backup that is caused by
jr a failure in a Sun supplied utility, then they have a legal
jr course of action. The customer's system administrators are
jr covered because they were using tools provided by the
jr vendor. The wrath of the customer would be upon Sun, not the
jr supplier (us) or the supplier's technical lead (me).

We were just talking about this somewhere else, actually: ``if
something goes wrong, its their ass. but if nothing ever gets done,
its nobody's fault.''  It's sad for me how much money is to be made
supporting broken corporate cultures like that.

I'm not saying you're wrong, just that you might not want to
contribute to such a culture because you've chosen to endure it for a
scratch.  You need to have a better way to evaluate employees than
micromanagement-by-the-clueless and vindictive hindsight.  But the
point that there's money to be made by bleeding it out of ossified
broken American companies is well-taken.

jr From the perspective of the business, the system administrator
jr will have acted irresponsibly by choosing a tool that has no
jr vendor support.

From the perspective of MY business, I would much rather have the dark
OOB acl/fork/whatever-magic that's gone into ZFS and NFSv4 supported
in standard tools like rsync and GNUtar.  This is, for example, what
Apple achieved with CUPS and why I can share printers between Ubuntu
and Mac OS effortlessly, and this increases the amount of money I'm
willing to give Apple for their proprietary platform.  The purpose of
the tool I'm discussing definitely includes the same level of
cooperation, so working with the existing best-in-class and
most-popular tools, and reasonableness, might be better than brittle
CYA support in some fringey '/opt/SUNWbkpkit/bin/VendorCP -Rf' tool.

Even if you get your cyaCP tool you may find it doesn't achieve the
ass-covering you wanted because these tools can be cheeky little
bastards.  Most of the other quirky little balkanized-platform
Solaris-only tools are littered with straightjacketing assertions to
avoid ``call generators'' and push the blame back onto the sysadmin,
then there is some ``all bets are off'' flag to allow you to actually
accomplish the job, like 'NOINUSECHECK=1 format -e'.  

Honestly...why bother playing this game?


pgpBTS02xCBZW.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Filesystem Quotas

2010-01-20 Thread Mr. T Doodle
I currently have one filesystem / (root). Is it possible to put a quota on,
let's say, /var? Or would I have to move /var to its own filesystem in the
same pool?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS default compression and file size limit?

2010-01-20 Thread Daniel Carosone
On Wed, Jan 20, 2010 at 12:42:35PM -0500, Wajih Ahmed wrote:
 Mike,

 Thank you for your quick response...

 Is there a way for me to test the compression from the command line to  
 see if lzjb is giving me more or less than the 12.5% mark?  I guess it  
 will depend if there is a lzjb command line utility.

 I am just a little surprised because gzip-6 is able to compress it to  
 4.4GB from 14GB (and gzip-1 4.8GB) and from what i read lzjb should be  
 giving me better an 12.5% compression.  For example the *compress*  
 command (which i think uses LZO, a slight different variant of  
 Lempel-Ziv) manges to reduce it to 8.0GB.  That is a 57% ratio. 

That's over the whole file as a single compression stream.  ZFS has to
compress each block (128k or maybe less) independently.  This can't do
as well.
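
A rough way to see the per-block effect is to split the file into 128k 
pieces and compress each piece separately (gzip used here as a stand-in, 
since there is no standalone lzjb tool), then compare the total against 
compressing the whole file as one stream.  Paths are hypothetical:

  mkdir /tmp/blocks && cd /tmp/blocks
  split -b 128k /path/to/bigfile chunk.
  gzip -1 chunk.*
  du -sh .                              # sum of per-block compression
  gzip -1 -c /path/to/bigfile | wc -c   # whole-stream compression, for comparison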

--
Dan.

pgpdvR9z5t17a.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread David Magda


On Jan 20, 2010, at 12:21, Robert Milkowski wrote:


On 20/01/2010 16:22, Julian Regel wrote:



[...]
So you could provision a tape backup for just under £3 (~ 
$49000). In comparison, the cost of one X4540 with ~ 36TB usable  
storage is UK list price £30900. I've not factored in backup  
software since you could use an open source solution such as Amanda  
or Bacula.

[...]
You would also need to add at least one server to your library with  
fc cards.
Then with most software you would need more tapes due to data  
fragmentation and a need to do regular full backups (with zfs+rsync  
you only do a full backup once).


So in best case a library will cost about the same as disk based  
solution but generally will be less flexible, etc. If you would add  
any enterprise software on top of it (Legato, NetBackup, ...) then  
the price would change dramaticallly. Additionally with ZFS one  
could start using deduplication (in testing already).


Regardless of the economics of tape, nowadays you generally need to go  
to disk first because trying to stream at 120 MB/s (LTO-4) really  
isn't practical over the network, directly from the client.


So in the end you'll be starting with disk (either DAS or VTL or  
whatever), and generally going to tape if you need to keep stuff  
that's older than (say) 3-6 months. Tape also doesn't rotate while  
it's sitting there, so if it's going to be sitting around for a while  
(e.g., seven years) better to use tape than something that sucks up  
power.


LTO-5 is expected to be released RSN, with a native capacity of 1.6 TB  
and (uncompressed) writes at 180 MB/s. The only way to realistically  
feed that is from disk.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Quotas

2010-01-20 Thread Tomas Ögren
On 20 January, 2010 - Mr. T Doodle sent me these 1,0K bytes:

 I currently have one filesystem / (root), is it possible to put a quota on
 let's say /var? Or would I have to move /var to it's own filesystem in the
 same pool?

Only filesystems (datasets) can have different settings, so you would need to
make /var its own filesystem to put a quota on it.
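
A minimal sketch, with hypothetical names (migration of the existing
/var contents and mountpoint handling not shown):

  zfs create rpool/var           # give /var its own dataset
  zfs set quota=10G rpool/var    # cap it at 10 GB
  zfs get quota rpool/var        # verify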

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Cindy Swearingen

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in 
mirrored pool and then immediately running a scrub on the pool. It works 
as expected.


Any other symptoms (like a power failure?) before the disk went offline? 
Is it possible that both disks went offline?


We would like to review the crash dump if you still have it, just let me 
know when it's uploaded.


Thanks,

Cindy


On 01/19/10 12:30, Frank Middleton wrote:

This is probably unreproducible, but I just got a panic whilst
scrubbing a simple mirrored pool on SXCE snv124. Evidently
one of the disks went offline for some reason and shortly
thereafter the panic happened. I have the dump and the
/var/adm/messages containing the trace.

Is there any point in submitting a bug report?

The panic starts with:

Jan 19 13:27:13 host6 ^Mpanic[cpu1]/thread=2a1009f5c80:
Jan 19 13:27:13 host6 unix: [ID 403854 kern.notice] assertion failed: 0 
== zap_update(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, 
DMU_POOL_SCRUB_BOOKMARK, sizeof (uint64_t), 4, &dp->dp_scrub_bookmark, 
tx), file: ../../common/fs/zfs/dsl_scrub.c, line: 853


FWIW when the system came back up, it resilvered with no
problem and now I'm rerunning the scrub.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unavailable device

2010-01-20 Thread John
Unfortunately, since we got a new priority on the project, I had to scrap and 
recreate the pool, so I don't have any of the information anymore.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can i make a COMSTAR zvol bigger?

2010-01-20 Thread Errol Neal
On Wed, Jan 20, 2010 02:38  PM, Thomas Burgess wonsl...@gmail.com wrote:
 I finally got iscsi working, and it's amazing... it took a minute for me to
 figure out... I didn't realize it required 2 tools... but anyways.
 
 My original zvol is too small. I created a 120 GB zvol for Time Machine
 but I really need more like 250 GB, so this is a 2-part question.
 
 First, can I make the zvol/iscsi drive bigger... and also, let's assume I
 can't (and just for my general knowledge), how can I delete the COMSTAR iscsi
 volume?  I noticed zfs destroy won't work if it's shared over iscsi even if I try
 to force it (I was hoping it would just destroy it and I could make a new
 one)

Yes you can. Size of the vol is a ZFS property. 

-bash-3.2# zfs get  all datapool/stores/axigen/lun2
NAME                         PROPERTY              VALUE                  SOURCE
datapool/stores/axigen/lun2  type                  volume                 -
datapool/stores/axigen/lun2  creation              Sun Sep 27 21:40 2009  -
datapool/stores/axigen/lun2  used                  250G                   -
datapool/stores/axigen/lun2  available             516G                   -
datapool/stores/axigen/lun2  referenced            87.1G                  -
datapool/stores/axigen/lun2  compressratio         1.00x                  -
datapool/stores/axigen/lun2  reservation           none                   default
datapool/stores/axigen/lun2  volsize               250G                   -
datapool/stores/axigen/lun2  volblocksize          4K                     -
datapool/stores/axigen/lun2  checksum              on                     default
datapool/stores/axigen/lun2  compression           off                    default
datapool/stores/axigen/lun2  readonly              off                    default
datapool/stores/axigen/lun2  shareiscsi            off                    default
datapool/stores/axigen/lun2  copies                1                      default
datapool/stores/axigen/lun2  refreservation        250G                   local
datapool/stores/axigen/lun2  primarycache          all                    default
datapool/stores/axigen/lun2  secondarycache        all                    default
datapool/stores/axigen/lun2  usedbysnapshots       0                      -
datapool/stores/axigen/lun2  usedbydataset         87.1G                  -
datapool/stores/axigen/lun2  usedbychildren        0                      -
datapool/stores/axigen/lun2  usedbyrefreservation  163G                   -

Set the volsize property to what you want, then modify the
logical unit, e.g.

Usage:  stmfadm modify-lu [OPTIONS] LU-name
OPTIONS:
-p, --lu-prop  logical-unit-property=value
-s, --size  size K/M/G/T/P
-f, --file

Description: Modify properties of a logical unit.
Valid properties for -p, --lu-prop are:
 alias- alias for logical unit (up to 255 chars)
 mgmt-url - Management URL address
 wcd  - write cache disabled (true, false)
 wp   - write protect (true, false)
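
For example, using the dataset above (the LU name/GUID shown is
hypothetical; stmfadm list-lu -v will show the real one):

  zfs set volsize=250G datapool/stores/axigen/lun2
  stmfadm list-lu -v
  stmfadm modify-lu -s 250G 600144F0XXXXXXXXXXXXXXXXXXXXXXXX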

You will probably want to offline your target before making these changes. 

Now of course, this doesn't mean the space is immediately usable on the target 
host. If it's Windows you can use diskpart extend. If it's Linux, then you may 
need another method depending upon the file system. 

-Errol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Ian Collins i...@ianshome.com wrote:

  We are talking about TAR and I did give a pointer to the star archive 
  format 
  documentation, so it is obvious that I was talking about the ACL format from
  Sun tar. This format is not documented.
 

 It is, Sun's ZFS ACL aware tools use acltotext() to format ACLs.

Please don't reply without checking facts.

The fact that you know that there is salt in the soup does not give
you the whole list of ingredients. Please look into the Sun tar format
to understand that you are wrong.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-01-20 Thread Richard Elling
Hi Lutz,

On Jan 20, 2010, at 3:17 AM, Lutz Schumann wrote:

 Hello, 
 
 we tested clustering with ZFS and the setup looks like this: 
 
 - 2 head nodes (nodea, nodeb)
 - head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)

This makes me nervous. I suspect this is not in the typical QA 
test plan.

 - two external jbods
 - two mirror zpools (pool1,pool2)
   - each mirror is a mirror of one disk from each jbod
 - no ZIL (anyone knows a well priced SAS SSD ?)
 
 We want active/active and added the l2arc to the pools. 
 
 - pool1 has nodea_l2arc as cache
 - pool2 has nodeb_l2arc as cache
 
 Everything is great so far. 
 
 One thing to note is that nodea_l2arc and nodeb_l2arc are named equally ! 
 (c0t2d0 on both nodes).
 
 What we found is that during tests, the pool just picked up the device 
 nodeb_l2arc automatically, although it was never explicitly added to the pool 
 pool1.

This is strange. Each vdev is supposed to be uniquely identified by its GUID.
This is how ZFS can identify the proper configuration when two pools have 
the same name. Can you check the GUIDs (using zdb) to see if there is a
collision?
 -- richard

 We had a setup stage when pool1 was configured on nodea with nodea_l2arc and 
 pool2 was configured on nodeb without an l2arc. Then we did a failover. Then 
 pool1 picked up the (until then) unconfigured nodeb_l2arc. 
 
 Is this intended ? Why is a L2ARC device automatically picked up if the 
 device name is the same ? 
 
 In a later stage we had both pools configured with the corresponding l2arc 
 device. (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). Then 
 we also did a failover. The l2arc device of the pool failing over was marked 
 as too many corruptions instead of missing. 
 
 So from these tests it looks like ZFS just picks up the device with the same 
 name and replaces the l2arc without looking at the device signatures to only 
 consider devices being part of a pool.
 
 We have not tested with a data disk as c0t2d0 but if the same behaviour 
 occurs - god save us all.
 
 Can someone clarify the logic behind this ? 
 
 Can also someone give a hint how to rename SAS disk devices in opensolaris ? 
 (to workaround I would like to rename c0t2d0 on nodea (nodea_l2arc) to 
 c0t24d0 and c0t2d0 on nodeb (nodea_l2arc) to c0t48d0). 
 
 P.s. Release is build 104 (NexentaCore 2). 
 
 Thanks!
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Miles Nordin car...@ivy.net wrote:

 From the perspective of MY business, I would much rather have the dark
 OOB acl/fork/whatever-magic that's gone into ZFS and NFSv4 supported
 in standard tools like rsync and GNUtar.  This is, for example, what

GNU tar does not support any platform speficic feature on any OS.
Don't expect that GNU tar will ever add such properties..

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-01-20 Thread Tomas Ögren
On 20 January, 2010 - Richard Elling sent me these 2,7K bytes:

 Hi Lutz,
 
 On Jan 20, 2010, at 3:17 AM, Lutz Schumann wrote:
 
  Hello, 
  
  we tested clustering with ZFS and the setup looks like this: 
  
  - 2 head nodes (nodea, nodeb)
  - head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)
 
 This makes me nervous. I suspect this is not in the typical QA 
 test plan.
 
  - two external jbods
  - two mirror zpools (pool1,pool2)
- each mirror is a mirror of one disk from each jbod
  - no ZIL (anyone knows a well priced SAS SSD ?)
  
  We want active/active and added the l2arc to the pools. 
  
  - pool1 has nodea_l2arc as cache
  - pool2 has nodeb_l2arc as cache
  
  Everything is great so far. 
  
  One thing to node is that the nodea_l2arc and nodea_l2arc are named equally 
  ! (c0t2d0 on both nodes).
  
  What we found is that during tests, the pool just picked up the device 
  nodeb_l2arc automatically, altought is was never explicitly added to the 
  pool pool1.
 
 This is strange. Each vdev is supposed to be uniquely identified by its GUID.
 This is how ZFS can identify the proper configuration when two pools have 
 the same name. Can you check the GUIDs (using zdb) to see if there is a
 collision?

Reproducable:

itchy:/tmp/blah# mkfile 64m 64m disk1
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# zpool create blah /tmp/blah/disk1 
itchy:/tmp/blah# zpool add blah cache /dev/zvol/dsk/rpool/blahcache 
itchy:/tmp/blah# zpool status blah
  pool: blah
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
blah ONLINE   0 0 0
  /tmp/blah/disk1ONLINE   0 0 0
cache
  /dev/zvol/dsk/rpool/blahcache  ONLINE   0 0 0

errors: No known data errors
itchy:/tmp/blah# zpool export blah
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache 

LABEL 0

version=15
state=4
guid=6931317478877305718

itchy:/tmp/blah# zfs destroy rpool/blahcache
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# dd if=/dev/zero of=/dev/zvol/dsk/rpool/blahcache bs=1024k 
count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.559299 seconds, 120 MB/s
itchy:/tmp/blah# zpool import -d /tmp/blah
  pool: blah
id: 16691059548146709374
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

blah ONLINE
  /tmp/blah/disk1ONLINE
cache
  /dev/zvol/dsk/rpool/blahcache
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache

LABEL 0


LABEL 1


LABEL 2


LABEL 3

itchy:/tmp/blah# zpool import -d /tmp/blah blah
itchy:/tmp/blah# zpool status
  pool: blah
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
blah ONLINE   0 0 0
  /tmp/blah/disk1ONLINE   0 0 0
cache
  /dev/zvol/dsk/rpool/blahcache  ONLINE   0 0 0

errors: No known data errors
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache

LABEL 0

version=15
state=4
guid=6931317478877305718
...


It did indeed overwrite my formerly clean blahcache.

Smells like a serious bug.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se

  -- richard
 
  We had a setup stage when pool1 was configured on nodea with nodea_l2arc 
  and pool2 was configured on nodeb without a l2arc. Then we did a failover. 
  Then pool1 pickup up the (until then) unconfigured nodeb_l2arc. 
  
  Is this intended ? Why is a L2ARC device automatically picked up if the 
  device name is the same ? 
  
  In a later stage we had both pools configured with the corresponding l2arc 
  device. (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). 
  Then we also did a failover. The l2arc device of the pool failing over was 
  marked as too many corruptions instead of missing. 
  
  So from this tests it looks like ZFS just picks up the device with the same 
  name and replaces the l2arc without looking at the device signatures to 
  only consider devices beeing part of a pool.
  
  We have not tested with a data disk as c0t2d0 but if the same behaviour 
  

Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
mirrored pool and then immediately running a scrub on the pool. It works
as expected.


The disk has to fail whilst the scrub is running. It has happened twice now,
once with the bottom half of the mirror, and again with the top half.
 

Any other symptoms (like a power failure?) before the disk went offline?
It is possible that both disks went offline?


Neither. The system is on a pretty beefy UPS, and one half of the mirror
was definitely online (zpool status just before panic showed one disk
offline and the pool as degraded).


We would like to review the crash dump if you still have it, just let me
know when its uploaded.


Do you need the unix.0, vmcore.0 or both? I'll add either or both as
attachments to newly created Bug 14012, Panic running a scrub,
when you let me know which one(s) you want.

Thanks -- Frank


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Cindy Swearingen

Hi Frank,

We need both files.

Thanks,

Cindy

On 01/20/10 15:43, Frank Middleton wrote:

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
mirrored pool and then immediately running a scrub on the pool. It works
as expected.


The disk has to fail whilst the scrub is running. It has happened twice 
now,

once with the bottom half of the mirror, and again with the top half.
 

Any other symptoms (like a power failure?) before the disk went offline?
It is possible that both disks went offline?


Neither. The system is on a pretty beefy UPS, and one half of the mirror
was definitely online (zpool status just before panic showed one disk
offline and the pool as degraded).


We would like to review the crash dump if you still have it, just let me
know when its uploaded.


Do you need the unix.0, vmcore.0 or both? I'll add either or both as
attachments to newly created Bug 14012, Panic running a scrub,
when you let me know which one(s) you want.

Thanks -- Frank



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-01-20 Thread Richard Elling
Though the ARC case, PSARC/2007/618 is unpublished, I gather from
googling and the source that L2ARC devices are considered auxiliary,
in the same category as spares. If so, then it is perfectly reasonable to
expect that it gets picked up regardless of the GUID. This also implies
that it is shareable between pools until assigned. Brief testing confirms
this behaviour.  I learn something new every day :-)

So, I suspect Lutz sees a race when both pools are imported onto one
node.  This still makes me nervous though...
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Al Hopper
On Wed, Jan 20, 2010 at 2:52 PM, David Magda dma...@ee.ryerson.ca wrote:

 On Jan 20, 2010, at 12:21, Robert Milkowski wrote:

 On 20/01/2010 16:22, Julian Regel wrote:

 [...]

 So you could provision a tape backup for just under £3 (~$49000). In
 comparison, the cost of one X4540 with ~ 36TB usable storage is UK list
 price £30900. I've not factored in backup software since you could use an
 open source solution such as Amanda or Bacula.

 [...]
 You would also need to add at least one server to your library with fc
 cards.
 Then with most software you would need more tapes due to data
 fragmentation and a need to do regular full backups (with zfs+rsync you only
 do a full backup once).

 So in best case a library will cost about the same as disk based solution
 but generally will be less flexible, etc. If you would add any enterprise
 software on top of it (Legato, NetBackup, ...) then the price would change
 dramaticallly. Additionally with ZFS one could start using deduplication (in
 testing already).

 Regardless of the economics of tape, nowadays you generally need to go to
 disk first because trying to stream at 120 MB/s (LTO-4) really isn't
 practical over the network, directly from the client.

I remember from about 5 years ago (before LTO-4 days) that streaming
tape drives would go to great lengths to ensure that the drive kept
streaming - because it took so much time to stop, back up and stream
again.  And one way the drive firmware accomplished that was to write
blocks of zeros when there was no data available.  This also occurred
when the backup source was sending a bunch of small files, which took
longer to stream and didn't produce enough data to keep the drive
writing useful data.  And if you had the tape hardware set up to do
compression, then, assuming a normal 2:1 compression ratio, you'd need
to source 240MB/sec in order to keep the tape writing 120MB/sec.  The
net result was the consumption of a lot more tape than a
back-of-the-napkin calculation told you was required.

Obviously at higher compression ratios or with the higher stream data
write rates you quote below - this problem becomes more troublesome.
So I agree with your conclusion: The only way to realistically feed
that is from disk.

 So in the end you'll be starting with disk (either DAS or VTL or whatever),
 and generally going to tape if you need to keep stuff that's older than
 (say) 3-6 months. Tape also doesn't rotate while it's sitting there, so if
 it's going to be sitting around for a while (e.g., seven years) better to
 use tape than something that sucks up power.

 LTO-5 is expected to be released RSN, with a native capacity of 1.6 TB and
 (uncompressed) writes at 180 MB/s. The only way to realistically feed that
 is from disk.

 ___

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ragnar Sundblad

On 20 jan 2010, at 17.22, Julian Regel wrote:

 It is actually not that easy.
 
 Compare a cost of 2x x4540 with 1TB disks to equivalent solution on LTO.
 
 Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare 
 + 2x OS disks.
 The four raidz2 group form a single pool. This would provide well over 
 30TB of logical storage per each box.
 
 Now you rsync all the data from your clients to a dedicated filesystem 
 per client, then create a snapshot.
 All snapshots are replicated to a 2nd x4540 so even if you would loose 
 entire box/data for some reason you would still have a spare copy.
 
 Now compare it to a cost of a library, lto drives, tapes, software + 
 licenses, support costs, ...
 
 See more details at 
 http://milek.blogspot.com/2009/12/my-presentation-at-losug.html
 
 I've just read your presentation Robert. Interesting stuff.
 
 I've also just done a pen and paper exercise to see how much 30TB of tape 
 would cost as a comparison to your disk based solution.
 
 Using list prices from Sun's website (and who pays list..?), an SL48 with 2 x 
 LTO3 drives would cost £14000. I couldn't see a price on an LTO4 equipped 
 SL48 despite the Sun website saying it's a supported option. Each LTO3 has a 
 native capacity of 300GB and the SL48 can hold up to 48 tapes in the library 
 (14.4TB native per library). To match the 30TB in your solution, we'd need 
 two libraries totalling £28000.

LTO3 has native capacity 400 GB, LTO4 has 800. The price is about the same per 
tape and per drive, a little higher for LTO4.

 You would also need 100 LTO3 tapes to provide 30TB of native storage. I 
 recently bought a pack of 20 tapes for £340, so five packs would be £1700.

Or rather 37 LTO4 tapes, and only one 48 tape library. But that doesn't matter, 
the interesting part is that one now can use whatever best solves the problem 
at hand.

 So you could provision a tape backup for just under £3 (~$49000). In 
 comparison, the cost of one X4540 with ~ 36TB usable storage is UK list price 
 £30900. I've not factored in backup software since you could use an open 
 source solution such as Amanda or Bacula.
 
 Which isn't to say tape would be a better solution since it's going to be 
 slower to restore etc. But it does show that tape can work out cheaper, 
 especially since the cost of a high speed WAN link isn't required.

Reading from tape is normally faster than reading from (a single) disk. Seek 
time of course isn't.

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Ragnar Sundblad

On 21 jan 2010, at 00.20, Al Hopper wrote:

 I remember for about 5 years ago (before LT0-4 days) that streaming
 tape drives would go to great lengths to ensure that the drive kept
 streaming - because it took so much time to stop, backup and stream
 again.  And one way the drive firmware accomplished that was to write
 blocks of zeros when there was no data available.  This also occurred
 when the backup source was sending a bunch of small files, which took
 longer to stream and did'nt produce enough data to keep the drive
 writing useful data.  And if you had the tape hardware setup to do
 compression, then, assuming a normal 2:1 compression ratio, you'd need
 to source 240Mb/Sec in order to keep the tape writing 120Mb/Sec.  The
 net result was the consumption of a lot more tape than a
 back-of-the-napkin calculation told you was required.
 
 Obviously at higher compression ratios or with the higher stream data
 write rates you quote below - this problem becomes more troublesome.
 So I agree with your conclusion: The only way to realistically feed
 that is from disk.

Yes! Modern LTO drives can typically vary their speed by about a factor of four
or so, so even if you can't keep up with the tape drive's maximum speed,
it will typically work pretty well anyway. If you can't keep up even then,
it will have to stop, back up a bit, and restart, which will be _very_
slow. Having a disk system deliver data at 240 MB/s at the same time
as you are writing to it can be a bit of a challenge. 

I haven't seen drives that fill out with zeros. It sounds like an ugly
solution, but maybe it could be useful in some strange case.

/ragge s

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 05:55 PM, Cindy Swearingen wrote:

Hi Frank,

We need both files.


The vmcore is 1.4GB. An http upload is never going to complete.
Is there an ftp-able place to send it, or can you download it if I
post it somewhere?

Cheers -- Frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-01-20 Thread Daniel Carosone
On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote:
 Though the ARC case, PSARC/2007/618 is unpublished, I gather from
 googling and the source that L2ARC devices are considered auxiliary,
 in the same category as spares. If so, then it is perfectly reasonable to
 expect that it gets picked up regardless of the GUID. This also implies
 that it is shareable between pools until assigned. Brief testing confirms
 this behaviour.  I learn something new every day :-)
 
 So, I suspect Lutz sees a race when both pools are imported onto one
 node.  This still makes me nervous though...

Yes. What if device reconfiguration renumbers my controllers, will
l2arc suddenly start trashing a data disk?  The same problem used to
be a risk for swap,  but less so now that we swap to named zvol. 

There's work afoot to make l2arc persistent across reboot, which
implies some organised storage structure on the device.  Fixing this
shouldn't wait for that.

--
Dan.

pgp1Mb4Zg7Mxp.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-20 Thread Joerg Schilling
Ragnar Sundblad ra...@csc.kth.se wrote:

 Yes! Modern LTO drives can typically vary their speed about a factor four
 or so, so even if you can't keep up with the tape drive maximum speed,
 it will typically work pretty good anyway. If you can't keep up even then,
 it will have to stop, back up a bit, and restart, which will be _very_
 slow. Having a disk system deliver data at 240 MB/s at the same time
 as you are writing to it can be a bit of a challenge. 

And star implements a FIFO that is written in a way that dramatically reduces 
the sawtooth behavior typically seen with other applications. You just need to 
tell star to use, say, a 2 GB FIFO and star will be able to keep the tape 
streaming for a longer time before it waits until there is enough data for the 
next long streaming period.
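
For example, something like the following (tape device and directory are
hypothetical; check star(1) on your build for the exact option names):

  star -c fs=2g f=/dev/rmt/0n -C /export/data .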

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] can i make a COMSTAR zvol bigger?

2010-01-20 Thread Thomas Burgess



 Yes you can. Size of the vol is a ZFS property.


 yes, i knew this but i wasn't sure how to do the REST. =)


 Set the volsize property to what you want then, then modify the
 logical unit e.g.

 Usage:  stmfadm modify-lu [OPTIONS] LU-name
OPTIONS:
-p, --lu-prop  logical-unit-property=value
-s, --size  size K/M/G/T/P
-f, --file

 Description: Modify properties of a logical unit.
 Valid properties for -p, --lu-prop are:
 alias- alias for logical unit (up to 255 chars)
 mgmt-url - Management URL address
 wcd  - write cache disabled (true, false)
 wp   - write protect (true, false)

 You will probably want to offline your target before making these changes.

 Now of course, this doesn't mean the space is immediately usable on the
 target host. If it's Windows you can use diskpart extend. If it's Linux,
 then you may need another method depending upon the file system.

 -Errol

I don't even care if i need to reformat it on my target host, so long as i
can make it bigger. Thanks for the help.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL

2010-01-20 Thread Daniel Carosone
On Wed, Jan 20, 2010 at 10:04:34AM -0800, Willy wrote:
 To those concerned about this issue, there is a patched version of
 smartmontools that enables the querying and setting of TLER/ERC/CCTL
 values (well, except for recent desktop drives from Western
 Digitial).

[Joining together two recent threads...]

Can you (or anyone else on the list) confirm if this works with the
samsung drives discussed here recently? (HD145UI and the 2Tb version)

I've been a regular purchaser of WD drives for some time, and they
have been good to me.  

However, I found this recent change disturbing and annoying; now that
I realise it is actually against the standards I'm even more annoyed.
It's coming time to purchase another batch of disks, so I have
begun paying closer attention again recently. 

WD may try to force customers to buy the more expensive drives, but
find instead that their customers choose another drive manufacturer
altogether.  Users of RAID (to whom this change matters) are, by
definition, purchasers of larger numbers of drives.

I was also interested in the 4k-sector WD advanced format WD-EARS
drives, but if they have the same limitation, and the Samsung drives
allow ERC, my choice is made.

 It's available here
 http://www.csc.liv.ac.uk/~greg/projects/erc/ 
 
 Unfortunately, smartmontools has limited SATA drive support in
 opensolaris, and you cannot query or set the values.  I'm looking into
 booting into linux, setting the values, and then rebooting into
 opensolaris since the settings will survive a warm reboot (but not a
 powercycle). 

This clearly needs to be fixed and is a project worth someone taking
on! Any volunteers?  

--
Dan.

pgpSqXnDTQWph.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
Can anyone recommend an optimal and redundant striped configuration for an X4500? 
 We'll be using it for an OLTP (Oracle) database and will need the best performance. 
 Is it also true that the reads will be load-balanced across the mirrors?

Is this considered a raid 1+0 configuration?  
zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 
 mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror 
c0t2d0 c1t2d0 
 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror 
c4t3d0 c5t3d0 
 mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror 
c0t5d0 c1t5d0 
 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror 
c4t6d0 c5t6d0 
 mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror 
c6t7d0 c7t7d0 
 mirror c7t0d0 c7t4d0

Is it even possible to do a raid 0+1?
zpool create -f testpool c0t0d0 c4t0d0 c0t1d0 c4t1d0 c6t1d0 c0t2d0 c4t2d0 
c6t2d0 c0t3d0 c4t3d0 c6t3d0 c0t4d0 c4t4d0 c0t5d0 c4t5d0 c6t5d0 c0t6d0 c4t6d0  
c6t6d0 c0t7d0 c4t7d0 c6t7d0 c7t0d0 mirror c1t0d0 c6t0d0 c1t1d0 c5t1d0 c7t1d0 
c1t2d0 c5t2d0 c7t2d0 c1t3d0 c5t3d0 c7t3d0 c1t4d0 c6t4d0 c1t5d0 c5t5d0 c7t5d0 
c1t6d0 c5t6d0 c7t6d0 c1t7d0 c5t7d0 c7t7d0 c7t4d0
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Bob Friesenhahn

On Wed, 20 Jan 2010, Brad wrote:

Can anyone recommend a optimum and redundant striped configuration 
for a X4500?  We'll be using it for a OLTP (Oracle) database and 
will need best performance.  Is it also true that the reads will be 
load-balanced across the mirrors?


Is this considered a raid 1+0 configuration?


Zfs does not strictly support RAID 1+0.  However, your sample command 
will create a pool based on mirror vdevs which is written to in a 
load-shared fashion (not striped).  This type of pool is ideal for 
databases since it consumes the least of those precious IOPS.  With 
SATA drives, you need to preserve those precious IOPS as much as 
possible.


Zfs does not do striping across vdevs, but its load share approach 
will write based on (roughly) a round-robin basis, but will also 
prefer a less loaded vdev when under a heavy write load, or will 
prefer to write to an empty vdev rather than write to an almost full 
one.  Due to zfs behavior, it is best to provision the full number of 
disks to start with so that the disks are evenly filled and the data 
is well distributed.


Reads from mirror pairs use a simple load share algorithm to select 
the mirror side which does not attempt to strictly balance the reads. 
This does provide more performance than one disk, but not twice the 
performance.



Is it even possible to do a raid 0+1?


No.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread John
Have you looked at using Oracle ASM instead of or with ZFS? Recent Sun docs 
concerning the F5100 seem to recommend a hybrid of both.

If you don't go that route, generally you should separate redo logs from actual 
data so they don't compete for I/O, since a lagging redo switch hangs the 
database. If you use archive logs, separate those onto yet another pool.

Realistically, it takes lots of analysis with different configurations. Every 
workload and database is different.

A decent overview of configuring JBOD-type storage for databases is here, 
though it doesn't use ASM...
https://www.sun.com/offers/docs/j4000_oracle_db.pdf
It's a couple years old and that might contribute to the lack of an ASM mention.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
@hortnon - ASM is not within the scope of this project.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
Zfs does not do striping across vdevs, but its load share approach
will write based on (roughly) a round-robin basis, but will also
prefer a less loaded vdev when under a heavy write load, or will
prefer to write to an empty vdev rather than write to an almost full
one.

I'm trying to visualize this...can you elaborate or give a ascii example?

So with the syntax below, load sharing is implemented?

zpool create testpool disk1 disk2 disk3
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2010-01-20 Thread Moshe Vainer
Hi George. 
Any news on this?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Brad
I was reading your old posts about load-shares 
http://opensolaris.org/jive/thread.jspa?messageID=294580#294580 .

So between raidz and load-share striping: raidz stripes a file system block 
evenly across each vdev, but with load sharing the file system block is written 
to a vdev that's not filled up (slab??), and then for the next file system block it 
continues filling up the 1MB slab until it's full before moving on to the next 
one?

Richard can you comment? :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
mirrored pool and then immediately running a scrub on the pool. It works
as expected.


As noted, the disk mustn't go offline until well after the scrub has started.

There's another wrinkle. There are some COMSTAR iscsi targets on this
pool. If there are no initiators accessing any of them, the scrub completes
with no errors after 6 hours. If one specific target is active, the panic
ensues reproducibly at about 5h30m or so.

The precise configuration has 2 disks on one LSI controller as a
mirrored pool (whole disks - no slices). Around 750GB of 1.3TB was
in use when the most recent iscsi target was created. The pool
is read-mostly, so it probably isn't fragmented. The zvol has
copies=1; compression off (no dedupe with snv124). The initiator
is VirtualBox running on Fedora C10 on AMD64 and the target disk
has 32 bit Fedora C12 installed as whole disk, which I believe is EFI.

To reproduce this might require setting up a COMSTAR iscsi
target on a mirrored pool, formatting it with an EFI label, and
then running a scrub. Another, similar, target has OpenSolaris
installed on it, and it doesn't seem to cause a panic on a scrub
if it is running; AFAIK it doesn't use EFI, but I have not run
a scrub with it active since converting to COMSTAR either.

This wouldn't explain why one or the other disk randomly goes
offline and it may be a red herring. But the scrub now runs to
completion just as it always has. Since I can't get FC12 to boot
from the EFI disk in VirtualBox, I may reinstall FC12 without
EFI and see if that makes a difference, but it is an extremely
slow process since it takes almost 6 hours for the panic to occur
each time and there's no practical way to relocate the zvol
to the start of the pool.

HTH -- Frank




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-20 Thread Richard Elling
On Jan 20, 2010, at 8:14 PM, Brad wrote:

 I was reading your old posts about load-shares 
 http://opensolaris.org/jive/thread.jspa?messageID=294580#294580 .
 
 So between raidz and load-share striping, raidz stripes a file system block 
 evenly across each vdev but with load sharing the file system block is 
 written on a vdev that's not filled up (slab??) then for the next file system 
 block it continues filling up the 1MB slab until its full being moving on to 
 the next one?
 
 Richard can you comment? :)

That seems to be a reasonable interpretation.  The nit is that the 1MB 
changeover
is not the slab size.  Slab sizes are usually much larger.

In my list of things to remember for Oracle and ZFS:
1. recordsize is the biggest tuning knob
2. put redo log on a low latency device, SSD if possible
3. avoid raidz, when possible
4. prefer to give memory to the SGA rather than the ARC

Roch provides some good guidelines when you have an SSD and a
ZFS release which offers the logbias property here:
http://blogs.sun.com/roch/entry/synchronous_write_bias_property
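
As a sketch of 1, 2 and the logbias guidance (pool and dataset names are
hypothetical, assuming an 8 KB db_block_size and a separate SSD log device):

  zfs create -o recordsize=8k dbpool/oradata   # match recordsize to db_block_size
  zfs create dbpool/oraredo                    # redo logs on their own dataset
  zfs set logbias=latency dbpool/oraredo       # redo wants the low-latency slog path
  zfs set logbias=throughput dbpool/oradata    # bulk datafile writes can bypass the slog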

 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hang zpool detach while scrub is running..

2010-01-20 Thread Steve Radich, BitShop, Inc.
I know, we should have done zpool scrub -s first.. but.. sigh..

bits...@zfs:/opt/StorMan# zpool status -v tankmir1
  pool: tankmir1
 state: ONLINE
 scrub: scrub in progress for 0h16m, 0.14% done, 187h17m to go
config:

NAMESTATE READ WRITE CKSUM
tankmir1ONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0

errors: No known data errors
bits...@zfs:/opt/StorMan# zpool detach tankmir1 c7t7d0

(hung).

At this point a new SSH session to the server hangs during login, I was 
connected via KVM over IP to the console and don't seem to have any problem 
with that session (although I'm not trying to log off and back in). 

iostat shows all activity on c7t6d0 (as expected), however I/O is extremely slow 
(< 1 megabyte / second and 100% busy). We've been fighting a slow I/O problem 
with Seagate ES2 drives; some needed firmware flashing which we didn't catch 
before they were in a pool, so remove, flash, reinstall, resilver, etc. is a 
long process. 

Anything I can try to dump that's not overly intrusive? The system still seems to be 
working for iSCSI and CIFS, which is its purpose, so a reboot isn't 
planned unless this hangs in more ways.  Hopefully it will respond in a while.

snv129 installed.


Steve Radich - Founder and Principal of Business Information Technology Shop - 
www.bitshop.com 
Developer Resources Site: www.ASPDeveloper.Net  - www.VirtualServerFAQ.com 
  LinkedIn Public Profile: http://www.linkedin.com/in/steveradich
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HTPC

2010-01-20 Thread Günther
hello

I have basically tested the Supermicro mainboard X8DTH-6F together with Nexenta
http://www.supermicro.com/products/motherboard/QPI/5500/X8DTH-6F.cfm

(same SAS-2 LSI 2008 chipset)

Nexenta 2: did not work
Nexenta 3: (snv 124+) installs without problems, but no further testing

see also my reference hardware for nexenta:
http://www.napp-it.org/hardware/index.html


gea
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss