Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Ross
Aha, found it!  It was this thread, also started by Carsten :)
http://www.opensolaris.org/jive/thread.jspa?threadID=78921&tstart=45
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Ross
Guys, this looks to me like the second time we've had something like this 
reported on the forums for an X4500, again with the first vdev having much 
lower load than the other two, despite all of them being created at the same time.

I can't find the thread to check, can anybody else remember it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs free space

2008-12-02 Thread Sanjeev
Hi,

A good rough estimate would be the total of the space
that is displayed under the "USED" column of "zfs list" for those snapshots.

Here is an example :
-- snip --
[EMAIL PROTECTED] zfs list -r tank
NAME                     USED  AVAIL  REFER  MOUNTPOINT
tank                    24.6M  38.9M    19K  /tank
tank/fs1                24.4M  38.9M    18K  /tank/fs1
tank/[EMAIL PROTECTED]  24.4M      -  24.4M  -
-- snip --

In the above case tank/[EMAIL PROTECTED] is using 24.4M. So, if we delete
that snapshot it would free up about 24.4M. Let's delete it and
see what we get:

-- snip --
[EMAIL PROTECTED] zfs destroy tank/[EMAIL PROTECTED]
[EMAIL PROTECTED] zfs list -r tank
NAME       USED  AVAIL  REFER  MOUNTPOINT
tank       220K  63.3M    19K  /tank
tank/fs1    18K  63.3M    18K  /tank/fs1
-- snip --

So, we did get back 24.4M freed (38.9M + 24.4M = 63.3M).

Note that this can get a little complicated if there are multiple
snapshots which refer to the same set of blocks. In that case, even after
deleting one snapshot you might not see the space freed up, because a
second snapshot still refers to some of those blocks.
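
As a rough, hedged illustration of that rule of thumb (the dataset and
snapshot names below are placeholders, shared blocks mean the total is only
an estimate, and this assumes a zfs release that supports 'zfs get -Hp'),
you can sum the exact USED values of the snapshots you plan to delete:

-- snip --
zfs get -Hp -o value used storage/fs@snap1 storage/fs@snap2 \
  | awk '{sum += $1} END {printf "roughly %.1f MB would be freed\n", sum/1048576}'
-- snip --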

Hope that helps.

Thanks and regards,
Sanjeev
 
On Wed, Dec 03, 2008 at 12:26:48AM +, Robert Milkowski wrote:
> Hello none,
> 
> Thursday, November 6, 2008, 7:55:42 PM, you wrote:
> 
> n> Hi Milek,
> n> Thanks for your reply.
> n> What I really need is a way to tell how much space will be freed
> n> for any particular set of snapshots that I delete. 
> 
> n> So I would like to query zfs,
> n> "if I delete these snapshots
> n> storage/[EMAIL PROTECTED] 
> n> storage/[EMAIL PROTECTED]
> n> how much space will be freed?" 
> 
> I'm afraid you can do only one at a time.
> 
> -- 
> Best regards,
>  Robertmailto:[EMAIL PROTECTED]
>http://milek.blogspot.com
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Mike Gerdts
On Tue, Dec 2, 2008 at 6:13 PM, Lori Alt <[EMAIL PROTECTED]> wrote:
> On 12/02/08 10:24, Mike Gerdts wrote:
> I follow you up to here.  But why do the next steps?
>
> > zonecfg -z $zone
> > remove fs dir=/var
> >
> > zfs set mountpoint=/zones/$zone/root/var rpool/zones/$zone/var

It's not strictly required to perform that last set of commands, but the
lofs mount point is not really needed either.  Longer term it will likely
look cleaner (e.g. to live upgrade) not to have this lofs mount.  That
is, I suspect that live upgrade is more likely to look at /var in the
zone and say "ahhh, that is a zfs file system - I know how to deal
with that" than it is to say "ahhh, that is a lofs file system pointing
to some other zfs file system in the global zone - I know how to deal
with that."
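
As a quick, hedged sanity check of that difference (zone and pool names
below are placeholders), you can look at how the zone's /var ends up
mounted under each approach:

   df -n /zones/myzone/root/var               # lofs approach reports "lofs"
   zfs get mountpoint rpool/zones/myzone/var  # direct approach shows the zone-root path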

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool replace - choke point

2008-12-02 Thread Alan Rubin
It's something we've considered here as well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs free space

2008-12-02 Thread Robert Milkowski
Hello none,

Thursday, November 6, 2008, 7:55:42 PM, you wrote:

n> Hi Milek,
n> Thanks for your reply.
n> What I really need is a way to tell how much space will be freed
n> for any particular set of snapshots that I delete. 

n> So I would like to query zfs,
n> "if I delete these snapshots
n> storage/[EMAIL PROTECTED] 
n> storage/[EMAIL PROTECTED]
n> how much space will be freed?" 

I'm afraid you can do only one at a time.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool replace - choke point

2008-12-02 Thread Matt Walburn
Would any of this have to do with the system being a T2000?  Would ZFS
resilvering be affected by single-threadedness, the slowish UltraSPARC T1
clock speed, or a lack of strong FPU performance?

On 12/1/08, Alan Rubin <[EMAIL PROTECTED]> wrote:
> We will be considering it in the new year,  but that will not happen in time
> to affect our current SAN migration.
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


-- 
--
Matt Walburn
http://mattwalburn.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Lori Alt

On 12/02/08 11:04, Brian Wilson wrote:

- Original Message -
From: Lori Alt <[EMAIL PROTECTED]>
Date: Tuesday, December 2, 2008 11:19 am
Subject: Re: [zfs-discuss] Separate /var
To: Gary Mills <[EMAIL PROTECTED]>
Cc: zfs-discuss@opensolaris.org

  

On 12/02/08 09:00, Gary Mills wrote:


On Mon, Dec 01, 2008 at 04:45:16PM -0700, Lori Alt wrote:
  
  

   On 11/27/08 17:18, Gary Mills wrote:
On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:

I'm currently working with an organisation who want to use ZFS for their
full zones.  Storage is SAN attached, and they also want to create a
separate /var for each zone, which causes issues when the zone is
installed.  They believe that a separate /var is still good practice.
If your mount options are different for /var and /, you will need
a separate filesystem.  In our case, we use `setuid=off' and
`devices=off' on /var for security reasons.  We do the same thing
for home directories and /tmp .

For zones?

Sure, if you require different mount options in the zones.

   I looked into this and found that, using ufs, you can indeed set up
   the zone's /var directory as a separate file system.  I don't know
   how LiveUpgrade works with that configuration (I didn't try it).
   But I was at least able to get the zone to install and boot.
   But with zfs, I couldn't even get a zone with a separate /var
   dataset to install, let alone be manageable with LiveUpgrade.
   I configured the zone like so:
   # zonecfg -z z4
   z4: No such zone configured
   Use 'create' to begin configuring a new zone.
   zonecfg:z4> create
   zonecfg:z4> set zonepath=/zfszones/z4
   zonecfg:z4> add fs
   zonecfg:z4:fs> set dir=/var
   zonecfg:z4:fs> set special=rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var
   zonecfg:z4:fs> set type=zfs
   zonecfg:z4:fs> end
   zonecfg:z4> exit
   I then get this result from trying to install the zone:
   prancer# zoneadm -z z4 install
   Preparing to install zone <z4>.
   ERROR: No such file or directory: cannot mount 




I think you're running into the problem of defining /var as a filesystem 
that already exists under the zone root.  We had issues with that, so whenever 
I've been setting up zone filesystems I don't push in zfs datasets; I create a zfs 
filesystem in the global zone and mount that directory into the zone with lofs. 
For example, I've got a pool zdisk with a filesystem down the path -
zdisk/zones/zvars/(zonename)

which mounts itself to -
/zdisk/zones/zvars/(zonename)

It's a ZFS filesystem with quota and reservation setup, and I just do an lofs 
to it via these lines in the /etc/zones/(zonename).xml file -

  

  

I think that's the equivalent of the following zonecfg lines -

zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/var
zonecfg:z4:fs> set special=/zdisk/zones/zvars/z4/var
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> end

I think that to put the zfs dataset into the zone, you need to do an add dataset 
instead of an add fs.  I tried that once, though, and didn't completely like the 
results.  The dataset was controllable inside the zone (which is what I wanted at 
the time), but it wasn't controllable from the global zone anymore.  And I couldn't 
easily access it from the global zone to get the backup software to pick it up.

Doing it this way means you have to manage the zfs datasets from the global 
zone, but that's not really an issue here.
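
For completeness, a hedged sketch of the global-zone side of this approach
(pool/zone names follow the example above, the 8g figure is just a
placeholder, and the zonecfg lofs entry is the one already shown):

   zfs create -p zdisk/zones/zvars/z4
   zfs set quota=8g zdisk/zones/zvars/z4
   zfs set reservation=8g zdisk/zones/zvars/z4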
  

So I tried your suggestion and it appears to work, at
least initially (I have a feeling that it will cause
problems later if I want to clone the BE using LiveUpgrade,
but first things first.)



So, create the separate filesystems you want in the global zone (without 
stacking them under the zoneroot - separate directory somewhere


Why does it have to be in a separate directory?

lori

 setup the zfs stuff you want, then lofs it into the local zone.  I've had that 
install successfully before.

Hope that's helpful in some way!

  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Lori Alt

On 12/02/08 10:24, Mike Gerdts wrote:

On Tue, Dec 2, 2008 at 11:17 AM, Lori Alt <[EMAIL PROTECTED]> wrote:
  

I did pre-create the file system.  Also, I tried omitting "special" and
zonecfg complains.

I think that there might need to be some changes
to zonecfg and the zone installation code to get separate
/var datasets in non-global zones to work.



You could probably do something like:

zfs create rpool/zones/$zone
zfs create rpool/zones/$zone/var

zonecfg -z $zone
add fs
  set dir=/var
  set special=/zones/$zone/var
  set type=lofs
  end
...

zoneadm -z $zone install

  


I follow you up to here.  But why do the next steps?


zonecfg -z $zone
remove fs dir=/var

zfs set mountpoint=/zones/$zone/root/var rpool/zones/$zone/var

  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem importing degraded Pool

2008-12-02 Thread Brian Couper
Hi,

Eeemmm, I think it's safe to say your zpool and its data are gone forever.
Use the Samsung disk checker boot CD, and see if it can fix your faulty disk. 
Then connect all 3 drives to your system and use raidz. Your data will then be 
well protected.

Brian,
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] continuous replication

2008-12-02 Thread Robert Milkowski
Hello Mattias,

Saturday, November 15, 2008, 12:24:05 AM, you wrote:

MP> On Sat, Nov 15, 2008 at 00:46, Richard Elling <[EMAIL PROTECTED]> wrote:
>> Adam Leventhal wrote:
>>>
>>> On Fri, Nov 14, 2008 at 10:48:25PM +0100, Mattias Pantzare wrote:
>>>

 That is _not_ active-active, that is active-passive.

 If you have a active-active system I can access the same data via both
 controllers at the same time. I can't if it works like you just
 described. You can't call it active-active just because different
 volumes are controlled by different controllers. Most active-passive
 RAID controllers can do that.

 The data sheet talks about active-active clusters, how does that work?

>>>
>>> What the Sun Storage 7000 Series does would more accurately be described
>>> as
>>> dual active-passive.
>>>
>>
>> This is ambiguous in the cluster market.  It is common to describe
>> HA clusters where each node can be offering services concurrently,
>> as active/active, even though the services themselves are active/passive.
>> This is to appease folks who feel that idle secondary servers are a bad
>> thing.

MP> But this product is not in the cluster market. It is in the storage market.

MP> By your definition virtually all dual controller RAID boxes are 
active/active.

MP> You should talk to Veritas so that they can change all their 
documentation...

MP> Active/active and active/passive has a real technical meaning, don't
MP> let marketing destroy that!

I thought that when you can access the same LUN via different
controller then you have a symmetric disk array and when you can't you
have an asymmetric one. It has nothing to do with active-active or
active-standby. Most of a disk arrays in the marked are active-active
and asymmetric.



-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Toby Thain

On 2-Dec-08, at 3:35 PM, Miles Nordin wrote:

>> "r" == Ross  <[EMAIL PROTECTED]> writes:
>
>  r> style before I got half way through your post :) [...status
>  r> problems...] could be a case of oversimplifying things.
> ...
> And yes, this is a religious argument.  Just because it spans decades
> of experience and includes ideas of style doesn't mean it should be
> dismissed as hocus-pocus.  And I don't like all these binary config
> files either.  Not even Mac OS X is pulling that baloney any more.

OS X never used binary config files; it standardised on XML property  
lists for the new subsystems (plus a lot of good old fashioned UNIX  
config).

Perhaps you are thinking of Mac OS 9 and earlier (resource forks).

--Toby
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HP Smart Array and b99?

2008-12-02 Thread sim
OK,

In the end I managed to install OpenSolaris snv_101b on the HP blade, on a Smart 
Array drive, directly from the install CD.  Everything is fine.  The problems I 
experienced with hangs on boot on snv_99+ are related to the Qlogic driver, but 
that is a different story.

Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RE : rsync using 100% of a cpu

2008-12-02 Thread zfs user
Francois Dion wrote:
>  >>"Francois Dion" wrote:
>  >> Source is local to rsync, copying from a zfs file system, 
>  >> destination is remote over a dsl connection. Takes forever to just 
>  >> go through the unchanged files. Going the other way is not a 
>  >> problem, it takes a fraction of the time. Anybody seen that? 
>  >> Suggestions?
>  >De: Blake Irvin [mailto:[EMAIL PROTECTED]
>  >Upstream when using DSL is much slower than downstream?
> 
> No, that's not the problem. I know ADSL is asymmetrical. When there is 
> an actual data transfer going on, the cpu drops to 0.2%. It's only when 
> rsync is doing its thing (reading, not writing) locally that it pegs the 
> cpu. We are talking 15 minutes in one direction while in the other it 
> looks like I'll pass the 24 hours mark before the rsync is complete. And 
> there were less than 100MB added on each side.
> 
> BTW, the only other process I've seen that pegs the cpu solid for as 
> long as it runs on my v480 is when I downloaded Belenix through a python 
> script (btdownloadheadless).

Is the list of files long? rsync 3.0.X does not use a monolithic file list 
pull and uses less memory...

Are you using a -c option or other option that causes rsync to checksum every 
block of all the files?

Is the zfs file system compressed, so it has to decompress each block so that 
rsync can checksum it?
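
For reference, a hedged example of the kind of invocation being discussed
(host and paths are placeholders): plain archive mode without -c, so
unchanged files are skipped on size/mtime alone, plus --stats to see how
long the file-list phase takes:

   rsync -av --stats /tank/data/ backupuser@remotehost:/backup/data/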
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-12-02 Thread Vincent Fox
> 
> I don't want to steer you wrong under the
> circumstances,
> so I think we need more information. 
> 
> First, is the failure the same as in the earlier part
> of this
> thread.   I.e., when you boot, do you get a failure
> like this?
> 
> Warning: Fcode sequence resulted in a net stack depth
> change of 1
> Evaluating:
> 
> Evaluating:
> 
> The file just loaded does not appear to be executable

Nope:


Sun Fire T200, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.27.4, 16256 MB memory available, Serial #75621394.
Ethernet address 0:14:4f:81:e4:12, Host ID: 8481e412.



Boot device: /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]  File and args:
ufs-file-system
Loading: /platform/SUNW,Sun-Fire-T200/boot_archive
Loading: /platform/sun4v/boot_archive

Can't open boot_archive

Evaluating:
The file just loaded does not appear to be executable.
===

> 
> Second, at least at first glance, this looks like
> more of
> a generic patch problem than a problem specifically
> related to zfs boot.  Since this is S10, not
> OpenSolaris,
> perhaps you should be escalating this through the
> standard support channels.  This alias probably
> won't get you any really useful answers on general
> problems with patching.

Yeah I just thought since I'd followed this thread before it might
be useful to add to it since there might be crossover issues.
I'll keep pushing on the string.  I hate being the annoying customer
who says "I won't follow your suggestion because (blah) please
escalate this ticket."

I hope to move these systems over to 10u6 in a few months
and streamline our patching so problems like this won't exist.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-12-02 Thread Lori Alt

I don't want to steer you wrong under the circumstances,
so I think we need more information. 

First, is the failure the same as in the earlier part of this
thread.   I.e., when you boot, do you get a failure like this?

Warning: Fcode sequence resulted in a net stack depth change of 1
Evaluating:

Evaluating:

The file just loaded does not appear to be executable


Second, at least at first glance, this looks like more of
a generic patch problem than a problem specifically
related to zfs boot.  Since this is S10, not OpenSolaris,
perhaps you should be escalating this through the
standard support channels.  This alias probably
won't get you any really useful answers on general
problems with patching.

Lori

On 12/02/08 14:42, Vincent Fox wrote:
> The SupportTech responding to case #66153822 so far
> has only suggested "boot from cdrom and patchrm 137137-09"
> which tells me I'm dealing with a level-1 binder monkey.
> It's the idle node of a cluster holding 10K email accounts
> so I'm proceeding cautiously.  It is unfortunate the admin doing
> the original patching did them from multi-user but here we are.
>
> I am attempting to boot net:dhcp -s just to collect more info:
>
> My patchadd output shows 138866-01 & 137137-09 being applied OK:
>
> bash-3.00# patchadd /net/matlock/local/d02/patches/all_patches/138866-01
> Validating patches...
>
> Loading patches installed on the system...
>
> Done!
>
> Loading patches requested to install.
>
> Done!
>
> Checking patches that you specified for installation.
>
> Done!
>
>
> Approved patches will be installed in this order:
>
> 138866-01 
>
>
> Checking installed patches...
> Verifying sufficient filesystem capacity (dry run method)...
> Installing patch packages...
>
> Patch 138866-01 has been successfully installed.
> See /var/sadm/patch/138866-01/log for details
>
> Patch packages installed:
>   SUNWcsr
>
> bash-3.00# patchadd /net/matlock/local/d02/patches/all_patches/137137-09
> Validating patches...
>
> Loading patches installed on the system...
>
> Done!
>
> Loading patches requested to install.
>
> Version of package SUNWcakr from directory SUNWcakr.u in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWcar from directory SUNWcar.u in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWkvm from directory SUNWkvm.c in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWkvm from directory SUNWkvm.d in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWkvm from directory SUNWkvm.m in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWkvm from directory SUNWkvm.u in patch 137137-09 
> differs from the package installed on the system.
> Architecture for package SUNWnxge from directory SUNWnxge.u in patch 
> 137137-09 differs from the package installed on the system.
> Version of package SUNWcakr from directory SUNWcakr.us in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWcar from directory SUNWcar.us in patch 137137-09 
> differs from the package installed on the system.
> Version of package SUNWkvm from directory SUNWkvm.us in patch 137137-09 
> differs from the package installed on the system.
> Done!
>
> The following requested patches have packages not installed on the system
> Package SUNWcpr from directory SUNWcpr.u in patch 137137-09 is not installed 
> on the system. Changes for package SUNWcpr will not be applied to the system.
> Package SUNWefc from directory SUNWefc.u in patch 137137-09 is not installed 
> on the system. Changes for package SUNWefc will not be applied to the system.
> Package SUNWfruip from directory SUNWfruip.u in patch 137137-09 is not 
> installed on the system. Changes for package SUNWfruip will not be applied to 
> the system.
> Package SUNWluxd from directory SUNWluxd.u in patch 137137-09 is not 
> installed on the system. Changes for package SUNWluxd will not be applied to 
> the system.
> Package SUNWs8brandr from directory SUNWs8brandr in patch 137137-09 is not 
> installed on the system. Changes for package SUNWs8brandr will not be applied 
> to the system.
> Package SUNWs8brandu from directory SUNWs8brandu in patch 137137-09 is not 
> installed on the system. Changes for package SUNWs8brandu will not be applied 
> to the system.
> Package SUNWs9brandr from directory SUNWs9brandr in patch 137137-09 is not 
> installed on the system. Changes for package SUNWs9brandr will not be applied 
> to the system.
> Package SUNWs9brandu from directory SUNWs9brandu in patch 137137-09 is not 
> installed on the system. Changes for package SUNWs9brandu will not be applied 
> to the system.
> Package SUNWus from directory SUNWus.u in patch 137137-09 is not installed on 
> the system. Changes for package SUNWus will not be applied to the system.
> Package SUNWefc from directory SUNW

Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-12-02 Thread Vincent Fox
The SupportTech responding to case #66153822 so far
has only suggested "boot from cdrom and patchrm 137137-09"
which tells me I'm dealing with a level-1 binder monkey.
It's the idle node of a cluster holding 10K email accounts
so I'm proceeding cautiously.  It is unfortunate the admin doing
the original patching did them from multi-user but here we are.

I am attempting to boot net:dhcp -s just to collect more info:

My patchadd output shows 138866-01 & 137137-09 being applied OK:

bash-3.00# patchadd /net/matlock/local/d02/patches/all_patches/138866-01
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Done!

Checking patches that you specified for installation.

Done!


Approved patches will be installed in this order:

138866-01 


Checking installed patches...
Verifying sufficient filesystem capacity (dry run method)...
Installing patch packages...

Patch 138866-01 has been successfully installed.
See /var/sadm/patch/138866-01/log for details

Patch packages installed:
  SUNWcsr

bash-3.00# patchadd /net/matlock/local/d02/patches/all_patches/137137-09
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Version of package SUNWcakr from directory SUNWcakr.u in patch 137137-09 
differs from the package installed on the system.
Version of package SUNWcar from directory SUNWcar.u in patch 137137-09 differs 
from the package installed on the system.
Version of package SUNWkvm from directory SUNWkvm.c in patch 137137-09 differs 
from the package installed on the system.
Version of package SUNWkvm from directory SUNWkvm.d in patch 137137-09 differs 
from the package installed on the system.
Version of package SUNWkvm from directory SUNWkvm.m in patch 137137-09 differs 
from the package installed on the system.
Version of package SUNWkvm from directory SUNWkvm.u in patch 137137-09 differs 
from the package installed on the system.
Architecture for package SUNWnxge from directory SUNWnxge.u in patch 137137-09 
differs from the package installed on the system.
Version of package SUNWcakr from directory SUNWcakr.us in patch 137137-09 
differs from the package installed on the system.
Version of package SUNWcar from directory SUNWcar.us in patch 137137-09 differs 
from the package installed on the system.
Version of package SUNWkvm from directory SUNWkvm.us in patch 137137-09 differs 
from the package installed on the system.
Done!

The following requested patches have packages not installed on the system
Package SUNWcpr from directory SUNWcpr.u in patch 137137-09 is not installed on 
the system. Changes for package SUNWcpr will not be applied to the system.
Package SUNWefc from directory SUNWefc.u in patch 137137-09 is not installed on 
the system. Changes for package SUNWefc will not be applied to the system.
Package SUNWfruip from directory SUNWfruip.u in patch 137137-09 is not 
installed on the system. Changes for package SUNWfruip will not be applied to 
the system.
Package SUNWluxd from directory SUNWluxd.u in patch 137137-09 is not installed 
on the system. Changes for package SUNWluxd will not be applied to the system.
Package SUNWs8brandr from directory SUNWs8brandr in patch 137137-09 is not 
installed on the system. Changes for package SUNWs8brandr will not be applied 
to the system.
Package SUNWs8brandu from directory SUNWs8brandu in patch 137137-09 is not 
installed on the system. Changes for package SUNWs8brandu will not be applied 
to the system.
Package SUNWs9brandr from directory SUNWs9brandr in patch 137137-09 is not 
installed on the system. Changes for package SUNWs9brandr will not be applied 
to the system.
Package SUNWs9brandu from directory SUNWs9brandu in patch 137137-09 is not 
installed on the system. Changes for package SUNWs9brandu will not be applied 
to the system.
Package SUNWus from directory SUNWus.u in patch 137137-09 is not installed on 
the system. Changes for package SUNWus will not be applied to the system.
Package SUNWefc from directory SUNWefc.us in patch 137137-09 is not installed 
on the system. Changes for package SUNWefc will not be applied to the system.
Package SUNWluxd from directory SUNWluxd.us in patch 137137-09 is not installed 
on the system. Changes for package SUNWluxd will not be applied to the system.
Package FJSVvplr from directory FJSVvplr.u in patch 137137-09 is not installed 
on the system. Changes for package FJSVvplr will not be applied to the system.
Package FJSVvplr from directory FJSVvplr.us in patch 137137-09 is not installed 
on the system. Changes for package FJSVvplr will not be applied to the system.

Checking patches that you specified for installation.

Done!


Approved patches will be installed in this order:

137137-09 


Checking installed patches...
Executing prepatch script...
Verifying sufficient filesystem capacity (dry run method)...
Dec  2 10:05:58 cyrus2-2 cfenvd[706]:  LDT(3) in loadavg chi = 19.18 thresh 
11.58 
D

Re: [zfs-discuss] A failed disk can bring down a machine?

2008-12-02 Thread Brian Hechinger
On Tue, Dec 02, 2008 at 12:50:08PM -0600, Tim wrote:
> On Tue, Dec 2, 2008 at 11:42 AM, Brian Hechinger <[EMAIL PROTECTED]> wrote:
> 
> I believe the issue you're running into is the failmode you currently have
> set.  Take a look at this:
> http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/

Ah ha!  It's now set to continue.  Hopefully that'll save me next time this
happens.  Which I hope isn't too soon. ;)

Sadly this has rid me of my urgent need to replace that box, which I suppose 
isn't a bad
thing as I can now take my time.

Anyone have any opinions of that ASUS box running the latest OpenSolaris?

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-12-02 Thread Enda O'Connor
Vincent Fox wrote:
> Reviving this thread.
> 
> We have a Solaris 10u4 system recently patched with 137137-09.
> Unfortunately the patch was applied from multi-user mode, I wonder if this
> may have been original posters problem as well?  Anyhow we are now stuck
> with an unbootable system as well.
> 
> I have submitted a case to Sun about it, will add details as that proceeds.

Hi

There are basically two possible issues that we are aware of:

6772822, where the root fs has insufficient space to hold the failsafe 
archive (about 181M) and the boot archive (about 80M), plus a rebuild of 
the latter when rebooting, which can lead to several different outcomes.
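
As a hedged pre-flight check for the 6772822 case (the sun4v boot_archive
path matches the boot output earlier in this thread; adjust for your
platform), confirm the root fs has room for both archives before applying
the kernel patch:

   df -h /
   du -h /platform/sun4v/boot_archive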

If you see "seek failed", it indicates that the new bootblk installed ok, 
but the boot archive couldn't be rebuilt on reboot.

There are also issues where, if running SVM on MPxIO, the bootblk won't 
get installed: 6772083 or 6775167.

Let us know the exact error seen and, if possible, the exact output from 
patchadd 137137-09.
Enda

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Miles Nordin
> "r" == Ross  <[EMAIL PROTECTED]> writes:

 r> style before I got half way through your post :) [...status
 r> problems...] could be a case of oversimplifying things.

yeah I was a bit inappropriate, but my frustration comes from the
(partly paranoid) imagining of how the idea ``we need to make it
simple'' might have spooled out through a series of design meetings to
a culturally insidious, mind-blowing condescension toward the sysadmin.

``simple'', to me, means that a 'status' tool does not read things off
disks, and does not gather a bunch of scraps to fabricate a pretty
(``simple''?) fantasy-world at invocation which is torn down again
when it exits.  The Linux status tools are pretty-printing wrappers
around 'cat /proc/$THING/status'.  That, is SIMPLE!  And, screaming
monkeys though they often are, the college kids writing Linux are
generally disciplined enough not to grab a bunch of locks and then go
to sleep for minutes when delivering things from /proc.  I love that.
The other, broken, idea of ``simple'' is what I come to Unix to avoid.

And yes, this is a religious argument.  Just because it spans decades
of experience and includes ideas of style doesn't mean it should be
dismissed as hocus-pocus.  And I don't like all these binary config
files either.  Not even Mac OS X is pulling that baloney any more.

 r> There's no denying the ease of admin is one of ZFS' strengths,

I deny it!  It is not simple to start up 'format' and 'zpool iostat'
and RoboCopy on another machine because you cannot trust the output of
the status command.  And getting visibility into something by starting
a bunch of commands in different windows and watching when which one
unfreezes is hilarious, not simple.

 r> the problems you've reported with resilvering.

I think we were watching this bug:

 http://bugs.opensolaris.org/view_bug.do?bug_id=6675685

so that ought to be fixed in your test system but not in s10u6.  but
it might not be completely fixed yet:

 http://bugs.opensolaris.org/view_bug.do?bug_id=6747698



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Bob Friesenhahn
On Tue, 2 Dec 2008, Carsten Aulbert wrote:
>
> Hmm, since I only started with Solaris this year, is there a way to
> identify a "slow" disk? In principle these should all be identical
> Hitachi Deathstar^WDeskstar drives and should only have the standard
> deviation during production.

Look at the output of 'iostat -xn 30' when the system is under load. 
Possibly ignore the initial output entry since that is an aggregate 
since the dawn of time.

You will need to know which disks are in each vdev.  Check to see if 
the asvc_t value for one of the disks is much more than the others in 
the same vdev. If a disk is acting as the bottleneck then it is likely 
that its asvc_t value is far greater than the others.
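
As a hedged convenience for spotting outliers (the 50 ms threshold is
arbitrary, and the field positions assume the standard 'iostat -xn' column
order with asvc_t in column 8 and the device name last), something like
this can flag suspect drives:

   iostat -xn 30 | awk '$NF ~ /^c[0-9]/ && $8 > 50 {print $NF, "asvc_t =", $8}'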

To get zfs's view of the I/O, use

   zpool iostat -v poolname 30

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-12-02 Thread Vincent Fox
Reviving this thread.

We have a Solaris 10u4 system recently patched with 137137-09.
Unfortunately the patch was applied from multi-user mode, I wonder if this
may have been original posters problem as well?  Anyhow we are now stuck
with an unbootable system as well.

I have submitted a case to Sun about it, will add details as that proceeds.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Bob Friesenhahn
On Tue, 2 Dec 2008, Carsten Aulbert wrote:
>
> No, I think a single disk would perform much worse; however, I'm a
> bit disappointed by the overall performance of the boxes, and right now we
> have users who are experiencing extremely slow performance.

If all of the disks in the vdev need to be written at once prior to 
the next write, then the write latency will surely be more than just 
one disk.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Bob Friesenhahn wrote:
> You may have one or more "slow" disk drives which slow down the whole
> vdev due to long wait times.  If you can identify those slow disk drives
> and replace them, then overall performance is likely to improve.
> 
> The problem is that under severe load, the vdev with the highest backlog
> will be used the least.  One or more slow disks in the vdev will slow
> down the whole vdev.  It takes only one slow disk to slow down the whole
> vdev.

Hmm, since I only started with Solaris this year, is there a way to
identify a "slow" disk? In principle these should all be identical
Hitachi Deathstar^WDeskstar drives and should only have the standard
deviation during production.
> 
> ZFS commits the writes to all involved disks in a raidz2 before
> proceeding with the next write.  With so many disks, you are asking for
> quite a lot of fortuitous luck in that everything must be working
> optimally.  Compounding the problem is that I understand that when the
> stripe width exceeds the number of segmented blocks from the data to be
> written (ZFS is only willing to dice to a certain minimum size), then
> only a subset of the disks will be used, wasting potential I/O
> bandwidth.  Your stripes are too wide.
> 

Ah, ok, that's one of the first reasonable explanations (which I
understand) of why large zpools might be bad. So far I was not able to
track that down and only found the standard "magic" rule not to exceed
10 drives - but our (synthetic) tests had not shown any significant
drawbacks. But I guess we might be bitten by it now.

>> (c) Would the use of several smaller vdev would help much? And which
>> layout would be a good compromise for getting space as well as
>> performance and reliability? 46 disks have so few prime factors
> 
> Yes, more vdevs should definitely help quite a lot for dealing with
> real-world muti-user loads.  One raidz/raidz2 vdev provides (at most)
> the IOPs of a single disk.
> 
> There is a point of diminishing returns and your layout has gone far
> beyond this limit.

Thanks for the insight, I guess I need to experiment with empty boxes to
get into a better state!

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Hi Miles,

Miles Nordin wrote:
>> "ca" == Carsten Aulbert <[EMAIL PROTECTED]> writes:
> 
> ca> (a) Why the first vdev does not get an equal share
> ca> of the load
> 
> I don't know.  but, if you don't add all the vdev's before writing
> anything, there's no magic to make them balance themselves out.  Stuff
> stays where it's written.  I'm guessing you did add them at the same
> time, and they still filled up unevenly?
> 

Yes, they are created all in one go (even on the same command line) and
only then are filled - either "naturally" over time or via zfs
send/receive (all on Sol10u5). So yes, it seems they fill up unevenly.

> 'zpool iostat' that you showed is the place I found to see how data is
> spread among vdev's.
> 
> ca>  (b) Why is a large raidz2 so bad? When I use a
> ca> standard Linux box with hardware raid6 over 16 disks I usually
> ca> get more bandwidth and at least about the same small file
> ca> performance
> 
> obviously there are all kinds of things going on but...the standard
> answer is, traditional RAID5/6 doesn't have to do full stripe I/O.
> ZFS is more like FreeBSD's RAID3: it gets around the NVRAMless-RAID5
> write hole by always writing a full stripe, which means all spindles
> seek together and you get the seek performance of 1 drive (per vdev).
> Linux RAID5/6 just gives up and accepts a write hole, AIUI, but
> because the stripes are much fatter than a filesystem block, you'll
> sometimes get the record you need by seeking a subset of the drives
> rather than all of them, which means the drives you didn't seek have
> the chance to fetch another record.
> 
> If you're saying you get worse performance than a single spindle, I'm
> not sure why.
>

No, I think a single disk would perform much worse; however, I'm a
bit disappointed by the overall performance of the boxes, and right now we
have users who are experiencing extremely slow performance.

But thanks already for the insight.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Miles Nordin
> "ca" == Carsten Aulbert <[EMAIL PROTECTED]> writes:

ca> (a) Why the first vdev does not get an equal share
ca> of the load

I don't know.  but, if you don't add all the vdev's before writing
anything, there's no magic to make them balance themselves out.  Stuff
stays where it's written.  I'm guessing you did add them at the same
time, and they still filled up unevenly?

'zpool iostat' that you showed is the place I found to see how data is
spread among vdev's.

ca>  (b) Why is a large raidz2 so bad? When I use a
ca> standard Linux box with hardware raid6 over 16 disks I usually
ca> get more bandwidth and at least about the same small file
ca> performance

obviously there are all kinds of things going on but...the standard
answer is, traditional RAID5/6 doesn't have to do full stripe I/O.
ZFS is more like FreeBSD's RAID3: it gets around the NVRAMless-RAID5
write hole by always writing a full stripe, which means all spindles
seek together and you get the seek performance of 1 drive (per vdev).
Linux RAID5/6 just gives up and accepts a write hole, AIUI, but
because the stripes are much fatter than a filesystem block, you'll
sometimes get the record you need by seeking a subset of the drives
rather than all of them, which means the drives you didn't seek have
the chance to fetch another record.

If you're saying you get worse performance than a single spindle, I'm
not sure why.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Bob Friesenhahn
On Tue, 2 Dec 2008, Carsten Aulbert wrote:
>
> Questions:
> (a) Why the first vdev does not get an equal share of the load

You may have one or more "slow" disk drives which slow down the whole 
vdev due to long wait times.  If you can identify those slow disk 
drives and replace them, then overall performance is likely to 
improve.

The problem is that under severe load, the vdev with the highest 
backlog will be used the least.  One or more slow disks in the vdev 
will slow down the whole vdev.  It takes only one slow disk to slow 
down the whole vdev.

> (b) Why is a large raidz2 so bad? When I use a standard Linux box with
> hardware raid6 over 16 disks I usually get more bandwidth and at least
> about the same small file performance

ZFS commits the writes to all involved disks in a raidz2 before 
proceeding with the next write.  With so many disks, you are asking 
for quite a lot of fortuitous luck in that everything must be working 
optimally.  Compounding the problem is that I understand that when the 
stripe width exceeds the number of segmented blocks from the data to 
be written (ZFS is only willing to dice to a certain minimum size), 
then only a subset of the disks will be used, wasting potential I/O 
bandwidth.  Your stripes are too wide.

> (c) Would the use of several smaller vdev would help much? And which
> layout would be a good compromise for getting space as well as
> performance and reliability? 46 disks have so few prime factors

Yes, more vdevs should definitely help quite a lot for dealing with 
real-world muti-user loads.  One raidz/raidz2 vdev provides (at most) 
the IOPs of a single disk.

There is a point of diminishing returns and your layout has gone far 
beyond this limit.
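
Purely as a hedged illustration of what narrower vdevs look like (device
names are placeholders, and on a 46-disk X4500 you might end up with
something like six 7-disk raidz2 vdevs plus spares):

   zpool create tank \
     raidz2 c0t0d0 c1t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 c0t1d0 \
     raidz2 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 c0t2d0 c1t2d0 \
     spare c4t2d0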

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Ross
Hi Miles,

It's probably a bad sign that although that post came through as anonymous in 
my e-mail, I recognised your style before I got half way through your post :)

I agree, the zpool status being out of date is weird, I'll dig out the bug 
number for that at some point as I'm sure I've mentioned it before.  It looks 
to me like there are two separate pieces of code that work out the status of 
the pool.  There's the stuff ZFS uses internally to run the pool, and then 
there's a completely separate piece that does the reporting to the end user.

I agree that it could be a case of oversimplifying things.  There's no denying 
the ease of admin is one of ZFS' strengths, but I think the whole zpool status 
thing needs looking at again.  Neither the way the command freezes nor the 
out-of-date information makes any sense to me.

And yes, I'm aware of the problems you've reported with resilvering.  That's on 
my list of things to test with this.  I've already done a quick test of running 
a scrub after the resilver (which appeared ok at first glance), and tomorrow 
I'll be testing the reboot status too.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] differences.. why?

2008-12-02 Thread Lori Alt

On 12/02/08 11:29, dick hoogendijk wrote:

Lori Alt wrote:
  

On 12/02/08 03:21, jan damborsky wrote:


Hi Dick,

I am redirecting your question to zfs-discuss
mailing list, where people are more knowledgeable
about this problem and your question could be
better answered.

Best regards,
Jan


dick hoogendijk wrote:

  

I have s10u6 installed on my server.
zfs list (partly):
NAME                USED  AVAIL  REFER  MOUNTPOINT
rpool              88.8G   140G  27.5K  /rpool
rpool/ROOT         20.0G   140G    18K  /rpool/ROOT
rpool/ROOT/s10BE2  20.0G   140G  7.78G  /

But just now, on a newly installed s10u6 system I got rpool/ROOT with a
mountpoint "legacy"




The mount point for rpool/ROOT is supposed
to be "legacy" because that dataset should never be mounted.
It's just a "container" dataset to group all the BEs.



The drives were different. On the latter (legacy) system it was not
formatted (yet) (in VirtualBox). On my server I switched from UFS to
ZFS, so I first created a rpool and than did a luupgrade into it.
This could explain the mountpoint /rpool/ROOT but WHY the difference?
Why can't s10u6 install the same mountpoint on the new disk?
The server runs very well; is this "legacy" thing really needed?




When you created the rpool, did you also explicitly create the rpool/ROOT
datasets?   If you did create it and didn't set the mount point to
"legacy",
that explains why you ended up with your original configuration.  If
you didn't create the rpool/ROOT dataset yourself, and instead let
LiveUpgrade
create it automatically, and LiveUpgrade set the mountpoint to
/rpool/ROOT, then
that's a bug in LiveUpgrade (though a minor one, I think).



NO, I'm quite positive all I did was "zfs create rpool" and after that I
did a "lucreate -n zfsBE -p rpool" followed by "luupgrade -u -n zfsBE -s
/iso"

So, it must have been LU that "forgot" to set the mountpoint to legacy.
  

yes, we verified that and filed a bug against LU.

What is the correct syntax to correct this situation?

  

I'm not sure you really  need to, but you should be able
to do this:

zfs unmount rpool/ROOT
zfs set mountpoint=legacy rpool/ROOT
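
And, as a hedged follow-up check (property names only; no changes made):

   zfs get -o name,value mountpoint,mounted rpool/ROOT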

Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Hi all,

We are running pretty large vdevs since the initial testing showed that
our setup was not too much off the optimum. However, under real world
load we do see quite some weird behaviour:

The system itself is a X4500 with 500 GB drives and right now the system
seems to be under heavy load, e.g. ls takes minutes to return on only a
few hundred entries, top shows 10% kernel, rest idle.

zpool iostat -v atlashome 60 shows (not the first output):

                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
atlashome    2.11T  18.8T  2.29K     36  71.7M   138K
  raidz2      466G  6.36T    493     11  14.9M  34.1K
    c0t0d0       -      -     48      5  1.81M  3.52K
    c1t0d0       -      -     48      5  1.81M  3.46K
    c4t0d0       -      -     48      5  1.81M  3.27K
    c6t0d0       -      -     48      5  1.81M  3.40K
    c7t0d0       -      -     47      5  1.81M  3.40K
    c0t1d0       -      -     47      5  1.81M  3.20K
    c1t1d0       -      -     47      6  1.81M  3.59K
    c4t1d0       -      -     47      6  1.81M  3.53K
    c5t1d0       -      -     47      5  1.81M  3.33K
    c6t1d0       -      -     48      6  1.81M  3.67K
    c7t1d0       -      -     48      6  1.81M  3.66K
    c0t2d0       -      -     48      5  1.82M  3.42K
    c1t2d0       -      -     48      6  1.81M  3.56K
    c4t2d0       -      -     48      6  1.81M  3.54K
    c5t2d0       -      -     48      5  1.81M  3.41K
  raidz2      732G  6.10T    800     12  24.6M  52.3K
    c6t2d0       -      -    139      5  7.52M  4.54K
    c7t2d0       -      -    139      5  7.52M  4.81K
    c0t3d0       -      -    140      5  7.52M  4.98K
    c1t3d0       -      -    139      5  7.51M  4.47K
    c4t3d0       -      -    139      5  7.51M  4.82K
    c5t3d0       -      -    139      5  7.51M  4.99K
    c6t3d0       -      -    139      5  7.52M  4.44K
    c7t3d0       -      -    139      5  7.52M  4.78K
    c0t4d0       -      -    139      5  7.52M  4.97K
    c1t4d0       -      -    139      5  7.51M  4.60K
    c4t4d0       -      -    139      5  7.51M  4.86K
    c6t4d0       -      -    139      5  7.51M  4.99K
    c7t4d0       -      -    139      5  7.51M  4.52K
    c0t5d0       -      -    139      5  7.51M  4.78K
    c1t5d0       -      -    138      5  7.51M  4.94K
  raidz2      960G  6.31T  1.02K     12  32.2M  52.0K
    c4t5d0       -      -    178      5  9.29M  4.79K
    c5t5d0       -      -    178      5  9.28M  4.64K
    c6t5d0       -      -    179      5  9.29M  4.44K
    c7t5d0       -      -    178      4  9.26M  4.26K
    c0t6d0       -      -    178      5  9.28M  4.78K
    c1t6d0       -      -    178      5  9.20M  4.58K
    c4t6d0       -      -    178      5  9.26M  4.25K
    c5t6d0       -      -    177      4  9.21M  4.18K
    c6t6d0       -      -    178      5  9.29M  4.69K
    c7t6d0       -      -    177      5  9.26M  4.61K
    c0t7d0       -      -    177      5  9.29M  4.34K
    c1t7d0       -      -    177      5  9.24M  4.28K
    c4t7d0       -      -    177      5  9.29M  4.78K
    c5t7d0       -      -    177      5  9.27M  4.75K
    c6t7d0       -      -    177      5  9.29M  4.34K
    c7t7d0       -      -    177      5  9.27M  4.28K
-----------  -----  -----  -----  -----  -----  -----

Questions:
(a) Why the first vdev does not get an equal share of the load
(b) Why is a large raidz2 so bad? When I use a standard Linux box with
hardware raid6 over 16 disks I usually get more bandwidth and at least
about the same small file performance
(c) Would the use of several smaller vdev would help much? And which
layout would be a good compromise for getting space as well as
performance and reliability? 46 disks have so few prime factors

Thanks a lot

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A failed disk can bring down a machine?

2008-12-02 Thread Tim
On Tue, Dec 2, 2008 at 11:42 AM, Brian Hechinger <[EMAIL PROTECTED]> wrote:

> I was not in front of the machine, I had remote hands working with me, so I
> apologize in advance for any lack of detail I'm about to give.
>
> The server in question is running snv_81 booting ZFS Root using Tim's
> scripts to
> "convert" it over to ZFS Root.
>
> My server in colo stopped responding.  I had a screen session open and I
> could
> switch between screen windows and create new windows but I could not run
> any
> commands.  I also could not log into the box.
>
> The hands on person saw this on the console (transcribed from a video
> console):
>
> SYNCHRONIZE CACHE command failed (5)
> scsi: WARNING: /[EMAIL PROTECTED],0/pci1095,[EMAIL PROTECTED]/[EMAIL 
> PROTECTED],0 (sd1)
>
> sd1 is one of two SATA disks connected to the machine via a SiL3124
> controller.
>
> I had the remote hands pull sd1 and reboot the machine.  It came right up
> and has
> been running fine since. Lacking its mirrored disks, however.
>
> Due to other issues I've had with this box (If you think you can get away
> with running
> ZFS on a 32-bit machine, you are mistaken) I'm looking to replace it
> anyway.  What
> concerns me is that a single disk having gone bad like that can take out
> the whole
> machine.  This is not what I would consider an ideal or acceptable setup
> for a machine
> that is in colo that doesn't have 24x7 onsite support.
>
> What was to blame for this disk failure causing my machine to become
> unresponsive?  Was
> it the SiL3124?  Is it something else?  Is this what I should expect from
> SATA?
>
> I ask all these questions as I want to make sure that if this is indeed
> connected to the
> use of a SATA controller, or the use of a specific SATA controller that I
> certainly avoid
> that with this next machine.
>
> I've got a very slim budget on this, and based on that I found what looks
> like a pretty
> nice little server that is in my budget.  It's an ASUS RS161-E2/PA2 which
> is based on the
> nForce Professional 2200, which from what I can tell is what the Ultra 40
> is based on, so
> I would expect it to pretty much just work.
>
> Will the nv_sata driver behave in a more sane fashion in a case like what
> I've just gone
> through?  If this is a shortcoming of SATA, does anyone have any
> recommendations on a not
> too expensive setup based on a SAS controller?
>
> As much as I would like this thing to do a great job in the performance
> arena, stability is
> definitely higher on the list of what's really important to me.
>
> Thanks,
>
> -brian
>



I believe the issue you're running into is the failmode you currently have
set.  Take a look at this:
http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
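
For reference, a hedged sketch of the setting that post describes (the pool
name is a placeholder); "continue" returns EIO to new writes instead of
hanging the pool, while the default is "wait":

   zpool get failmode tank
   zpool set failmode=continue tank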


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A failed disk can bring down a machine?

2008-12-02 Thread Brian Hechinger
I was not in front of the machine, I had remote hands working with me, so I
apologize in advance for any lack of detail I'm about to give.

The server in question is running snv_81 booting ZFS Root using Tim's scripts to
"convert" it over to ZFS Root.

My server in colo stopped responding.  I had a screen session open and I could
switch between screen windows and create new windows but I could not run any
commands.  I also could not log into the box.

The hands on person saw this on the console (transcribed from a video console):

SYNCHRONIZE CACHE command failed (5)
scsi: WARNING: /[EMAIL PROTECTED],0/pci1095,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd1)

sd1 is one of two SATA disks connected to the machine via a SiL3124 controller.

I had the remote hands pull sd1 and reboot the machine.  It came right up and 
has
been running fine since. Lacking its mirrored disks, however.

Due to other issues I've had with this box (If you think you can get away with 
running
ZFS on a 32-bit machine, you are mistaken) I'm looking to replace it anyway.  
What
concerns me is that a single disk having gone bad like that can take out the 
whole
machine.  This is not what I would consider an ideal or acceptable setup for a 
machine
that is in colo that doesn't have 24x7 onsite support.

What was to blame for this disk failure causing my machine to become 
unresponsive?  Was
it the SiL3124?  Is it something else?  Is this what I should expect from SATA?

I ask all these questions as I want to make sure that if this is indeed 
connected to the
use of a SATA controller, or the use of a specific SATA controller that I 
certainly avoid
that with this next machine.

I've got a very slim budget on this, and based on that I found what looks like 
a pretty
nice little server that is in my budget.  It's an ASUS RS161-E2/PA2 which is 
based on the
nForce Professional 2200, which from what I can tell is what the Ultra 40 is 
based on, so
I would expect it to pretty much just work.

Will the nv_sata driver behave in a more sane fashion in a case like what I've 
just gone
through?  If this is a shortcoming of SATA, does anyone have any 
recommendations on a not
too expensive setup based on a SAS controller?

As much as I would like this thing to do a great job in the performance arena, 
stability is
definitely higher on the list of what's really important to me.

Thanks,

-brian
-- 
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Miles Nordin
> "rs" == Ross Smith <[EMAIL PROTECTED]> writes:

rs> 4. zpool status still reports out of date information.

I know people are going to skim this message and not hear this.
They'll say ``well of course zpool status says ONLINE while the pool
is hung.  ZFS is patiently waiting.  It doesn't know anything is
broken yet.''  but you are NOT saying it's out of date because it
doesn't say OFFLINE the instant you power down an iSCSI target.
You're saying:

rs> - After 3 minutes, the iSCSI drive goes offline.
rs> The pool carries on with the remaining two drives, CIFS
rs> carries on working, iostat carries on working.  "zpool status"
rs> however is still out of date.

rs> - zpool status eventually
rs> catches up, and reports that the drive has gone offline.

so, there is a ~30sec window when it's out of date.  When you say
``goes offline'' in the first bullet, you're saying ``ZFS must have
marked it offline internally, because the pool unfroze.''  but you
found that even after it ``goes offline'' 'zpool status' still 
reports it ONLINE.

The question is, what the hell is 'zpool status' reporting?  not the
status, apparently.  It's supposed to be a diagnosis tool.  Why should
you have to second-guess it and infer the position of ZFS's various
internal state machines through careful indirect observation, ``oops,
CIFS just came back,'' or ``oh something must have changed because
zpool iostat isn't hanging any more''?  Why not have a tool that TELLS
you plainly what's going on?  'zpool status' isn't.

Is it trying to oversimplify things, to condescend to the sysadmin or
hide ZFS's rough edges?  Are there more states for devices that are
being compressed down to ONLINE OFFLINE DEGRADED FAULTED?  Is there
some tool in zdb or mdb that is like 'zpool status -simonsez'?  I
already know sometimes it'll report everything as ONLINE but refuse
'zpool offline ... ' with 'no valid replicas', so I think, yes
there are ``secret states'' for devices?  Or is it trying to do too
many things with one output format?

rs> 5. When iSCSI targets finally do come back online, ZFS is
rs> resilvering all of them (again, this rings a bell, Miles might
rs> have reported something similar).

my zpool status is so old it doesn't say ``xxkB resilvered'' so I've
no indication which devices are the source vs. target of the resilver.
What I found was, the auto-resilver isn't sufficient.  If you wait for
it to complete, then 'zpool scrub', you'll get thousands of CKSUM
errors on the dirty device, so the resilver isn't covering all the
dirtyness.  Also ZFS seems to forget about the need to resilver if you
shut down the machine, bring back the missing target, and boot---it
marks everything ONLINE and then resilvers as you hit the dirty data,
counting CKSUM errors.  This has likely been fixed between b71 and
b101.  It's easy to test: (a) shut down one iSCSI target, (b) write to
the pool, (c) bring the iSCSI target back, (d) wait for auto-resilver
to finish, (e) 'zpool scrub', (f) look for CKSUM errors.  I suspect
you're more worried about your own problems though---I'll try to
retest it soon.
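
For reference, steps (b) through (f) in command form would be roughly this
(pool name and file size invented):

# with one iSCSI target still powered off, dirty the pool:
dd if=/dev/urandom of=/tank/dirtyfile bs=128k count=8192
# bring the target back, wait for the automatic resilver to finish, then:
zpool status tank        # confirm the resilver completed
zpool scrub tank
zpool status -v tank     # CKSUM errors here mean the resilver missed dirty data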


pgpcvDMGKA1VP.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] differences.. why?

2008-12-02 Thread dick hoogendijk

Lori Alt wrote:
> On 12/02/08 03:21, jan damborsky wrote:
>> Hi Dick,
>>
>> I am redirecting your question to zfs-discuss
>> mailing list, where people are more knowledgeable
>> about this problem and your question could be
>> better answered.
>>
>> Best regards,
>> Jan
>>
>>
>> dick hoogendijk wrote:
>>
>>> I have s10u6 installed on my server.
>>> zfs list (partly):
>>> NAME                USED  AVAIL  REFER  MOUNTPOINT
>>> rpool              88.8G   140G  27.5K  /rpool
>>> rpool/ROOT         20.0G   140G    18K  /rpool/ROOT
>>> rpool/ROOT/s10BE2  20.0G   140G  7.78G  /
>>>
>>> But just now, on a newly installed s10u6 system I got rpool/ROOT with a
>>> mountpoint "legacy"
>>>
>>>
> The mount point for <pool>/ROOT is supposed
> to be "legacy" because that dataset should never be mounted.
> It's just a "container" dataset to group all the BEs.
>
>>> The drives were different. On the latter (legacy) system it was not
>>> formatted (yet) (in VirtualBox). On my server I switched from UFS to
>>> ZFS, so I first created a rpool and than did a luupgrade into it.
>>> This could explain the mountpoint /rpool/ROOT but WHY the difference?
>>> Why can't s10u6 install the same mountpoint on the new disk?
>>> The server runs very well; is this "legacy" thing really needed?
>>>
>>>
> When you created the rpool, did you also explicitly create the rpool/ROOT
> datasets?   If you did create it and didn't set the mount point to
> "legacy",
> that explains why you ended up with your original configuration.  If
> you didn't create the rpool/ROOT dataset yourself, and instead let
> LiveUpgrade
> create it automatically, and LiveUpgrade set the mountpoint to
> /rpool/ROOT, then
> that's a bug in LiveUpgrade (though a minor one, I think).

NO, I'm quite positive all I did was "zpool create rpool" and after that I
did a "lucreate -n zfsBE -p rpool" followed by "luupgrade -u -n zfsBE -s
/iso"

So, it must have been LU that "forgot" to set the mountpoint to legacy.

What is the correct syntax to correct this situation?
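
Presumably something like the following, using the dataset name from the
listing above (untested on a live BE):

# zfs set mountpoint=legacy rpool/ROOT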

-- 
Dick Hoogendijk -- PGP/GnuPG key: F86289CE
+http://nagual.nl/ | SunOS 10u6 10/08 ZFS+

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Brian Wilson


- Original Message -
From: Lori Alt <[EMAIL PROTECTED]>
Date: Tuesday, December 2, 2008 11:19 am
Subject: Re: [zfs-discuss] Separate /var
To: Gary Mills <[EMAIL PROTECTED]>
Cc: zfs-discuss@opensolaris.org

> On 12/02/08 09:00, Gary Mills wrote:
> > On Mon, Dec 01, 2008 at 04:45:16PM -0700, Lori Alt wrote:
> >   
> >>On 11/27/08 17:18, Gary Mills wrote:
> >> On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
> >> On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
> >> On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:
> >>
> >> I'm currently working with an organisation who
> >> want use ZFS for their  > full zones. Storage is SAN attached, and 
> they
> >> also want to create a  > separate /var for each zone, which causes 
> issues
> >> when the zone is  > installed. They believe that a separate /var is
> >> still good practice.
> >> If your mount options are different for /var and /, you will need
> >> a separate filesystem.  In our case, we use `setuid=off' and
> >> `devices=off' on /var for security reasons.  We do the same thing
> >> for home directories and /tmp .
> >>
> >> For zones?
> >>
> >> Sure, if you require different mount options in the zones.
> >>
> >>I looked into this and found that, using ufs,  you can indeed 
> set up
> >>the zone's /var directory as a separate file system.  I  don't know
> >>about
> >>how LiveUpgrade works with that configuration (I didn't try it).
> >>But I was at least able to get the zone to install and boot.
> >>But with zfs, I couldn't even get a zone with a separate /var
> >>dataset to install, let alone be manageable with LiveUpgrade.
> >>I configured the zone like so:
> >># zonecfg -z z4
> >>z4: No such zone configured
> >>Use 'create' to begin configuring a new zone.
> >>zonecfg:z4> create
> >>zonecfg:z4> set zonepath=/zfszones/z4
> >>zonecfg:z4> add fs
> >>zonecfg:z4:fs> set dir=/var
> >>zonecfg:z4:fs> set special=rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var
> >>zonecfg:z4:fs> set type=zfs
> >>zonecfg:z4:fs> end
> >>zonecfg:z4> exit
> >>I then get this result from trying to install the zone:
> >>prancer# zoneadm -z z4 install
> >>Preparing to install zone .
> >>ERROR: No such file or directory: cannot mount 


I think you're running into the problem of defining the var as the filesystem 
that already exists under the zone root.  We had issues with that, so any time 
I've been doing filesystems, I don't push in zfs datasets, I create a zfs 
filesystem in the global zone and mount that directory into the zone with lofs. 
 For example, I've got a pool zdisk with a filesystem down the path -
zdisk/zones/zvars/(zonename)

which mounts itself to -
/zdisk/zones/zvars/(zonename)

It's a ZFS filesystem with quota and reservation setup, and I just do an lofs 
to it via these lines in the /etc/zones/(zonename).xml file -

  <filesystem special="/zdisk/zones/zvars/(zonename)/var" directory="/var" type="lofs">
  </filesystem>

I think that's the equivalent of the following zonecfg lines -

zonecfg:z4> add fs
zonecfg:z4:fs> set dir=/var
zonecfg:z4:fs> set special=/zdisk/zones/zvars/z4/var
zonecfg:z4:fs> set type=lofs
zonecfg:z4:fs> end

I think to put the zfs into the zone, you need to do an add dataset, instead of
an add fs.  I tried that once and didn't completely like the results, though.
The dataset was controllable inside the zone (which is what I wanted at the
time), but it wasn't controllable from the global zone anymore.  And I couldn't
access it from the global zone easily to get the backup software to pick it up.

Doing it this way means you have to manage the zfs datasets from the global 
zone, but that's not really an issue here.

So, create the separate filesystems you want in the global zone (without 
stacking them under the zoneroot - separate directory somewhere), setup the zfs 
stuff you want, then lofs it into the local zone.  I've had that install 
successfully before.
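
For example, the global-zone side of that looks something like this (the quota
and reservation values here are just placeholders):

# in the global zone, for a zone called z4:
zfs create -p zdisk/zones/zvars/z4
zfs set quota=8g zdisk/zones/zvars/z4
zfs set reservation=2g zdisk/zones/zvars/z4
mkdir -p /zdisk/zones/zvars/z4/var
# then add the lofs entry shown above with zonecfg, and install:
zoneadm -z z4 install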

Hope that's helpful in some way!



> >> 
> >
> > You might have to pre-create this filesystem. `special' may not be
> > needed at all.
> >   
> I did pre-create the file system.  Also, I tried omitting "special" and
> zonecfg complains. 
> 
> I think that there might need to be some changes
> to zonecfg and the zone installation code to get separate
> /var datasets in non-global zones to work.
> 
> Lori
> >   
> >>in non-global zone to install: the source block device or directory
> >> cannot be accessed
> >>ERROR: cannot setup zone  inherited and configured file systems
> >>ERROR: cannot setup zone  file systems inherited and configured
> >>from the global zone
> >>ERROR: cannot create zone boot environment 
> >>I don't fully  understand the failures here.  I suspect that 
> there are
> >>problems both in the zfs code and zones code.  It SHOULD work though.
> >>The fact that it doesn't seems like a bug.
> >>In the meantime, I guess we have to conclude that a separate /var
> >>in a non-global zone 

Re: [zfs-discuss] Problem importing degraded Pool

2008-12-02 Thread Philipp Haußleiter
It seems that my devices have several sets of pool labels on them :-(
zdb -l /dev/rdsk/c0t5d0

tells me


LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

version=6
name='tank'
state=0
txg=4
pool_guid=1230498626424814687
hostid=2180312168
hostname='sunny.local'
top_guid=7409377091667366359
guid=7409377091667366359
vdev_tree
type='disk'
id=1
guid=7409377091667366359
path='/dev/ad6'
devid='ad:S13UJDWQ726303'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750151532544

LABEL 3

version=6
name='tank'
state=0
txg=4
pool_guid=1230498626424814687
hostid=2180312168
hostname='sunny.local'
top_guid=7409377091667366359
guid=7409377091667366359
vdev_tree
type='disk'
id=1
guid=7409377091667366359
path='/dev/ad6'
devid='ad:S13UJDWQ726303'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750151532544

zdb -l /dev/rdsk/c0t5d0s0

 tells me


LABEL 0

version=10
name='tank'
state=0
txg=72220
pool_guid=1717390511944489
hostname='sunny'
top_guid=2169144823532120681
guid=2169144823532120681
vdev_tree
type='disk'
id=1
guid=2169144823532120681
path='/dev/dsk/c0t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1002,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750142881792
is_log=0
DTL=93

LABEL 1

version=10
name='tank'
state=0
txg=72220
pool_guid=1717390511944489
hostname='sunny'
top_guid=2169144823532120681
guid=2169144823532120681
vdev_tree
type='disk'
id=1
guid=2169144823532120681
path='/dev/dsk/c0t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1002,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750142881792
is_log=0
DTL=93

LABEL 2

version=10
name='tank'
state=0
txg=72220
pool_guid=1717390511944489
hostname='sunny'
top_guid=2169144823532120681
guid=2169144823532120681
vdev_tree
type='disk'
id=1
guid=2169144823532120681
path='/dev/dsk/c0t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1002,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750142881792
is_log=0
DTL=93

LABEL 3

version=10
name='tank'
state=0
txg=72220
pool_guid=1717390511944489
hostname='sunny'
top_guid=2169144823532120681
guid=2169144823532120681
vdev_tree
type='disk'
id=1
guid=2169144823532120681
path='/dev/dsk/c0t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1002,[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
metaslab_array=14
metaslab_shift=32
ashift=9
asize=750142881792
is_log=0
DTL=93


So the right pool data of pool # 1717390511944489 is in c0t2d0s0 and 
c0t5d0s0.
But somehow there is a second pool setting in c0t2d0 and c0t5d0.

Thank you to Richard Elling for pointing that out.
So is it possible to clear the invalid pool settings and just use the valid
ones in the s0 partitions?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Bob Friesenhahn
On Tue, 2 Dec 2008, Toby Thain wrote:
>
> Even that is probably more frequent than necessary. I'm sure somebody
> has done the MTTDL math. IIRC, the big win is doing any scrubbing at
> all. The difference between scrubbing every 2 weeks and every 2
> months may be negligible. (IANAMathematician tho)

This surely depends on the type of hardware used.  If the disks are 
not true "enterprise" grade (e.g. ordinary SATA drives) then scrubbing 
more often is likely warranted since these are much more likely to 
exhibit user-visible decay over a period of time and the scrub will 
find (and correct) the decaying bits before it is too late.  The 
enterprise-class disks should not require scrubbing very often. 
Enterprise disks may be almost as likely to go entirely belly up as 
they are to produce a bad sector.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Mike Gerdts
On Tue, Dec 2, 2008 at 11:17 AM, Lori Alt <[EMAIL PROTECTED]> wrote:
> I did pre-create the file system.  Also, I tried omitting "special" and
> zonecfg complains.
>
> I think that there might need to be some changes
> to zonecfg and the zone installation code to get separate
> /var datasets in non-global zones to work.

You could probably do something like:

zfs create rpool/zones/$zone
zfs create rpool/zones/$zone/var

zonecfg -z $zone
add fs
  set dir=/var
  set special=/zones/$zone/var
  set type=lofs
  end
...

zoneadm -z $zone install

zonecfg -z $zone
remove fs dir=/var

zfs set mountpoint=/zones/$zone/root/var rpool/zones/$zone/var

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Lori Alt

On 12/02/08 09:00, Gary Mills wrote:

On Mon, Dec 01, 2008 at 04:45:16PM -0700, Lori Alt wrote:
  

   On 11/27/08 17:18, Gary Mills wrote:
On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:

I'm currently working with an organisation who
want use ZFS for their  > full zones. Storage is SAN attached, and they
also want to create a  > separate /var for each zone, which causes issues
when the zone is  > installed. They believe that a separate /var is
still good practice.
If your mount options are different for /var and /, you will need
a separate filesystem.  In our case, we use `setuid=off' and
`devices=off' on /var for security reasons.  We do the same thing
for home directories and /tmp .

For zones?

Sure, if you require different mount options in the zones.

   I looked into this and found that, using ufs,  you can indeed set up
   the zone's /var directory as a separate file system.  I  don't know
   about
   how LiveUpgrade works with that configuration (I didn't try it).
   But I was at least able to get the zone to install and boot.
   But with zfs, I couldn't even get a zone with a separate /var
   dataset to install, let alone be manageable with LiveUpgrade.
   I configured the zone like so:
   # zonecfg -z z4
   z4: No such zone configured
   Use 'create' to begin configuring a new zone.
   zonecfg:z4> create
   zonecfg:z4> set zonepath=/zfszones/z4
   zonecfg:z4> add fs
   zonecfg:z4:fs> set dir=/var
   zonecfg:z4:fs> set special=rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var
   zonecfg:z4:fs> set type=zfs
   zonecfg:z4:fs> end
   zonecfg:z4> exit
   I then get this result from trying to install the zone:
   prancer# zoneadm -z z4 install
   Preparing to install zone .
   ERROR: No such file or directory: cannot mount 



You might have to pre-create this filesystem. `special' may not be
needed at all.
  

I did pre-create the file system.  Also, I tried omitting "special" and
zonecfg complains. 


I think that there might need to be some changes
to zonecfg and the zone installation code to get separate
/var datasets in non-global zones to work.

Lori
  

   in non-global zone to install: the source block device or directory
cannot be accessed
   ERROR: cannot setup zone  inherited and configured file systems
   ERROR: cannot setup zone  file systems inherited and configured
   from the global zone
   ERROR: cannot create zone boot environment 
   I don't fully  understand the failures here.  I suspect that there are
   problems both in the zfs code and zones code.  It SHOULD work though.
   The fact that it doesn't seems like a bug.
   In the meantime, I guess we have to conclude that a separate /var
   in a non-global zone is not supported on zfs.  A separate /var in
   the global zone is supported  however, even when the root is zfs.



I haven't tried ZFS zone roots myself, but I do have a few comments.
ZFS filesystems are cheap because they don't require separate disk
slices.  As well, they are attribute boundaries.  Those are necessary
or convenient in some case.

  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] differences.. why?

2008-12-02 Thread Lori Alt
On 12/02/08 03:21, jan damborsky wrote:
> Hi Dick,
>
> I am redirecting your question to zfs-discuss
> mailing list, where people are more knowledgeable
> about this problem and your question could be
> better answered.
>
> Best regards,
> Jan
>
>
> dick hoogendijk wrote:
>   
>> I have s10u6 installed on my server.
>> zfs list (partly):
>> NAME                USED  AVAIL  REFER  MOUNTPOINT
>> rpool              88.8G   140G  27.5K  /rpool
>> rpool/ROOT         20.0G   140G    18K  /rpool/ROOT
>> rpool/ROOT/s10BE2  20.0G   140G  7.78G  /
>>
>> But just now, on a newly installed s10u6 system I got rpool/ROOT with a
>> mountpoint "legacy"
>>
>> 
The mount point for <pool>/ROOT is supposed
to be "legacy" because that dataset should never be mounted.
It's just a "container" dataset to group all the BEs.

>> The drives were different. On the latter (legacy) system it was not
>> formatted (yet) (in VirtualBox). On my server I switched from UFS to
>> ZFS, so I first created a rpool and than did a luupgrade into it.
>> This could explain the mountpoint /rpool/ROOT but WHY the difference?
>> Why can't s10u6 install the same mountpoint on the new disk?
>> The server runs very well; is this "legacy" thing really needed?
>>
>> 
When you created the rpool, did you also explicitly create the rpool/ROOT
datasets?   If you did create it and didn't set the mount point to "legacy",
that explains why you ended up with your original configuration.  If
you didn't create the rpool/ROOT dataset yourself, and instead let 
LiveUpgrade
create it automatically, and LiveUpgrade set the mountpoint to 
/rpool/ROOT, then
that's a bug in LiveUpgrade (though a minor one, I think).

Lori

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Ross Smith
Hi Richard,

Thanks, I'll give that a try.  I think I just had a kernel dump while
trying to boot this system back up though, I don't think it likes it
if the iscsi targets aren't available during boot.  Again, that rings
a bell, so I'll go see if that's another known bug.

Changing that setting on the fly didn't seem to help, if anything
things are worse this time around.  I changed the timeout to 15
seconds, but didn't restart any services:

# echo iscsi_rx_max_window/D | mdb -k
iscsi_rx_max_window:
iscsi_rx_max_window:180
# echo iscsi_rx_max_window/W0t15 | mdb -kw
iscsi_rx_max_window:0xb4=   0xf
# echo iscsi_rx_max_window/D | mdb -k
iscsi_rx_max_window:
iscsi_rx_max_window:15

After making those changes, and repeating the test, offlining an iscsi
volume hung all the commands running on the pool.  I had three ssh
sessions open, running the following:
# zpool iostat -v iscsipool 10 100
# format < /dev/null
# time zpool status

They hung for what felt like a minute or so.
After that, the CIFS copy timed out.

After the CIFS copy timed out, I tried immediately restarting it.  It
took a few more seconds, but restarted no problem.  Within a few
seconds of that restarting, iostat recovered, and format returned its
result too.

Around 30 seconds later, zpool status reported two drives, paused
again, then showed the status of the third:

# time zpool status
  pool: iscsipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Tue Dec  2 16:39:21 2008
config:

        NAME                                     STATE     READ WRITE CKSUM
        iscsipool                                ONLINE       0     0     0
          raidz1                                 ONLINE       0     0     0
            c2t600144F04933FF6C5056967AC800d0    ONLINE       0     0     0  15K resilvered
            c2t600144F04934FAB35056964D9500d0    ONLINE       0     0     0  15K resilvered
            c2t600144F04934119E50569675FF00d0    ONLINE       0   200     0  24K resilvered

errors: No known data errors

real3m51.774s
user0m0.015s
sys 0m0.100s

Repeating that a few seconds later gives:

# time zpool status
  pool: iscsipool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Tue Dec  2 16:39:21 2008
config:

        NAME                                     STATE     READ WRITE CKSUM
        iscsipool                                DEGRADED     0     0     0
          raidz1                                 DEGRADED     0     0     0
            c2t600144F04933FF6C5056967AC800d0    ONLINE       0     0     0  15K resilvered
            c2t600144F04934FAB35056964D9500d0    ONLINE       0     0     0  15K resilvered
            c2t600144F04934119E50569675FF00d0    UNAVAIL      3 5.80K     0  cannot open

errors: No known data errors

real0m0.272s
user0m0.029s
sys 0m0.169s




On Tue, Dec 2, 2008 at 3:58 PM, Richard Elling <[EMAIL PROTECTED]> wrote:

..

> iSCSI timeout is set to 180 seconds in the client code.  The only way
> to change is to recompile it, or use mdb.  Since you have this test rig
> setup, and I don't, do you want to experiment with this timeout?
> The variable is actually called "iscsi_rx_max_window" so if you do
>   echo iscsi_rx_max_window/D | mdb -k
> you should see "180"
> Change it using something like:
>   echo iscsi_rx_max_window/W0t30 | mdb -kw
> to set it to 30 seconds.
> -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Toby Thain

On 2-Dec-08, at 8:24 AM, Glaser, David wrote:

> Ok, thanks for all the responses. I'll probably do every other week  
> scrubs, as this is the backup data (so doesn't need to be checked  
> constantly).

Even that is probably more frequent than necessary. I'm sure somebody  
has done the MTTDL math. IIRC, the big win is doing any scrubbing at  
all. The difference between scrubbing every 2 weeks and every 2  
months may be negligible. (IANAMathematician tho)

--T

> I'm a little concerned about the time involved to do 33TB (after  
> the 48TB has been RAIDed fully) when it is fully populated with  
> filesystems and snapshots, but I'll keep an eye on it.
>
> Thanks all.
>
> Dave
>
>
> -Original Message-
> From: Paul Weaver [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 8:11 AM
> To: Glaser, David; zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] How often to scrub?
>
>> I have a Thumper (ok, actually 3) with each having one large pool,
> multiple
>> filesystems and many snapshots. They are holding rsync copies of
> multiple
>> clients, being synced every night (using snapshots to keep
> 'incremental'
>> backups).
>>
>> I'm wondering how often (if ever) I should do scrubs of the pools, or
> if
>> the internal zfs integrity is enough that I don't need to do manual
> scrubs
>> of the pool? I read through a number of tutorials online as well as
> the zfs
>> wiki entry, but I didn't see anything very pertinent. Scrubs are I/O
>> intensive, but is the Pool able to be used normally during a scrub? I
>> think the answer is yes, but some confirmation helps me sleep at
> night.
>
> Scrubs are the lowest priority, so I understand it should  
> theoretically
> work fine.
>
> We've got two 48TB thumpers, with a nightly rsync from the main to the
> reserve. I'm currently running a scrub every Friday at 23:02, which  
> last
> week took 5h15 to scrub the 7TB of used data (about 5TB of real,  
> 2TB of
> snapshots) on the single pool. That's about 380MBytes/second.
>
>
> --
>
> "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"
>
> Paul Weaver
> Systems Development Engineer
> News Production Facilities, BBC News
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Will Murnane
On Tue, Dec 2, 2008 at 10:15, Paul Weaver <[EMAIL PROTECTED]> wrote:
> So you've got a zpool across 46 (48?) of the disks?
>
> When I was looking into our thumpers everyone seemed to think a raidz
> over
> more than 10 disks was a hideous idea.
A vdev that size is bad, a pool that size composed of multiple vdevs
is fine.  Raidz2 is always recommended over raidz.
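
For example, a Thumper-sized pool would normally be built from several
narrower raidz2 vdevs rather than one wide raidz (device names below are
illustrative, not a real X4500 mapping):

zpool create tank \
    raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
    raidz2 c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
    raidz2 c0t2d0 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t2d0
# ...and so on for the remaining disks, keeping one or two as hot spares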

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Gary Mills
On Mon, Dec 01, 2008 at 04:45:16PM -0700, Lori Alt wrote:
>On 11/27/08 17:18, Gary Mills wrote:
> On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
> On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
> On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:
> 
> I'm currently working with an organisation who
> want use ZFS for their  > full zones. Storage is SAN attached, and they
> also want to create a  > separate /var for each zone, which causes issues
> when the zone is  > installed. They believe that a separate /var is
> still good practice.
> If your mount options are different for /var and /, you will need
> a separate filesystem.  In our case, we use `setuid=off' and
> `devices=off' on /var for security reasons.  We do the same thing
> for home directories and /tmp .
> 
> For zones?
> 
> Sure, if you require different mount options in the zones.
> 
>I looked into this and found that, using ufs,  you can indeed set up
>the zone's /var directory as a separate file system.  I  don't know
>about
>how LiveUpgrade works with that configuration (I didn't try it).
>But I was at least able to get the zone to install and boot.
>But with zfs, I couldn't even get a zone with a separate /var
>dataset to install, let alone be manageable with LiveUpgrade.
>I configured the zone like so:
># zonecfg -z z4
>z4: No such zone configured
>Use 'create' to begin configuring a new zone.
>zonecfg:z4> create
>zonecfg:z4> set zonepath=/zfszones/z4
>zonecfg:z4> add fs
>zonecfg:z4:fs> set dir=/var
>zonecfg:z4:fs> set special=rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var
>zonecfg:z4:fs> set type=zfs
>zonecfg:z4:fs> end
>zonecfg:z4> exit
>I then get this result from trying to install the zone:
>prancer# zoneadm -z z4 install
>Preparing to install zone .
>ERROR: No such file or directory: cannot mount 

You might have to pre-create this filesystem. `special' may not be
needed at all.

>in non-global zone to install: the source block device or directory
> cannot be accessed
>ERROR: cannot setup zone  inherited and configured file systems
>ERROR: cannot setup zone  file systems inherited and configured
>from the global zone
>ERROR: cannot create zone boot environment 
>I don't fully  understand the failures here.  I suspect that there are
>problems both in the zfs code and zones code.  It SHOULD work though.
>The fact that it doesn't seems like a bug.
>In the meantime, I guess we have to conclude that a separate /var
>in a non-global zone is not supported on zfs.  A separate /var in
>the global zone is supported  however, even when the root is zfs.

I haven't tried ZFS zone roots myself, but I do have a few comments.
ZFS filesystems are cheap because they don't require separate disk
slices.  As well, they are attribute boundaries.  Those are necessary
or convenient in some cases.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem importing degraded Pool

2008-12-02 Thread Philipp Haußleiter
thx for your suggestions couper88, but this did not help :-/.

I tried the latest live-cd of 2008.11
and got new information:

a zpool import shows me now:


[EMAIL PROTECTED]:~# zpool import
  pool: tank
id: 1717390511944489
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        tank        UNAVAIL  insufficient replicas
          c3t5d0    ONLINE

  pool: tank
id: 1230498626424814687
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
  c3t5d0p0  FAULTED  corrupted data
  c3t2d0p0  ONLINE


So I think the second pool is the right one...  BUT I really do not know
how to import it.
I tried both
[EMAIL PROTECTED]:~# zpool import -f tank 
cannot import 'tank': more than one matching pool
import by numeric ID instead

[EMAIL PROTECTED]:/dev/rdsk# zpool import -f 1230498626424814687
cannot import 'tank': one or more devices is currently unavailable

So I got a little bit more hope now...
But there is still the problem that I cannot import that specific pool :-/
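
One thing that might be worth trying is limiting where 'zpool import' looks
for labels, so the stale whole-disk labels never get scanned.  Something like
this (directory name made up; adjust the cXtYd0s0 names to whatever the live
CD calls the two s0 slices):

mkdir /tmp/import-devs
ln -s /dev/dsk/c3t2d0s0 /tmp/import-devs/
ln -s /dev/dsk/c3t5d0s0 /tmp/import-devs/
zpool import -d /tmp/import-devs       # should now list only the pool the s0 labels describe
zpool import -d /tmp/import-devs -f 1717390511944489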
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Paul Weaver
So you've got a zpool across 46 (48?) of the disks?

When I was looking into our thumpers everyone seemed to think a raidz
over 
more than 10 disks was a hideous idea.

--
Paul Weaver 
Systems Development Engineer
News Production Facilities, BBC News
Work:   020 822 58109
Room 1244  Television Centre,
Wood Lane, London, W12 7RJ

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Glaser, David
> Sent: 02 December 2008 13:24
> To: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] How often to scrub?
> 
> Ok, thanks for all the responses. I'll probably do every 
> other week scrubs, as this is the backup data (so doesn't 
> need to be checked constantly). I'm a little concerned about 
> the time involved to do 33TB (after the 48TB has been RAIDed 
> fully) when it is fully populated with filesystems and 
> snapshots, but I'll keep an eye on it. 
> 
> Thanks all.
> 
> Dave
> 
> 
> -Original Message-
> From: Paul Weaver [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 8:11 AM
> To: Glaser, David; zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] How often to scrub?
> 
> > I have a Thumper (ok, actually 3) with each having one large pool,
> multiple 
> > filesystems and many snapshots. They are holding rsync copies of
> multiple 
> > clients, being synced every night (using snapshots to keep
> 'incremental' 
> > backups). 
> > 
> > I'm wondering how often (if ever) I should do scrubs of the 
> pools, or
> if 
> > the internal zfs integrity is enough that I don't need to do manual
> scrubs 
> > of the pool? I read through a number of tutorials online as well as
> the zfs 
> > wiki entry, but I didn't see anything very pertinent. 
> Scrubs are I/O 
> > intensive, but is the Pool able to be used normally during 
> a scrub? I 
> > think the answer is yes, but some confirmation helps me sleep at
> night. 
> 
> Scrubs are the lowest priority, so I understand it should 
> theoretically work fine.
> 
> We've got two 48TB thumpers, with a nightly rsync from the 
> main to the reserve. I'm currently running a scrub every 
> Friday at 23:02, which last week took 5h15 to scrub the 7TB 
> of used data (about 5TB of real, 2TB of
> snapshots) on the single pool. That's about 380MBytes/second.
> 
> 
> --
> 
> "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"
> 
> Paul Weaver
> Systems Development Engineer
> News Production Facilities, BBC News
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RE : rsync using 100% of a cp u

2008-12-02 Thread William D. Hathaway
How are the two sides different?  If you run something like 'openssl md5' on
both sides, is it much faster on one side?

Does one machine have a lot more memory/ARC and allow it to skip the physical 
reads?  Is the dataset compressed on one side?
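
For example (path is hypothetical), timing the same checksum pass on both
hosts would show whether the reads themselves are the slow part:

time sh -c "find /tank/backups/clientA -type f | xargs openssl md5 > /dev/null"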
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Glaser, David
Ok, thanks for all the responses. I'll probably do every other week scrubs, as 
this is the backup data (so doesn't need to be checked constantly). I'm a 
little concerned about the time involved to do 33TB (after the 48TB has been 
RAIDed fully) when it is fully populated with filesystems and snapshots, but 
I'll keep an eye on it. 

Thanks all.

Dave


-Original Message-
From: Paul Weaver [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2008 8:11 AM
To: Glaser, David; zfs-discuss@opensolaris.org
Subject: RE: [zfs-discuss] How often to scrub?

> I have a Thumper (ok, actually 3) with each having one large pool,
multiple 
> filesystems and many snapshots. They are holding rsync copies of
multiple 
> clients, being synced every night (using snapshots to keep
'incremental' 
> backups). 
> 
> I'm wondering how often (if ever) I should do scrubs of the pools, or
if 
> the internal zfs integrity is enough that I don't need to do manual
scrubs 
> of the pool? I read through a number of tutorials online as well as
the zfs 
> wiki entry, but I didn't see anything very pertinent. Scrubs are I/O 
> intensive, but is the Pool able to be used normally during a scrub? I 
> think the answer is yes, but some confirmation helps me sleep at
night. 

Scrubs are the lowest priority, so I understand it should theoretically
work fine.

We've got two 48TB thumpers, with a nightly rsync from the main to the
reserve. I'm currently running a scrub every Friday at 23:02, which last
week took 5h15 to scrub the 7TB of used data (about 5TB of real, 2TB of
snapshots) on the single pool. That's about 380MBytes/second.


--

"09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"

Paul Weaver
Systems Development Engineer
News Production Facilities, BBC News
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How often to scrub?

2008-12-02 Thread Paul Weaver
> I have a Thumper (ok, actually 3) with each having one large pool,
multiple 
> filesystems and many snapshots. They are holding rsync copies of
multiple 
> clients, being synced every night (using snapshots to keep
'incremental' 
> backups). 
> 
> I'm wondering how often (if ever) I should do scrubs of the pools, or
if 
> the internal zfs integrity is enough that I don't need to do manual
scrubs 
> of the pool? I read through a number of tutorials online as well as
the zfs 
> wiki entry, but I didn't see anything very pertinent. Scrubs are I/O 
> intensive, but is the Pool able to be used normally during a scrub? I 
> think the answer is yes, but some confirmation helps me sleep at
night. 

Scrubs are the lowest priority, so I understand it should theoretically
work fine.

We've got two 48TB thumpers, with a nightly rsync from the main to the
reserve. I'm currently running a scrub every Friday at 23:02, which last
week took 5h15 to scrub the 7TB of used data (about 5TB of real, 2TB of
snapshots) on the single pool. That's about 380MBytes/second.
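
In crontab form that schedule is simply something like this (pool name
assumed to be 'tank'):

# min hour dom month dow  command
2 23 * * 5 /usr/sbin/zpool scrub tank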


--

"09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0"

Paul Weaver
Systems Development Engineer
News Production Facilities, BBC News
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RE : rsync using 100% of a cp u

2008-12-02 Thread Francois Dion
>>"Francois Dion" wrote:
>> Source is local to rsync, copying from a zfs file system,  
>> destination is remote over a dsl connection. Takes forever to just  
>> go through the unchanged files. Going the other way is not a  
>> problem, it takes a fraction of the time. Anybody seen that?  
>> Suggestions?
>De: Blake Irvin [mailto:[EMAIL PROTECTED]
>Upstream when using DSL is much slower than downstream?

No, that's not the problem. I know ADSL is asymmetrical. When there is an 
actual data transfer going on, the cpu drops to 0.2%. It's only when rsync is 
doing its thing (reading, not writing) locally that it pegs the cpu. We are 
talking 15 minutes in one direction while in the other it looks like I'll pass 
the 24 hours mark before the rsync is complete. And there were less than 100MB 
added on each side.
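
One quick way to see where that rsync is spending its time while it's pegged
(assuming the process is literally called rsync):

pid=`pgrep -x rsync | head -1`
prstat -mLp $pid 5      # microstate columns: user vs system vs lock/wait time
truss -c -p $pid        # interrupt with ^C to get a syscall count summary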

BTW, the only other process I've seen that pegs the cpu solid for as long as it 
runs on my v480 is when I downloaded Belenix through a python script 
(btdownloadheadless).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hardware Raid Vs ZFS implementation on Sun X4150/X4450

2008-12-02 Thread Vikash Gupta
Hi,

 

Has anyone implemented hardware RAID 1/5 on the Sun X4150/X4450 class of
servers?

Also, does anyone have a comparison between ZFS and hardware RAID?

 

I would like to know the experience (good/bad) and the pros/cons?

 

Regards,

Vikash

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Ross
Incidentally, while I've reported this again as an RFE, I still haven't seen a
CR number for this.  Could somebody from Sun check whether it's been filed, please?

thanks,

Ross
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-12-02 Thread Darren J Moffat
t. johnson wrote:
>>> One would expect so, yes. But the usefulness of this is limited to the 
>>> cases where the entire working set will fit into an SSD cache.
>>>
>> Not entirely out of the question. SSDs can be purchased today
>> with more than 500 GBytes in a 2.5" form factor. One or more of
>> these would make a dandy L2ARC.
>> http://www.stecinc.com/product/mach8mlc.php
> 
> 
> Speaking of which.. what's the current limit on L2ARC size? Gathering tidbits 
> here and there (7000 storage line config limits, FAST talk given by Bill 
> Moore) there are indications that L2ARC can only be ~500GB?

There are no limits on the size of the L2ARC that I could find
implemented in the source code.

However every buffer that is cached on an L2ARC device needs an ARC 
header in the in memory ARC that points to it.  So in practical terms 
there will be a limit on the size of an L2ARC based on the size of 
physical ram.

For example a machine with 512 MegaByte RAM and a 500GByte SSD L2ARC is 
probably pretty silly.

I'll leave it as an exercise to the reader to work out how much core 
memory is needed based on the sizes of arc_buf_t (0x30) and 
arc_buf_hdr_t (0xf8).
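
A rough worked example, taking those figures at face value (0x30 + 0xf8 =
296 bytes of header per cached buffer) and assuming 128K buffers:

$ echo '500 * 1024^3 / (128 * 1024) * 296 / 1024^2' | bc
1156

i.e. a 500 GByte L2ARC costs on the order of 1.1 GBytes of ARC headers; with
8K buffers it would be sixteen times that.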

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem importing degraded Pool

2008-12-02 Thread Brian Couper
Hi,

Attach both original drives to the system; the faulty one may only have had a
few checksum errors.

zpool status -v should hopefully show your data pool, provided you have not
started to replace the faulty drive yet.  If it doesn't see the pool, zpool
export then zpool import and hope for the best.

If you get back to the original failed state, with your pool degraded but
readable, it can be easily fixed most of the time.

Do a  zpool status -v   <- mind the -v

What's it saying about your pool? I suspect the faulty drive has checksum
errors and has been offlined.

Power down the system and add the spare 3rd drive to the system so you have all
3 drives connected. DO NOT MOVE the original drives to different connections in
the system; that's just going to cause more trouble.

While you're inside the system, check all the connections to the hard drives.

Power up the system.

Look up the ZFS commands.  Read and understand what you're about to do.

you need to force the failed drive online
#zpool online pool device
do a zpool clear to clear the error log on the faulty pool
#zpool clear pool

Now you have 2 choices here: back up your critical data to the new 3rd drive, or
replace the faulty drive.

zpool replace [-f] pool device [new_device]

Now zfs is almost certainly going to complain like hell about the faulty pool
during the copy / replace operation.

To be blunt, your data is either readable or it's not.  Run zpool clear and force
online the faulty drive every time it gets put offline; this may be several
times!
Zfs will tell you exactly what files have been lost, if any. The process could
take several hours. Do a zpool scrub once it's finished. Then back up your
data.

use zpool status -v to monitor progress.
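
Put together, the whole sequence is something like this (pool and device
names here are only examples):

zpool online tank c0t5d0        # force the flaky drive back online
zpool clear tank                # clear the logged errors
zpool status -v tank            # watch progress / see any damaged files
# then either back your data up, or swap in the new drive:
zpool replace -f tank c0t5d0 c0t6d0
zpool scrub tank                # once the replace has finished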

If you don't get a lot of errors from the faulty drive, you could try a low
level format to fix the drive, after you have got the data off it ;)

One final word: a striped zpool with copies=2 is about as much use as a
chocolate fire guard when it comes to protecting data.  Use 3+ drives and raidz;
it's far better.

I'm no expert; I've been using zfs for 7 months. When I first started using it, ZFS
found 4 faulty drives in my setup, and other operating systems said they were
good drives!!! So I have used ZFS to its full recovery potential!!!

Brian,
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Availability: ZFS needs to handle disk removal / driver failure better

2008-12-02 Thread Ross Smith
Hey folks,

I've just followed up on this, testing iSCSI with a raided pool, and
it still appears to be struggling when a device goes offline.

>>> I don't see how this could work except for mirrored pools.  Would that
>>> carry enough market to be worthwhile?
>>> -- richard
>>>
>>
>> I have to admit, I've not tested this with a raided pool, but since
>> all ZFS commands hung when my iSCSI device went offline, I assumed
>> that you would get the same effect of the pool hanging if a raid-z2
>> pool is waiting for a response from a device.  Mirrored pools do work
>> particularly well with this since it gives you the potential to have
>> remote mirrors of your data, but if you had a raid-z2 pool, you still
>> wouldn't want that hanging if a single device failed.
>>
>
> zpool commands hanging is CR6667208, and has been fixed in b100.
> http://bugs.opensolaris.org/view_bug.do?bug_id=6667208
>
>> I will go and test the raid scenario though on a current build, just to be
>> sure.
>>
>
> Please.
> -- richard


I've just created a pool using three snv_103 iscsi Targets, with a
fourth install of snv_103 collating those targets into a raidz pool,
and sharing that out over CIFS.

To test the server, while transferring files from a windows
workstation, I powered down one of the three iSCSI targets.  It took a
few minutes to shutdown, but once that happened the windows copy
halted with the error:
"The specified network name is no longer available."

At this point, the zfs admin tools still work fine (which is a huge
improvement, well done!), but zpool status still reports that all
three devices are online.

A minute later, I can open the share again, and start another copy.

Thirty seconds after that, zpool status finally reports that the iscsi
device is offline.

So it looks like we have the same problems with that 3 minute delay,
with zpool status reporting wrong information, and the CIFS service
having problems tool.

At this point I restarted the iSCSI target, but had problems bringing
it back online.  It appears there's a bug in the initiator, but it's
easily worked around:
http://www.opensolaris.org/jive/thread.jspa?messageID=312981

What was great was that as soon as the iSCSI initiator reconnected,
ZFS started resilvering.

What might not be so great is the fact that all three devices are
showing that they've been resilvered:

# zpool status
  pool: iscsipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h2m with 0 errors on Tue Dec  2 11:04:10 2008
config:

        NAME                                     STATE     READ WRITE CKSUM
        iscsipool                                ONLINE       0     0     0
          raidz1                                 ONLINE       0     0     0
            c2t600144F04933FF6C5056967AC800d0    ONLINE       0     0     0  179K resilvered
            c2t600144F04934FAB35056964D9500d0    ONLINE       5 9.88K     0  311M resilvered
            c2t600144F04934119E50569675FF00d0    ONLINE       0     0     0  179K resilvered

errors: No known data errors

It's proving a little hard to know exactly what's happening when,
since I've only got a few seconds to log times, and there are delays
with each step.  However, I ran another test using robocopy and was
able to observe the behaviour a little more closely:

Test 2:  Using robocopy for the transfer, and iostat plus zpool status
on the server

10:46:30 - iSCSI server shutdown started
10:52:20 - all drives still online according to zpool status
10:53:30 - robocopy error - "The specified network name is no longer available"
 - zpool status shows all three drives as online
 - zpool iostat appears to have hung, taking much longer than the 30s
specified to return a result
 - robocopy is now retrying the file, but appears to have hung
10:54:30 - robocopy, CIFS and iostat all start working again, pretty
much simultaneously
 - zpool status now shows the drive as offline

I could probably do with using DTrace to get a better look at this,
but I haven't learnt that yet.  My guess as to what's happening would
be:

- iSCSI target goes offline
- ZFS will not be notified for 3 minutes, but I/O to that device is
essentially hung
- CIFS times out (I suspect this is on the client side with around a
30s timeout, but I can't find the timeout documented anywhere).
- zpool iostat is now waiting, I may be wrong but this doesn't appear
to have benefited from the changes to zpool status
- After 3 minutes, the iSCSI drive goes offline.  The pool carries on
with the remaining two drives, CIFS carries on working, iostat carries
on working.  "zpool status" however is still out of date.
- zpool status eventually catches up.

Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-12-02 Thread t. johnson
>>
>> One would expect so, yes. But the usefulness of this is limited to the cases 
>> where the entire working set will fit into an SSD cache.
>>
>
> Not entirely out of the question. SSDs can be purchased today
> with more than 500 GBytes in a 2.5" form factor. One or more of
> these would make a dandy L2ARC.
> http://www.stecinc.com/product/mach8mlc.php


Speaking of which.. what's the current limit on L2ARC size? Gathering tidbits 
here and there (7000 storage line config limits, FAST talk given by Bill Moore) 
there are indications that L2ARC can only be ~500GB?

Is this the case? If so, is that a raw size limitation or a number of devices 
used to form the L2ARC limitation or something else? I'm sure some of us can 
come with examples where we really would like to use much more than a 500GB 
L2ARC :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs_nocacheflush, nvram, and root pools

2008-12-02 Thread River Tarnell
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

hi,

i have a system connected to an external DAS (SCSI) array, using ZFS.  the
array has an nvram write cache, but it honours SCSI cache flush commands by
flushing the nvram to disk.  the array has no way to disable this behaviour.  a
well-known behaviour of ZFS is that it often issues cache flush commands to
storage in order to ensure data integrity; while this is important with normal
disks, it's useless for nvram write caches, and it effectively disables the
cache.

so far, i've worked around this by setting zfs_nocacheflush, as described at
[1], which works fine.  but now i want to upgrade this system to Solaris 10
Update 6, and use a ZFS root pool on its internal SCSI disks (previously, the
root was UFS).  the problem is that zfs_nocacheflush applies to all pools,
which will include the root pool.
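
for reference, the two usual ways of setting that tunable (both straight from
the guide at [1]) are an /etc/system entry or mdb on the running kernel:

* in /etc/system:
set zfs:zfs_nocacheflush = 1

# or live, until the next reboot:
echo zfs_nocacheflush/W0t1 | mdb -kw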

my understanding of ZFS is that when run on a root pool, which uses slices
(instead of whole disks), ZFS won't enable the write cache itself.  i also
didn't enable the write cache manually.  so, it _should_ be safe to use
zfs_nocacheflush, because there is no caching on the root pool.

am i right, or could i encounter problems here?

(the system is an NFS server, which means lots of synchronous writes (and
therefore ZFS cache flushes), so i *really* want the performance benefit from
using the nvram write cache.)

- river.

[1] 
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
-BEGIN PGP SIGNATURE-

iD8DBQFJNRJVIXd7fCuc5vIRAgDlAJ0boVf5zmvkRySeIHVumsKm3VSVhACffyOK
POEMyzG8U2yQYeZr01uJ74Q=
=9eBp
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] differences.. why?

2008-12-02 Thread jan damborsky
Hi Dick,

I am redirecting your question to zfs-discuss
mailing list, where people are more knowledgeable
about this problem and your question could be
better answered.

Best regards,
Jan


dick hoogendijk wrote:
> I have s10u6 installed on my server.
> zfs list (partly):
> NAME                USED  AVAIL  REFER  MOUNTPOINT
> rpool              88.8G   140G  27.5K  /rpool
> rpool/ROOT         20.0G   140G    18K  /rpool/ROOT
> rpool/ROOT/s10BE2  20.0G   140G  7.78G  /
>
> But just now, on a newly installed s10u6 system I got rpool/ROOT with a
> mountpoint "legacy"
>
> The drives were different. On the latter (legacy) system it was not
> formatted (yet) (in VirtualBox). On my server I switched from UFS to
> ZFS, so I first created a rpool and than did a luupgrade into it.
> This could explain the mountpoint /rpool/ROOT but WHY the difference?
> Why can't s10u6 install the same mountpoint on the new disk?
> The server runs very well; is this "legacy" thing really needed?
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss