[zfs-discuss] Re: zfs lost function

2007-05-09 Thread Simon

Brothers,

I've fixed the issue by reconfiguring the system device tree with:

# devfsadm -Cv

Some new devices were added, and then ZFS worked fine.
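
For anyone who hits the same thing, the recovery boils down to something like
this (clear the maintenance state after rebuilding the device tree, then check
that the dependent services and the zfs commands come back):

# devfsadm -Cv
# svcadm clear svc:/system/device/local:default
# svcs -xv
# zpool status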

Thanks for your kind attention.

Rgds,
Simon

On 5/10/07, Simon <[EMAIL PROTECTED]> wrote:

Gurus,

My freshly installed Solaris 10 U3 can't boot up normally on a T2000
server (System Firmware 6.4.4); the OS can only enter single-user
mode, as one critical service fails to start:

# uname -a
SunOS t2000 5.10 Generic_118833-33 sun4v sparc SUNW,Sun-Fire-T200
(it's not patched; I just finished the installation)

# svcs -vx
svc:/system/device/local:default (Standard Solaris device configuration.)
 State: maintenance since Thu May 10 12:04:36 2007
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://sun.com/msg/SMF-8000-KS
   See: /etc/svc/volatile/system-device-local:default.log
Impact: 12 dependent services are not running:
svc:/system/filesystem/minimal:default
svc:/system/manifest-import:default
svc:/milestone/single-user:default
svc:/system/filesystem/local:default
svc:/milestone/multi-user:default
svc:/milestone/multi-user-server:default
svc:/network/inetd-upgrade:default
svc:/application/font/fc-cache:default
svc:/system/console-login:default
svc:/network/rpc/bind:default
svc:/milestone/devices:default
svc:/network/initial:default

# svcprop svc:/system/device/local:default |grep 'start\/exec'
start/exec astring /lib/svc/method/devices-local

Then I ran the startup script in debug mode; the output is:
+ /sbin/zonename
+ [ global != global ]
+ . /lib/svc/share/smf_include.sh
SMF_EXIT_OK=0
SMF_EXIT_ERR_FATAL=95
SMF_EXIT_ERR_CONFIG=96
SMF_EXIT_MON_DEGRADE=97
SMF_EXIT_MON_OFFLINE=98
SMF_EXIT_ERR_NOSMF=99
SMF_EXIT_ERR_PERM=100
+ svcprop -q -p system/reconfigure system/svc/restarter:default
+ [ 1 -eq 0 ]
+ /usr/sbin/prtconf -F
fbdev=
+ [ 1 -eq 0 ]
+ [ -x /usr/sbin/zfs ]
+ /usr/sbin/zfs volinit
internal error: Bad file number
Abort - core dumped
+ exit 95

It's obvious that the script exits non-zero because of the
'zfs volinit' call:

# zfs volinit
internal error: Bad file number
Abort - core dumped

# zpool status
internal error: Bad file number
Abort - core dumped
#
# zfs list
internal error: Bad file number
Abort - core dumped
#

It seems that 'zfs' is not functioning well. How do I fix it?

TIA.

Rgds,
Simon


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilvering speed?

2007-05-09 Thread Toby Thain


On 9-May-07, at 3:44 PM, Bakul Shah wrote:


Robert Milkowski wrote:

Hello Mario,

Wednesday, May 9, 2007, 5:56:18 PM, you wrote:

MG> I've read that it's supposed to go at full speed, i.e. as fast as
MG> possible. I'm doing a disk replace and what zpool reports kind of
MG> surprises me. The resilver goes on at 1.6MB/s. Did resilvering get
MG> throttled at some point between the builds, or is my ATA controller
MG> having bigger issues?


Lot of small files perhaps? What kind of protection have you used?


Good question.  Remember that resilvering is done in time order and from
the top-level metadata down, not by sequentially blasting bits.  Jeff
Bonwick describes this as top-down resilvering.
http://blogs.sun.com/bonwick/entry/smokin_mirrors

From a MTTR and performance perspective this means that ZFS recovery time
is a function of the amount of space used, where it is located (!), and the
validity of the surviving or regenerated data.  The big win is the amount of
space used, as most file systems are not full.
  -- richard


It seems to me that once you copy meta data, you can indeed
copy all live data sequentially.


I don't see this, given the top down strategy. For instance, if I  
understand the transactional update process, you can't commit the  
metadata until the data is in place.


Can you explain in more detail your reasoning?


  Given that a vast majority
of disk blocks in use will typically contain data, this is a
winning strategy from a performance point of view and still
allows you to retrieve a fair bit of data in case of a second
disk failure (checksumming will catch a case where good
metadata points to as yet uncopied data block).  If amount of
live data is > 50% of disk space you may as well do a disk
copy, perhaps skipping over already copied meta data.

Not only that, you can even start using the disk being
resilvered right away for writes.  The new write will be
either to a) an already copied block


How can that be, under a COW régime?

--Toby


or b) as yet uncopied
block.  In case a) there is nothing more to do.  In case b)
the copied-from-block will have the new data so in both cases
the right thing happens.  Any potential window between
reading a copied-from block and writing to copied-to block
can be closed with careful coding/locking.

If a second disk fails during copying, the current strategy
doesn't buy you much in most any case.  You really don't want
to go through a zillion files looking for survivors.  If you
have a backup, you will restore from that rather than look
through the debris.  Not to mention you have made the window
of a potentially catastrophic failure much larger if
resilvering is significantly slower.

Comments?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ZFS with raidz

2007-05-09 Thread Tom Haynes
> 
> Doug has been doing some performance optimization to
> the sharemgr to allow faster boot up in loading
> 

Doug has blogged about his performance numbers here: 
http://blogs.sun.com/dougm/entry/recent_performance_improvement_in_zfs
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs lost function

2007-05-09 Thread Simon

Gurus,

My freshly installed Solaris 10 U3 can't boot up normally on a T2000
server (System Firmware 6.4.4); the OS can only enter single-user
mode, as one critical service fails to start:

# uname -a
SunOS t2000 5.10 Generic_118833-33 sun4v sparc SUNW,Sun-Fire-T200
(it's not patched; I just finished the installation)

# svcs -vx
svc:/system/device/local:default (Standard Solaris device configuration.)
State: maintenance since Thu May 10 12:04:36 2007
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
  See: http://sun.com/msg/SMF-8000-KS
  See: /etc/svc/volatile/system-device-local:default.log
Impact: 12 dependent services are not running:
   svc:/system/filesystem/minimal:default
   svc:/system/manifest-import:default
   svc:/milestone/single-user:default
   svc:/system/filesystem/local:default
   svc:/milestone/multi-user:default
   svc:/milestone/multi-user-server:default
   svc:/network/inetd-upgrade:default
   svc:/application/font/fc-cache:default
   svc:/system/console-login:default
   svc:/network/rpc/bind:default
   svc:/milestone/devices:default
   svc:/network/initial:default

# svcprop svc:/system/device/local:default |grep 'start\/exec'
start/exec astring /lib/svc/method/devices-local

Then I ran the startup script in debug mode; the output is:
+ /sbin/zonename
+ [ global != global ]
+ . /lib/svc/share/smf_include.sh
SMF_EXIT_OK=0
SMF_EXIT_ERR_FATAL=95
SMF_EXIT_ERR_CONFIG=96
SMF_EXIT_MON_DEGRADE=97
SMF_EXIT_MON_OFFLINE=98
SMF_EXIT_ERR_NOSMF=99
SMF_EXIT_ERR_PERM=100
+ svcprop -q -p system/reconfigure system/svc/restarter:default
+ [ 1 -eq 0 ]
+ /usr/sbin/prtconf -F
fbdev=
+ [ 1 -eq 0 ]
+ [ -x /usr/sbin/zfs ]
+ /usr/sbin/zfs volinit
internal error: Bad file number
Abort - core dumped
+ exit 95

It's obvious that the script exits non-zero because of the
'zfs volinit' call:

# zfs volinit
internal error: Bad file number
Abort - core dumped

# zpool status
internal error: Bad file number
Abort - core dumped
#
# zfs list
internal error: Bad file number
Abort - core dumped
#

It seems that 'zfs' is not functioning well. How do I fix it?

TIA.

Rgds,
Simon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Automatic rotating snapshots

2007-05-09 Thread Malachi de Ælfweald

I was thinking of setting up rotating snapshots... probably do
pool/[EMAIL PROTECTED]

Is Tim's method (
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8 ) the current
preferred plan?
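
For context, what I have in mind is roughly a weekday rotation driven from
cron, along these lines (dataset name and schedule are just placeholders):

#!/bin/sh
# keep one snapshot per weekday, overwriting last week's copy
FS=tank/home
DAY=`date +%a`
zfs destroy $FS@$DAY > /dev/null 2>&1
zfs snapshot $FS@$DAY

with a crontab entry such as:

0 3 * * * /usr/local/bin/rotate-snapshots.sh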


Thanks,
Malachi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Extremely long ZFS destroy operations

2007-05-09 Thread Anantha N. Srirama
I've since stopped making the second clone, having realized that the 
.zfs/snapshot/ still exists after the clone operation is completed. 
So my need for the local clone is met by direct access to the snapshot.

However, the poor performance of the destroy is still a valid concern. It is 
quite possible that we might create another clone for reasons beyond my 
original one.

Why is the destroy so slow with the second clone in play? Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Support for remote mirroring

2007-05-09 Thread Bev Crair

Folks,
We're following up with EMC on this.  We'll post something on the alias 
when we get it.


Please note that EMC would probably never say anything about 
OpenSolaris, but they'll talk about Solaris ZFS.

Bev.

Torrey McMahon wrote:

Anantha N. Srirama wrote:
For whatever reason, EMC notes (on PowerLink) suggest that ZFS is not 
supported on their arrays. If one is going to use a ZFS filesystem on 
top of an EMC array, be warned about support issues.


They should have fixed that in their matrices. It should say something 
like, "EMC supports service LUNs to ZFS."

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Support for remote mirroring

2007-05-09 Thread Torrey McMahon

Anantha N. Srirama wrote:

For whatever reason, EMC notes (on PowerLink) suggest that ZFS is not supported 
on their arrays. If one is going to use a ZFS filesystem on top of an EMC array, 
be warned about support issues.


They should have fixed that in their matrices. It should say something 
like, "EMC supports service LUNs to ZFS."

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Need guidance on RAID 5, ZFS, and RAIDZ on home file server

2007-05-09 Thread Robert Milkowski
Hello Michael,

Tuesday, May 8, 2007, 9:20:56 PM, you wrote:

>> Probably RAID-Z as you don't have enough disks to be interesting for doing 
>> 1+0.
>> Paul

MC> How do you configure ZFS RAID 1+0?
MC> Will the next lines do that right?
MC> zpool create -f zfs_raid1 mirror c0t1d0 c1t1d0
MC> zpool add zfs_raid1 mirror c2t1d0 c3t1d0
MC> zpool add zfs_raid1 mirror c4t1d0 c5t1d0
MC> Any help/info is very welcome!


Yep, the above is correct.
However, you can do it in one shot:

zpool create -f zfs_raid1 mirror c0t1d0 c1t1d0 mirror c2t1d0 c3t1d0 mirror 
c4t1d0 c5t1d0
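
Either way, you can double-check the layout afterwards; zpool status
should show three two-way mirror vdevs, roughly like this:

# zpool status zfs_raid1
...
        NAME        STATE     READ WRITE CKSUM
        zfs_raid1   ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0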


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilvering speed?

2007-05-09 Thread Bakul Shah
> Robert Milkowski wrote:
> > Hello Mario,
> > 
> > Wednesday, May 9, 2007, 5:56:18 PM, you wrote:
> > 
> > MG> I've read that it's supposed to go at full speed, i.e. as fast as
> > MG> possible. I'm doing a disk replace and what zpool reports kind of
> > MG> surprises me. The resilver goes on at 1.6MB/s. Did resilvering get
> > MG> throttled at some point between the builds, or is my ATA controller hav
> ing bigger issues?
> > 
> > Lot of small files perhaps? What kind of protection have you used?
> 
> Good question.  Remember that resilvering is done in time order and from
> the top-level metadata down, not by sequentially blasting bits.  Jeff
> Bonwick describes this as top-down resilvering.
>   http://blogs.sun.com/bonwick/entry/smokin_mirrors
> 
>  From a MTTR and performance perspective this means that ZFS recovery time
> is a function of the amount of space used, where it is located (!), and the
> validity of the surviving or regenerated data.  The big win is the amount of
> space used, as most file systems are not full.
>   -- richard

It seems to me that once you copy meta data, you can indeed
copy all live data sequentially.  Given that a vast majority
of disk blocks in use will typically contain data, this is a
winning strategy from a performance point of view and still
allows you to retrieve a fair bit of data in case of a second
disk failure (checksumming will catch a case where good
metadata points to as yet uncopied data block).  If amount of
live data is > 50% of disk space you may as well do a disk
copy, perhaps skipping over already copied meta data.

Not only that, you can even start using the disk being
resilvered right away for writes.  The new write will be
either to a) an already copied block or b) as yet uncopied
block.  In case a) there is nothing more to do.  In case b)
the copied-from-block will have the new data so in both cases
the right thing happens.  Any potential window between
reading a copied-from block and writing to copied-to block
can be closed with careful coding/locking.

If a second disk fails during copying, the current strategy
doesn't buy you much in most any case.  You really don't want
to go through a zillion files looking for survivors.  If you
have a backup, you will restore from that rather than look
through the debris.  Not to mention you have made the window
of a potentially catastrophic failure much larger if
resilvering is significantly slower.

Comments?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage Pools Recommendations for Productive Environments

2007-05-09 Thread Selim Daoud

Which one is more performant: copies=2 or a ZFS mirror?

s.

On 5/9/07, Richard Elling <[EMAIL PROTECTED]> wrote:

comment below...

Toby Thain wrote:
>
> On 9-May-07, at 4:45 AM, Andreas Koppenhoefer wrote:
>
>> Hello,
>>
>> The Solaris Internals wiki contains many interesting things about ZFS,
>> but I have no clue about the reasons for this entry:
>>
>> In Section "ZFS Storage Pools Recommendations - Storage Pools" you can
>> read:
>> "For all production environments, set up a redundant ZFS storage
>> pool, such as a raidz, raidz2, or a mirrored configuration, regardless
>> of the RAID level implemented on the underlying storage device."
>> (see
>> 
)
>>
>>
>> In our environment we use EMC based storage subsystems which are
>> protected by RAID1 (mirrored disks).
>> What's the reason for building an upper level zfs mirror on these
>> already mirrored disks?
>
> It's necessary if you wish ZFS to self-heal your data from errors
> detected in the underlying subsystems. Without redundancy at the pool
> level it can detect them (checksums) but not repair them.

Yes, however with copies, it is not necessarily required that we build
a redundant zpool.  I'll update the wiki.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Selim Daoud

Go ahead with filebench, and don't forget to add

set zfs:zfs_nocacheflush=1

to /etc/system (if using Nevada).
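
Note that this disables the cache flush commands ZFS normally sends to the
array, so it is only safe when the array cache is battery backed. After a
reboot you can verify the tunable took effect with something like:

# echo zfs_nocacheflush/D | mdb -k
zfs_nocacheflush:
zfs_nocacheflush:               1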

s.

On 5/9/07, cesare VoltZ <[EMAIL PROTECTED]> wrote:

Hi,

I'm planning to test a ZFS solution for our application in a
pre-production data center, and I'm looking for a good filesystem
benchmark to see which configuration is the best.

Servers are Solaris 10, connected to an EMC Clariion CX3-20 with two FC
cables in a fully high-available setup (two HBAs connected to different
switches, and the switches are cross-connected to both storage
processors). The HBAs on the host and the switch ports are 2 Gbps,
while the CX3-20 (disks and SPs) supports 4 Gbps.

LUNs are configured as RAID5 across 15 disks.

I used in the past iozone (http://www.iozone.org/) but I'm wondering
if there are other tools.

Thanks.

Cesare
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Resilvering speed?

2007-05-09 Thread Robert Milkowski
Hello Richard,

Wednesday, May 9, 2007, 9:10:22 PM, you wrote:

RE> Robert Milkowski wrote:
>> Hello Mario,
>> 
>> Wednesday, May 9, 2007, 5:56:18 PM, you wrote:
>> 
>> MG> I've read that it's supposed to go at full speed, i.e. as fast as
>> MG> possible. I'm doing a disk replace and what zpool reports kind of
>> MG> surprises me. The resilver goes on at 1.6MB/s. Did resilvering get
>> MG> throttled at some point between the builds, or is my ATA controller 
>> having bigger issues?
>> 
>> Lot of small files perhaps? What kind of protection have you used?

RE> Good question.  Remember that resilvering is done in time order and from
RE> the top-level metadata down, not by sequentially blasting bits.  Jeff
RE> Bonwick describes this as top-down resilvering.
RE> http://blogs.sun.com/bonwick/entry/smokin_mirrors

RE>  From a MTTR and performance perspective this means that ZFS recovery time
RE> is a function of the amount of space used, where it is located (!), and the
RE> validity of the surviving or regenerated data.  The big win is the amount of
RE> space used, as most file systems are not full.

Nevertheless, with a lot of small files written over many months (some
were removed), resilvering in raid-z2 is SLOOOW, even if there's no
other activity in the pool (7-10 days on an x4500 with 11 disks in a
raidz2 group). Either it's inherent in such environments or something
else is wrong.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS Support for remote mirroring

2007-05-09 Thread Robert Milkowski
Hello Anantha,

Wednesday, May 9, 2007, 4:45:10 PM, you wrote:

ANS> For whatever reason EMC notes (on PowerLink) suggest that ZFS is
ANS> not supported on their arrays. If one is going to use a ZFS
ANS> filesystem on top of an EMC array, be warned about support issues.

Nope. For a couple of months now they have actually supported ZFS.

See http://milek.blogspot.com/2007/01/is-emc-affraid-of-zfs_06.html


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Ming Zhang
On Wed, 2007-05-09 at 21:09 +0200, Louwtjie Burger wrote:
> > > LUNs are configured as RAID5 across 15 disks.
> 
> Won't such a large number of spindles have a negative impact on
> performance (in a single RAID-5 setup) ... a single I/O from the system
> generates lots of backend I/Os?

Yes; a single I/O will rarely generate a full-stripe write across that
many disks, so the array ends up depending on the NVRAM in your EMC box
to do the work.
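
To put rough numbers on it: assuming, say, a 64 KB stripe element (the
real value depends on how the LUN was bound), a full-stripe write across
14 data disks + 1 parity needs 14 x 64 KB = 896 KB of contiguous data,
while a default ZFS record is only 128 KB, so most writes turn into
read-modify-write against the parity.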

This is also why EMC itself recommends a small number of disks per RAID5
group.

Of course, it again depends on your application workload pattern.

Ming

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilvering speed?

2007-05-09 Thread Richard Elling

Robert Milkowski wrote:

Hello Mario,

Wednesday, May 9, 2007, 5:56:18 PM, you wrote:

MG> I've read that it's supposed to go at full speed, i.e. as fast as
MG> possible. I'm doing a disk replace and what zpool reports kind of
MG> surprises me. The resilver goes on at 1.6MB/s. Did resilvering get
MG> throttled at some point between the builds, or is my ATA controller having 
bigger issues?

Lot of small files perhaps? What kind of protection have you used?


Good question.  Remember that resilvering is done in time order and from
the top-level metadata down, not by sequentially blasting bits.  Jeff
Bonwick describes this as top-down resilvering.
http://blogs.sun.com/bonwick/entry/smokin_mirrors

From a MTTR and performance perspective this means that ZFS recovery time
is a function of the amount of space used, where it is located (!), and the
validity of the surviving or regenerated data.  The big win is the amount of
space used, as most file systems are not full.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Louwtjie Burger

> LUNs are configured as RAID5 across 15 disks.


Won't such a large number of spindles have a negative impact on
performance (in a single RAID-5 setup) ... a single I/O from the system
generates lots of backend I/Os?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Ming Zhang
On Wed, 2007-05-09 at 16:27 +0200, cesare VoltZ wrote:
> Hi,
> 
> I'm planning to test a ZFS solution for our application in a
> pre-production data center, and I'm looking for a good filesystem
> benchmark to see which configuration is the best.
> 
> Servers are Solaris 10, connected to an EMC Clariion CX3-20 with two FC
> cables in a fully high-available setup (two HBAs connected to different
> switches, and the switches are cross-connected to both storage
> processors). The HBAs on the host and the switch ports are 2 Gbps,
> while the CX3-20 (disks and SPs) supports 4 Gbps.
> 
> LUNs are configured as RAID5 across 15 disks.

any point to use raid5 if you have raidz in zfs?


> 
> I used in the past iozone (http://www.iozone.org/) but I'm wondering
> if there are other tools.
> 
> Thanks.
> 
> Cesare
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilvering speed?

2007-05-09 Thread Robert Milkowski
Hello Mario,

Wednesday, May 9, 2007, 5:56:18 PM, you wrote:

MG> I've read that it's supposed to go at full speed, i.e. as fast as
MG> possible. I'm doing a disk replace and what zpool reports kind of
MG> surprises me. The resilver goes on at 1.6MB/s. Did resilvering get
MG> throttled at some point between the builds, or is my ATA controller having 
bigger issues?

Lot of small files perhaps? What kind of protection have you used?

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-09 Thread Bart Smaalders

Adam Leventhal wrote:

On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:

Can you give some more info on what these problems are.


I was thinking of this bug:

  6460622 zio_nowait() doesn't live up to its name

Which I was surprised to find was fixed by Eric in build 59.

Adam



It was pointed out by Jürgen Keil that using ZFS compression
submits a lot of prio 60 tasks to the system task queues;
this would clobber interactive performance.

- Bart


--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
--- Begin Message ---
> With recent bits, ZFS compression is now handled concurrently, with
> many CPUs working on different records.
> So this load will burn more CPUs and achieve its results
> (compression) faster.
> 
> So the observed pauses should be consistent with that of a load
> generating high system time.
> The assumption is that compression now goes faster than when it was
> single-threaded.
> 
> Is this undesirable? We might seek a way to slow down compression in
> order to limit the system load.

According to this dtrace script

#!/usr/sbin/dtrace -s

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
@where[stack()] = count();
}

tick-5s {
printa(@where);
trunc(@where);
}




... I see bursts of ~ 1000 zio_write_compress() [gzip] taskq calls
enqueued into the "spa_zio_issue" taskq by zfs`spa_sync() and
its children:

  0  76337 :tick-5s 
...
  zfs`zio_next_stage+0xa1
  zfs`zio_wait_for_children+0x5d
  zfs`zio_wait_children_ready+0x20
  zfs`zio_next_stage_async+0xbb
  zfs`zio_nowait+0x11
  zfs`dbuf_sync_leaf+0x1b3
  zfs`dbuf_sync_list+0x51
  zfs`dbuf_sync_indirect+0xcd
  zfs`dbuf_sync_list+0x5e
  zfs`dbuf_sync_indirect+0xcd
  zfs`dbuf_sync_list+0x5e
  zfs`dnode_sync+0x214
  zfs`dmu_objset_sync_dnodes+0x55
  zfs`dmu_objset_sync+0x13d
  zfs`dsl_dataset_sync+0x42
  zfs`dsl_pool_sync+0xb5
  zfs`spa_sync+0x1c5
  zfs`txg_sync_thread+0x19a
  unix`thread_start+0x8
 1092

  0  76337 :tick-5s 



It seems that after such a batch of compress requests is
submitted to the "spa_zio_issue" taskq, the kernel is busy
for several seconds working on these taskq entries.
It seems that this blocks all other "taskq" activity inside the
kernel...



This dtrace script counts the number of 
zio_write_compress() calls enqueued / execed 
by the kernel per second:

#!/usr/sbin/dtrace -qs

sdt:genunix::taskq-enqueue
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
this->tqe = (taskq_ent_t *)arg1;
@enq[this->tqe->tqent_func] = count();
}

sdt:genunix::taskq-exec-end
/((taskq_ent_t *)arg1)->tqent_func == (task_func_t *)&`zio_write_compress/
{
this->tqe = (taskq_ent_t *)arg1;
@exec[this->tqe->tqent_func] = count();
}

tick-1s {
/*
printf("%Y\n", walltimestamp);
*/
printf("TS(sec): %u\n", timestamp / 10);
printa("enqueue %a: [EMAIL PROTECTED]", @enq);
printa("exec%a: [EMAIL PROTECTED]", @exec);
trunc(@enq);
trunc(@exec);
}




I see bursts of zio_write_compress() calls enqueued / execed,
and periods of time where no zio_write_compress() taskq calls
are enqueued or execed.

10#  ~jk/src/dtrace/zpool_gzip7.d 
TS(sec): 7829
TS(sec): 7830
TS(sec): 7831
TS(sec): 7832
TS(sec): 7833
TS(sec): 7834
TS(sec): 7835
enqueue zfs`zio_write_compress: 1330
execzfs`zio_write_compress: 1330
TS(sec): 7836
TS(sec): 7837
TS(sec): 7838
TS(sec): 7839
TS(sec): 7840
TS(sec): 7841
TS(sec): 7842
TS(sec): 7843
TS(sec): 7844
enqueue zfs`zio_write_compress: 1116
execzfs`zio_write_compress: 1116
TS(sec): 7845
TS(sec): 7846
TS(sec): 7847
TS(sec): 7848
TS(sec): 7849
TS(sec): 7850
TS(sec): 7851
TS(sec): 7852
TS(sec): 7853
TS(sec): 7854
TS(sec): 7855
TS(sec): 7856
TS(sec): 7857
enqueue zfs`zio_write_compress: 932
execzfs`zio_write_compress: 932
TS(sec): 7858
TS(sec): 7859
TS(sec): 7860
TS(sec): 7861
TS(sec): 7862
TS(sec): 7863
TS(sec): 7864
TS(sec): 7865
TS(sec): 7866
TS(sec): 7867
enqueue zfs`zio_write_compress: 5
execzfs`zio_write_compress: 5
TS(sec): 7868
enqueue zfs`zio_write_compress: 774
execzfs`zio_write_compress: 774
TS(sec): 7869
TS(sec): 7870
TS(sec): 7871
TS(sec): 7872
TS(sec): 7873
TS(sec): 7874
TS(sec): 7875
TS(sec): 7876
enqueue zfs`zio_write_compress: 653
execzfs`zio_write_compress: 653
TS(sec): 7877
TS(sec): 7878
TS(sec): 7879
TS(sec): 7880
TS(sec): 7881


And a final dtrace script, which monitors scheduler activity while
filling a gzip compressed pool:

#!/usr/sbin/dtrace -qs

sched:::off-cpu,
sched:::on-cpu,
sched:::remain-cpu,
sched:

Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Richard Elling

cesare VoltZ wrote:

Hy,

I'm planning to test on pre-production data center a ZFS solution for
our application and I'm searching a good filesystem benchmark for see
which configuration is the best solution.


Pedantically, your application is always best.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-09 Thread Adam Leventhal
On Wed, May 09, 2007 at 11:52:06AM +0100, Darren J Moffat wrote:
> Can you give some more info on what these problems are.

I was thinking of this bug:

  6460622 zio_nowait() doesn't live up to its name

Which I was surprised to find was fixed by Eric in build 59.

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage Pools Recommendations for Productive Environments

2007-05-09 Thread Richard Elling

comment below...

Toby Thain wrote:


On 9-May-07, at 4:45 AM, Andreas Koppenhoefer wrote:


Hello,

The Solaris Internals wiki contains many interesting things about ZFS,
but I have no clue about the reasons for this entry:

In Section "ZFS Storage Pools Recommendations - Storage Pools" you can 
read:
"For all production environments, set up a redundant ZFS storage 
pool, such as a raidz, raidz2, or a mirrored configuration, regardless 
of the RAID level implemented on the underlying storage device."
(see 
) 



In our environment we use EMC based storage subsystems which are 
protected by RAID1 (mirrored disks).
What's the reason for building an upper level zfs mirror on these 
already mirrored disks?


It's necessary if you wish ZFS to self-heal your data from errors 
detected in the underlying subsystems. Without redundancy at the pool 
level it can detect them (checksums) but not repair them.


Yes, however with copies, it is not necessarily required that we build
a redundant zpool.  I'll update the wiki.
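
For reference, copies is just a per-dataset property, e.g. (dataset name
is a placeholder):

# zfs set copies=2 tank/data

Keep in mind that it only applies to data written after the property is
set, and that it protects against bad blocks rather than the loss of an
entire LUN.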
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Resilvering speed?

2007-05-09 Thread Mario Goebbels
I've read that it's supposed to go at full speed, i.e. as fast as possible. I'm 
doing a disk replace and what zpool reports kind of surprises me. The resilver 
goes on at 1.6MB/s. Did resilvering get throttled at some point between the 
builds, or is my ATA controller having bigger issues?

Thanks,
-mg
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Support for remote mirroring

2007-05-09 Thread Anantha N. Srirama
For whatever reason, EMC notes (on PowerLink) suggest that ZFS is not supported 
on their arrays. If one is going to use a ZFS filesystem on top of an EMC array, 
be warned about support issues.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem Benchmark

2007-05-09 Thread Rayson Ho

Tried filebench before??

http://www.solarisinternals.com/wiki/index.php/FileBench
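
A quick interactive run looks something like this (the workload name and
target directory are just examples, and the binary may live under
/opt/filebench depending on how it was installed):

# filebench
filebench> load varmail
filebench> set $dir=/tank/bench
filebench> run 60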

Rayson


On 5/9/07, cesare VoltZ <[EMAIL PROTECTED]> wrote:

I used in the past iozone (http://www.iozone.org/) but I'm wondering
if there are other tools.

Thanks.

Cesare
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Extremely long ZFS destroy operations

2007-05-09 Thread Anantha N. Srirama
We have Solaris 10 Update 3 (aka 11/06) running on an E2900 (24 x 96). On this 
server we've been running a large SAS environment totalling well over 2TB. We 
also take daily snapshots of the filesystems and clone them for use by a local 
zone. This setup has been in use for well over 6 months.

Starting Monday I began making a second clone from the same snapshot, to 
facilitate quick access to a day-old image of the data in the global zone. I've 
noticed that my ZFS destroy operations are inordinately long with the second 
clone in place (I'm using zfs destroy -Rf ). The degradation is close to an 
order of magnitude; my destroys now take 6-7 minutes while they took under a 
minute in the past.
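
In outline, the daily cycle looks like this (the names are made up; the real
datasets are our SAS filesystems):

# zfs snapshot sasdata/prod@20070509
# zfs clone sasdata/prod@20070509 sasdata/zoneclone      <- used by the local zone
# zfs clone sasdata/prod@20070509 sasdata/globalclone    <- the new second clone
...
# ptime zfs destroy -Rf sasdata/prod@20070508            <- 6-7 minutes now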

Any thoughts? Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Filesystem Benchmark

2007-05-09 Thread cesare VoltZ

Hy,

I'm planning to test on pre-production data center a ZFS solution for
our application and I'm searching a good filesystem benchmark for see
which configuration is the best solution.

Server are Solaris 10 connected to a EMC Clariion CX3-20 with two FC
cable in a total high-availability (two HBA connected to different
switch, swtich are cross-connected to both storage processor). HBA
installed on the host and swicth port are 2Gbps, while CX3-20 is
equipped (disks and SP) for support 4Gbps.

LUN are configured as RAID5 accross 15 disks.

I used in the past iozone (http://www.iozone.org/) but I'm wondering
if there are other tools.

Thanks.

Cesare
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage Pools Recommendations for Productive Environments

2007-05-09 Thread Toby Thain


On 9-May-07, at 4:45 AM, Andreas Koppenhoefer wrote:


Hello,

The Solaris Internals wiki contains many interesting things about ZFS,
but I have no clue about the reasons for this entry:

In Section "ZFS Storage Pools Recommendations - Storage Pools" you  
can read:
"For all production environments, set up a redundant ZFS storage
pool, such as a raidz, raidz2, or a mirrored configuration,
regardless of the RAID level implemented on the underlying storage
device."
(see )


In our environment we use EMC based storage subsystems which are  
protected by RAID1 (mirrored disks).
What's the reason for building an upper level zfs mirror on these  
already mirrored disks?


It's necessary if you wish ZFS to self-heal your data from errors  
detected in the underlying subsystems. Without redundancy at the pool  
level it can detect them (checksums) but not repair them.


--Toby



Or am I misinterpreting the Solaris Internals wiki?


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status faulted, but raid1z status is online?

2007-05-09 Thread Alex
The drive in my Solaris box that had the OS on it decided to kick the bucket 
this evening, a joyous occasion for all, but luckily all my data is stored on a 
zpool and the OS is nothing but a shell to serve it up. One quick install later 
and I'm back trying to import my pool, and things are not going well.

Once I have things where I want them, I issue an import
# zpool import
  pool: ftp
id: 1752478903061397634
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

ftp FAULTED   corrupted data
  raidz1DEGRADED
c1d0ONLINE
c1d1ONLINE
c4d0UNAVAIL   cannot open

Looks like c4d0 died as well; they were purchased at the same time, but oh 
well. ZFS should still be able to recover because I have 2 working drives, and 
the raidz1 says it's degraded but not destroyed. So why does the pool itself 
read as faulted?

I issue an import with force, thinking the system is just being silly.

# zpool import -f ftp
cannot import 'ftp': I/O error

Odd. After looking through the threads here, I see that when importing, the 
label of a drive is rather important, so I go look at what zdb thinks the 
labels for my drives are.

first, the pool itself
# zdb -l ftp

LABEL 0

failed to read label 0

LABEL 1

failed to read label 1

LABEL 2

failed to read label 2

LABEL 3

failed to read label 3

That's not good. How about the drives?

# zdb -l /dev/dsk/c1d0 

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

version=3
name='ftp'
state=2
txg=21807
pool_guid=7724307712458785867
top_guid=14476414087876222880
guid=3298133982375235519
vdev_tree
type='raidz'
id=0
guid=14476414087876222880
nparity=1
metaslab_array=13
metaslab_shift=32
ashift=9
asize=482945794048
children[0]
type='disk'
id=0
guid=4586792833877823382
path='/dev/dsk/c0d0s3'
devid='id1,[EMAIL PROTECTED]/d'
whole_disk=0
children[1]
type='disk'
id=1
guid=3298133982375235519
path='/dev/dsk/c4d0p0'
devid='id1,[EMAIL PROTECTED]/q'
whole_disk=0

LABEL 3

version=3
name='ftp'
state=2
txg=21807
pool_guid=7724307712458785867
top_guid=14476414087876222880
guid=3298133982375235519
vdev_tree
type='raidz'
id=0
guid=14476414087876222880
nparity=1
metaslab_array=13
metaslab_shift=32
ashift=9
asize=482945794048
children[0]
type='disk'
id=0
guid=4586792833877823382
path='/dev/dsk/c0d0s3'
devid='id1,[EMAIL PROTECTED]/d'
whole_disk=0
children[1]
type='disk'
id=1
guid=3298133982375235519
path='/dev/dsk/c4d0p0'
devid='id1,[EMAIL PROTECTED]/q'
whole_disk=0
#   
 

So the label on the disk itself is there... mostly.
Now for disk 2:
# zdb -l /dev/dsk/c1d1  
 

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

version=3
name='ftp'
state=2
txg=21807
pool_guid=7724307712458785867
top_guid=11006938707951749786
guid=11006938707951749786
vdev_tree
type='disk'
id=1
guid=11006938707951749786
path='/dev/dsk/c1d0p0'
devid='id1,[EMAIL PROTECTED]/q'
whole_disk=0
metaslab_array=112
metaslab_shift=31
ashift=9
asize=250053918720
--

Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-09 Thread Darren J Moffat

Adam Leventhal wrote:


The in-kernel version of zlib is the latest version (1.2.3). It's not
surprising that we're spending all of our time in zlib if the machine is
being driven by I/O. There are outstanding problems with compression in
the ZIO pipeline that may contribute to the bursty behavior.


Can you give some more info on what these problems are.

I'm specifically interested in whether or not crypto will end up with 
similar issues.  Also, because of the special nature of compression in 
the ZIO pipeline, I had to make a small modification[1] to the compress 
part of the pipeline for crypto to run during the write stages.


[1] see: 
http://src.opensolaris.org/source/xref/zfs-crypto/zfs-crypto-gate/usr/src/uts/common/fs/zfs/zio.c#1053


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: gzip compression throttles system?

2007-05-09 Thread Adam Leventhal
On Thu, May 03, 2007 at 11:43:49AM -0500, [EMAIL PROTECTED] wrote:
> I think this may be a premature leap -- It is still undetermined if we are
> running up against a yet unknown bug in the kernel implementation of gzip
> used for this compression type. From my understanding the gzip code has
> been reused from an older kernel implementation,  it may be possible that
> this code has some issues with kernel stuttering when used for zfs
> compression that may have not been exposed with its original usage.  If it
> turns out that it is just a case of high cpu trade-off for buying faster
> compression times, then the talk of a tunable may make sense (if it is even
> possible given the constraints of the gzip code in kernelspace).

The in-kernel version of zlib is the latest version (1.2.3). It's not
surprising that we're spending all of our time in zlib if the machine is
being driven by I/O. There are outstanding problems with compression in
the ZIO pipeline that may contribute to the bursty behavior.

Adam

-- 
Adam Leventhal, Solaris Kernel Development   http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Storage Pools Recommendations for Productive Environments

2007-05-09 Thread Andreas Koppenhoefer
Hello,

The Solaris Internals wiki contains many interesting things about ZFS,
but I have no clue about the reasons for this entry:

In Section "ZFS Storage Pools Recommendations - Storage Pools" you can read:
"For all production environments, set up a redundant ZFS storage pool, such 
as a raidz, raidz2, or a mirrored configuration, regardless of the RAID level 
implemented on the underlying storage device."
(see 
)

In our environment we use EMC-based storage subsystems which are protected by 
RAID1 (mirrored disks).
What's the reason for building an upper-level ZFS mirror on top of these 
already mirrored disks?
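
For concreteness, I read the recommendation as saying the pool itself should
carry ZFS-level redundancy across the array LUNs, e.g. something like this
(device names are made up):

# zpool create tank mirror c2t0d0 c3t0d0

even though each of those LUNs is already RAID1-protected inside the array.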

Or am I misinterpreting the Solaris Internals wiki?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss