[zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?

2006-08-21 Thread Constantin Gonzalez
Hi,

my ZFS pool for my home server is a bit unusual:

pool: pelotillehue
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

        NAME            STATE     READ WRITE CKSUM
        pelotillehue    ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c0d1s5      ONLINE       0     0     0
            c1d0s5      ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            c0d0s3      ONLINE       0     0     0
            c0d1s3      ONLINE       0     0     0
            c1d0s3      ONLINE       0     0     0
            c1d1s3      ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            c0d1s4      ONLINE       0     0     0
            c1d0s4      ONLINE       0     0     0
            c1d1s4      ONLINE       0     0     0

The reason is simple: I have 4 differently-sized disks (80, 80, 200 and 250 GB;
it's a home server, so I crammed whatever I could find elsewhere into that box
:) ), and my goal was to create the biggest pool possible while retaining some
level of redundancy.

The above config therefore groups the biggest slices that can be created on all
four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be
created on 3 disks into the 3-disk RAID-Z, then two large slices remain which
are mirrored. It's like playing Tetris with disk slices... But the pool can
tolerate 1 broken disk and it gave me maximum storage capacity, so be it.
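For reference, building a pool like this boils down to a single zpool create
with three top-level vdevs - a minimal sketch, assuming the slices have
already been laid out with format (the device names are the ones from the
status output above):

   # zpool create pelotillehue \
       raidz  c0d0s3 c0d1s3 c1d0s3 c1d1s3 \
       raidz  c0d1s4 c1d0s4 c1d1s4 \
       mirror c0d1s5 c1d0s5

Each group after the pool name becomes one top-level vdev, matching the three
groups shown in the status output.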

This means that we have one pool with 3 vdevs that access up to 3 different
slices on the same physical disk.

Question: Does ZFS consider the underlying physical disks when load-balancing,
or does it only load-balance across vdevs, thereby potentially overloading a
physical disk with up to 3 parallel request streams at once?

I'm pretty sure ZFS is very intelligent and will do the right thing, but a
confirmation would be nice here.

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] destroyed pools signatures

2006-08-21 Thread Robert Milkowski
Hello zfs-discuss,

  I've got many disks in a JBOD (100) and while doing tests there
  are lots of destroyed pools. Then some disks are re-used to be part
  of new pools. Now if I do zpool import -D I can see lots of destroyed
  pools in a state where I can't import them anyway (like only two disks
  left from a previously much larger raid-z group, etc.). It's getting
  messy.

  It would be nice to have a command to 'clear' such disks - remove
  the ZFS signatures so nothing shows up for those disks.

  What do you think?

-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] pool ID

2006-08-21 Thread Robert Milkowski
Hello zfs-discuss,

  Looks like I can't get the pool ID once a pool is imported.
  IMHO zpool show should display it as well.

-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] destroyed pools signatures

2006-08-21 Thread Roch

Hi Robert, maybe this RFE would help alleviate your problem:

6417135 need generic way to dissociate disk or slice from it's filesystem

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6417135

-r

Robert Milkowski writes:
  Hello zfs-discuss,
  
I've got many ydisks in a JBOD (100) and while doing tests there
are lot of destroyed pools. Then some disks are re-used to be part
of new pools. Now if I do zpool import -D I can see lot of destroyed
pool in a state that I can't import them anyway (like only two disks
left from a previously much larger raid-z group, etc.). It's getting
messy.
  
It would be nice to have an command to 'clear' such disks - remove
ZFS signatures so nothing will show up for those disks.
  
What do you think?
  
  -- 
  Best regards,
   Robert  mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposal: user-defined properties

2006-08-21 Thread Roch


Eric Schrock writes:
  Following up on a string of related proposals, here is another draft
  proposal for user-defined properties.  As usual, all feedback and
  comments are welcome.
  
  The prototype is finished, and I would expect the code to be integrated
  sometime within the next month.
  
  - Eric
  
  INTRODUCTION
  
  ZFS currently supports a well-defined set of properties for managing ZFS
  datasets.  These properties represent either read-only statistics
  exported by the ZFS framework ('available', 'compressratio', etc), or
  editable properties which affect the behavior of ZFS ('compression',
  'readonly', etc).
  
  While these properties provide a structured way to interact with ZFS, a
  common request is to allow unstructured properties to be attached to ZFS
  datasets.   This is covered by the following RFE:
  
  6281585 user defined properties
  
  This would allow administrators to add annotations to datasets, as well
  as allowing ISVs to store application-specific settings that interact
  with individual datasets.
  
  DETAILS
  
  This proposal adds a new classification of ZFS properties known as 'user
  properties'.  The existing native properties will remain, as they
  provide additional semantics (mainly validation) which are closely tied
  to the underlying implementation.
  
  Any property which contains a colon (':') is defined as a 'user
  property'.  The name can contain alphanumeric characters, plus the
  following special characters: ':', '-', '.', '_'.  User properties are
  always strings, and are always inherited.  No additional validation is
  done on the contents.  Properties are set and retrieved through the
  standard mechanisms: 'zfs set', 'zfs get', and 'zfs inherit'.
  Inheriting a property which is not set in any parent is equivalent to
  clearing the property, as there is no default value for user-defined
  properties.
  
  It is expected that the colon will serve two purposes: to distinguish
  user properties from native properties, and to provide an (unenforced)
  namespace for user properties.  For example, it is hoped that properties
  will be defined as 'module:property', to group related properties together
  and to provide a larger namespace for logical separation of properties.
  No enforcement of this namespace is done by ZFS, however, and the empty
  string is valid on both sides of the colon.
  
  EXAMPLES
  
   # zfs set local:department=12345 test
   # zfs get -r local:department test 
   NAME  PROPERTY  VALUE  SOURCE
   test  local:department  12345  local
   test/foo  local:department  12345  inherited from test
   # zfs list -o name,local:department
   NAME  LOCAL:DEPARTMENT
   test  12345
   test/foo  12345
   # zfs set local:department=67890 test/foo
   # zfs inherit local:department test
   # zfs get -s local -r all test 
   NAME  PROPERTY  VALUE  SOURCE
   test/foo  local:department  12345  local
   # zfs list -o name,local:department
   NAME  LOCAL:DEPARTMENT
   test  -
   test/foo  12345
   
  MANPAGE CHANGES
  
  TBD
  

Great.

We might need something to 'destroy' those properties, locally and recursively?
Is the empty string a valid VALUE, and does this need to be spelled out?
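For what it's worth, given the statement above that inheriting a property not
set in any parent is equivalent to clearing it, a minimal sketch of
'destroying' a user property with the proposed mechanisms (dataset and
property names taken from the examples above) might be:

   # zfs inherit local:department test/foo
   # zfs inherit -r local:department test

The first clears the property on one dataset; the second clears it on test
and everything below it, assuming 'zfs inherit -r' accepts user properties
the same way it does native ones.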

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS write performance problem with compression set to ON

2006-08-21 Thread Anantha N. Srirama
I've a few questions:

 - Does 'zpool iostat' report numbers from the top of the ZFS stack or at the
bottom? I've correlated the zpool iostat numbers with the system iostat numbers
and they match up. This tells me the numbers are from the 'bottom' of the ZFS
stack, right? Having said that, it'd be nice to have zpool iostat return numbers
at the top of the stack. This becomes relevant when we have compression=on (see
the sketch after these questions).

 - Secondly, I did some more tests and I see the same read waves and the same
consistent write throughput. I've been reading another thread on this forum
about Niagara and compression, where Matt Ahrens noted that compression is
currently single-threaded. Further, he stated that there may be a bugfix
released to use multiple threads. I eagerly await the fix.
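On the first point, a minimal sketch of the comparison described above (the
pool name is hypothetical):

   # zpool iostat mypool 5
   # iostat -xn 5

Both views currently show the bytes actually sent to the devices, which is why
they match up; with compression=on, the bytes handed to ZFS at the top of the
stack would be larger than the compressed bytes either tool reports, which is
exactly why a top-of-stack view would be useful.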

Thanks again for a great feature. Looking forward to more fun stuff out of Sun 
and you Mr. Bonwick.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS questions with mirrors

2006-08-21 Thread Peter Wilk
IHAC (I have a customer) who is asking the following. Any thoughts would be appreciated.

Take two drives, zpool to make a mirror.
Remove a drive - and the server HANGS. Power off and reboot the server,
and everything comes up cleanly.

Take the same two drives (still Solaris 10). Install Veritas Volume
Manager (4.1). Mirror the two drives. Remove a drive - everything is
still running. Replace the drive, everything still working. No outage.

So the big questions to Tech support:
1. Is this a known property of ZFS? That when a drive from a hot-swap
system is removed, the server hangs? (We were attempting to simulate a
drive failure.)
2. Or is this just because it was an E450? I.e., would removing a zfs
mirror disk (unexpected hardware removal, as opposed to using zfs to
remove the disk) on a V240 or V480 cause the same problem?
3. What could we expect if a drive mysteriously failed during
operation of a server with a zfs mirror? Would the server hang like it
did during testing? How can we test this?
4. If it is a known property of zfs, is there a date when it is
expected to be fixed (if ever)?



Peter

PS: I may not be on this alias so please respond to me directly
-- 
=
Peter Wilk  -  OS/Security Support
Sun Microsystems
1 Network Drive, P.O. Box 4004
Burlington, Massachusetts 01803-0904
1-800-USA-4SUN, opt 1, opt 1, case number#
Email: [EMAIL PROTECTED]

=



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] ZFS Filesytem Corrpution

2006-08-21 Thread Srivastava, Sanjaya

 I agree with you, but only 50%. Mirroring will only mask the problem
and delay the filesystem corruption
(depending on how zfs responds to data corruption: does it go back and
recheck the blocks later, or just mark them bad?).

 The problem lies somewhere in the hardware, but certainly not in the disks.

 I have over 20 machines exhibiting the same behavior.  If I put a RAID
card in between, the problem disappears altogether.





...Sanjaya



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 18, 2006 11:59 AM
To: Srivastava, Sanjaya
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS Filesytem Corrpution

Srivastava, Sanjaya wrote:
 I have been seeing data corruption on the ZFS filesystem. Here are
 some details. The machine is running s10 on X86 platform with a single
 160Gb SATA disk.  (root on s0 and zfs on s7)

I'd wager that it is a hardware problem.  Personally, I've had less than
satisfactory reliability experiences with 160 GByte disks from a variety
of vendors.  Try mirroring.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS questions with mirrors

2006-08-21 Thread Eric Schrock
The current behavior depends on the implementation of the driver and
support for hotplug events.  When a drive is yanked, one of two things
can happen:

- I/Os will fail, and any attempt to re-open the device will result in
  failure.

- I/Os will fail, but the device can continue to be opened by its
  existing path.

ZFS currently handles case #1 and will mark the device faulted,
generating an FMA fault in the process.  Future ZFS/FMA integration will
address problem #2, and is on the short list of features to address.  In
the meantime, you can 'zpool offline' the bad device to prevent ZFS from
trying to access it.
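
A hedged sketch of that interim workaround (pool and device names are
hypothetical):

   # zpool offline tank c1t2d0
   # zpool status tank
   # zpool replace tank c1t2d0

offline stops ZFS from issuing I/O to the device, status should then show it
as OFFLINE with the pool DEGRADED, and replace (with no new device argument)
resilvers onto the replacement once the disk has been physically swapped in.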

That being said, the server should never hang - only proceed arbitrarily
slowly.  When you say 'hang', what does that mean?

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?

2006-08-21 Thread eric kustarz

Constantin Gonzalez wrote:


Hi,

my ZFS pool for my home server is a bit unusual:

   pool: pelotillehue
state: ONLINE
scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

    NAME            STATE     READ WRITE CKSUM
    pelotillehue    ONLINE       0     0     0
      mirror        ONLINE       0     0     0
        c0d1s5      ONLINE       0     0     0
        c1d0s5      ONLINE       0     0     0
      raidz1        ONLINE       0     0     0
        c0d0s3      ONLINE       0     0     0
        c0d1s3      ONLINE       0     0     0
        c1d0s3      ONLINE       0     0     0
        c1d1s3      ONLINE       0     0     0
      raidz1        ONLINE       0     0     0
        c0d1s4      ONLINE       0     0     0
        c1d0s4      ONLINE       0     0     0
        c1d1s4      ONLINE       0     0     0

The reason is simple: I have 4 differently-sized disks (80, 80, 200 and 250 GB;
it's a home server, so I crammed whatever I could find elsewhere into that box
:) ), and my goal was to create the biggest pool possible while retaining some
level of redundancy.

The above config therefore groups the biggest slices that can be created on all
four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be
created on 3 disks into the 3-disk RAID-Z, then two large slices remain which
are mirrored. It's like playing Tetris with disk slices... But the pool can
tolerate 1 broken disk and it gave me maximum storage capacity, so be it.

This means that we have one pool with 3 vdevs that access up to 3 different
slices on the same physical disk.

Question: Does ZFS consider the underlying physical disks when load-balancing,
or does it only load-balance across vdevs, thereby potentially overloading a
physical disk with up to 3 parallel request streams at once?
 



ZFS only does dynamic striping across the (top-level) vdevs.

I understand why you setup your pool that way, but ZFS really likes 
whole disks instead of slices.


Trying to have ZFS interpret that the devices are really slices and part
of other vdevs seems overly complicated for the gain achieved.
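
For comparison, the whole-disk approach would look something like the
following (a sketch only - and note that with such differently-sized disks a
raidz group is effectively limited by its smallest member, which is the
capacity trade-off Constantin was working around):

   # zpool create pelotillehue raidz c0d0 c0d1 c1d0 c1d1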


eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SCSI synchronize cache cmd

2006-08-21 Thread Steve Powers

Hi,
I work on a support team for the Sun StorEdge 6920 and have a
question about the use of the SCSI synchronize cache command in Solaris
and ZFS. We have a bug in our 6920 software that exposes us to a
memory leak when we receive the SCSI synchronize cache command:

6456312 - SCSI Synchronize Cache Command is flawed

It will take some time for this bug fix to roll out to the field, so we
need to understand our exposure here.  I have been informed that
ZFS may use this in S10 through a new sd/ssd ioctl.  Can anyone confirm
that, as well as whether there is a config option to disable this command?

Thanks,
Steve
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] destroyed pools signatures

2006-08-21 Thread eric kustarz

Robert Milkowski wrote:


Hello zfs-discuss,

 I've got many disks in a JBOD (100) and while doing tests there
 are lots of destroyed pools. Then some disks are re-used to be part
 of new pools. Now if I do zpool import -D I can see lots of destroyed
 pools in a state where I can't import them anyway (like only two disks
 left from a previously much larger raid-z group, etc.). It's getting
 messy.

 It would be nice to have a command to 'clear' such disks - remove
 the ZFS signatures so nothing shows up for those disks.

 What do you think?

 

That could be nice, though a way of doing that now is overwriting the 
labels by using dd (assuming you can overwrite all the devices from the 
now-defunct destroyed pool).
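
A rough, destructive sketch of that dd approach (the device name is
hypothetical, and this assumes the usual ZFS layout of two labels at the
front of the device and two at the back, so the end of the device needs the
same treatment with an appropriate seek):

   # dd if=/dev/zero of=/dev/rdsk/c2t3d0s0 bs=1024k count=1

Once the front and back labels are overwritten on every device of the
destroyed pool, zpool import -D should no longer list it.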


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: SCSI synchronize cache cmd

2006-08-21 Thread Anton B. Rang
Yes, ZFS uses this command very frequently. However, it only does this if the 
whole disk is under the control of ZFS, I believe; so a workaround could be to 
use slices rather than whole disks when creating a ZFS pool on a buggy device.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Niagara and ZFS compression?

2006-08-21 Thread Mike Gerdts
On 8/21/06, Richard Elling - PAE [EMAIL PROTECTED] wrote:

  I haven't done measurements of this in years, but... I'll wager that
  compression is memory bound, not CPU bound, for today's servers. A system
  with low latency and high bandwidth memory will perform well
  (UltraSPARC-T1). Threading may not help much on systems with a single
  memory interface, but should help some on systems with multiple memory
  interfaces (UltraSPARC-*, Opteron, Athlon FX, etc.)
  -- richard

A rather simple test using CSQamp.pkg from the cooltools download site.
There's nothing magical about this file - it just happens to be a largish
file that I had on hand.

$ time gzip -c < CSQamp.pkg > /dev/null

V40z:
real    0m15.339s
user    0m14.534s
sys     0m0.485s

V240:
real    0m35.825s
user    0m35.335s
sys     0m0.284s

T2000:
real    1m33.669s
user    1m32.768s
sys     0m0.881s

If I do 8 gzips in parallel ($ time ~/scripts/pgzip):

V40z:
real    0m32.632s
user    1m53.382s
sys     0m1.653s

V240:
real    2m24.704s
user    4m42.430s
sys     0m2.305s

T2000:
real    1m40.165s
user    13m10.475s
sys     0m6.578s

In each of the tests, the file was in /tmp. As expected, the V40z running 8
gzip processes (using 4 cores) took twice as long as it did running 1 (using
1 core). The V240 took 4 times as long (8 processes, 2 threads) as the
original, and the T2000 ran 8 (8 processes, 8 cores) in just about the same
amount of time as it ran 1.

For giggles, I ran 32 processes on the T2000 and came up with 5m4.585s
(real), 158m33.380s (user), and 42.484s (sys). In other words, the T2000
running 32 gzip processes had an elapsed time 3 times greater than with 8
processes. Even though the elapsed time jumped by 3x, the %sys jumped by
nearly 7x.

Here's a summary:

Server  gzips  Seconds  KB/sec
V40z    8      32.632   49,445
T2000   32     304.585  21,189
T2000   8      100.165  16,108
V40z    1      15.339   13,149
V240    8      144.704  11,150
V240    1      35.825   5,630
T2000   1      99.669   2,024

Clearly more threads doing compression with gzip give better performance
than a single thread. How that translates into memory vs. CPU speed, I am
not sure. However, I can't help but think that if my file server is
compressing every data block that it writes, it would be able to write more
data by using a thread (or more) per core - I would come out ahead.

I am a firm believer that the next generation of compression commands and
libraries needs to use parallel algorithms. The simplest way to do this
would be to divide the data into chunks and farm out each chunk to various
worker threads. This will likely come at the cost of compression efficiency,
but in initial tests I have done this amounts to a very small difference in
size relative to the speedup achieved. Initial tests were with a chunk of C
code and zlib.
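
A very rough shell sketch of that chunk-and-farm-out idea (file name and
chunk size are arbitrary; it works because gunzip happily decompresses
concatenated gzip streams, so the compressed pieces can simply be glued back
together in order):

$ split -b 16m CSQamp.pkg chunk.
$ for c in chunk.*; do gzip "$c" & done; wait
$ cat chunk.*.gz > CSQamp.pkg.gz

A real implementation would do this with threads and zlib inside a single
process, as described above, rather than with separate gzip processes.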
Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: SCSI synchronize cache cmd

2006-08-21 Thread Bill Moore
On Mon, Aug 21, 2006 at 02:40:40PM -0700, Anton B. Rang wrote:
 Yes, ZFS uses this command very frequently. However, it only does this
 if the whole disk is under the control of ZFS, I believe; so a
 workaround could be to use slices rather than whole disks when
 creating a ZFS pool on a buggy device.

Actually, we issue the command whether we are using a whole disk or
just a slice.  Short of an mdb script, there is no way to disable it.
We are trying to figure out ways to allow users to specify workarounds
for broken hardware without getting the ZFS code all messy as a result.


--Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss