Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Cyril Payet
Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
# vxdisk -o alldgs list does this with VxVM.
Thanx for your replies.
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On behalf of D. Eckert
Sent: Saturday, 3 January 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Rodney Lindner - Services Chief Technologist




Yep..
Just run zpool import without a poolname and it will list any pools
that are available for import.

e.g.:
sb2000::# zpool import
  pool: mp
    id: 17232673347678393572
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        mp          ONLINE
          raidz2    ONLINE
            c1t2d0  ONLINE
            c1t3d0  ONLINE
            c1t4d0  ONLINE
            c1t5d0  ONLINE
            c1t8d0  ONLINE
            c1t9d0  ONLINE
            c1t10d0 ONLINE
            c1t11d0 ONLINE
            c1t12d0 ONLINE
            c1t13d0 ONLINE
            c1t14d0 ONLINE
        spares
          c1t15d0
sb2000::# zpool import mp

Regards
Rodney

Cyril Payet wrote:

  Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
"# vxdisk -o alldgs list" does this with VxVM.
Thanx for your replies.
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On behalf of D. Eckert
Sent: Saturday, 3 January 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


-- 


 Rodney Lindner
/ Services Chief Technologist
Sun Microsystems, Inc.
33 Berry St
Nth Sydney, NSW, AU, 2060
Phone x59674/+61294669674
Email rodney.lind...@sun.com
 



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Metaslab alignment on RAID-Z

2009-01-06 Thread Robert Milkowski
Is there any update on this? You suggested that Jeff had some kind of solution 
for this - has it been integrated or is someone working on it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread Anton B. Rang
For SCSI disks (including FC), you would use the FUA bit on the read command.

For SATA disks ... does anyone care?  ;-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs list improvements?

2009-01-06 Thread Chris Gerhard
To improve the performance of scripts that manipulate ZFS snapshots, and the ZFS 
snapshot service in particular, there needs to be a way to list all the 
snapshots for a given object and only the snapshots for that object.

There are two RFEs filed that cover this:

http://bugs.opensolaris.org/view_bug.do?bug_id=6352014 :

'zfs list' should have an option to only present direct descendents

http://bugs.opensolaris.org/view_bug.do?bug_id=6762432 :

'zfs list --depth'

The first is asking for a way to list only the direct descendents of a data 
set, i.e. its children. The second asks to be able to list all the data sets 
down to a depth of N. 

So zfs list --depth 0 would be almost the same as zfs list -c except that it 
would also list the parent data set.

While zfs list -c is more user friendly, zfs list --depth is more powerful. I'm 
wondering whether both should be implemented or just one, and if just one, which?
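
Until one of these integrates, that filtering usually has to be done by 
post-processing the full listing; a rough sketch, with a hypothetical dataset name:

# list only the snapshots of tank/home itself, not those of its descendants
zfs list -H -t snapshot -o name | grep '^tank/home@'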

Comments?

--chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Cyril Payet
OK, got it: just use zpool import.
Sorry for the inconvenience ;-)
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On behalf of Cyril Payet
Sent: Tuesday, 6 January 2009 09:16
To: D. Eckert; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
# vxdisk -o alldgs list does this with VxVM.
Thanx for your replies.
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On behalf of D. Eckert
Sent: Saturday, 3 January 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to find out the zpool of an uberblock printed with the fbt:zfs:uberblock_update: probes?

2009-01-06 Thread Marcelo Leal
 Hi,

 Hello Bernd,

 
 After I published a blog entry about installing
 OpenSolaris 2008.11 on a 
 USB stick, I read a comment about a possible issue
 with wearing out 
 blocks on the USB stick after some time because ZFS
 overwrites its 
 uberblocks in place.
 I did not understand well what you are trying to say with wearing out 
blocks, but in fact the uberblocks are not overwritten in place. The pattern 
you noticed with the dtrace script is the update of the uberblock, which is 
maintained in an array of 128 elements (1K each, with just one active at a time). Each 
physical vdev has four labels (256K structures): L0, L1, L2, and L3, two at the 
beginning and two at the end.
 Because the labels are at fixed locations on disk, the label is the only thing 
ZFS updates without copy-on-write; it uses a two-staged update instead. IIRC, the 
update order is L0 and L2, and after that L1 and L3.
 Take a look:

 
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_label.c

 So:
 - The label is overwritten (in a two-staged update);
 - The uberblock is not overwritten, but is written to a new element of the 
array. So the transition from one uberblock (txg and timestamp) to another is 
atomic.
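
 As an aside, the four labels described above can be inspected directly with zdb; 
a quick sketch (the device name is just an example):

 # dump the contents of the four vdev labels (L0-L3) on one disk
 zdb -l /dev/rdsk/c1t2d0s0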

 I'm deploying a USB solution too, so if you can clarify the problem, I would 
appreciate it. 

P.S.: I did look at your blog, but did not see any comments about that, and the 
comments section is closed. ;-)

 Leal
[http://www.eall.com.br/blog]

 
 I tried to get more information about how updating
 uberblocks works with 
 the following dtrace script:
 
 /* io:genunix::start */
 io:genunix:default_physio:start,
 io:genunix:bdev_strategy:start,
 io:genunix:biodone:done
 {
 printf("%d %s %d %d", timestamp, execname, args[0]->b_blkno,
  args[0]->b_bcount);
 }
 
 fbt:zfs:uberblock_update:entry
 {
 printf("%d (%d) %d, %d, %d, %d, %d, %d, %d, %d", timestamp,
  args[0]->ub_timestamp,
  args[0]->ub_rootbp.blk_prop, args[0]->ub_guid_sum,
  args[0]->ub_rootbp.blk_birth,
  args[0]->ub_rootbp.blk_fill,
  args[1]->vdev_id, args[1]->vdev_asize,
  args[1]->vdev_psize,
  args[2]);
 }
 
 The output shows the following pattern after most of the
 uberblock_update events:
 
 0  34404  uberblock_update:entry 244484736418912 (1231084189) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26747
 0   6668  bdev_strategy:start 244485190035647 sched 502 1024
 0   6668  bdev_strategy:start 244485190094304 sched 1014 1024
 0   6668  bdev_strategy:start 244485190129133 sched 39005174 1024
 0   6668  bdev_strategy:start 244485190163273 sched 39005686 1024
 0   6656  biodone:done 244485190745068 sched 502 1024
 0   6656  biodone:done 244485191239190 sched 1014 1024
 0   6656  biodone:done 244485191737766 sched 39005174 1024
 0   6656  biodone:done 244485192236988 sched 39005686 1024
 ...
 0  34404  uberblock_update:entry 244514710086249 (1231084219) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26748
 0  34404  uberblock_update:entry 244544710086804 (1231084249) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26749
 ...
 0  34404  uberblock_update:entry 244574740885524 (1231084279) 9226475971064889345, 4541013553469450828, 26750, 159, 0, 0, 0, 26750
 0   6668  bdev_strategy:start 244575189866189 sched 508 1024
 0   6668  bdev_strategy:start 244575189926518 sched 1020 1024
 0   6668  bdev_strategy:start 244575189961783 sched 39005180 1024
 0   6668  bdev_strategy:start 244575189995547 sched 39005692 1024
 0   6656  biodone:done 244575190584497 sched 508 1024
 0   6656  biodone:done 244575191077651 sched 1020 1024
 0   6656  biodone:done 244575191576723 sched 39005180 1024
 0   6656  biodone:done 244575192077070 sched 39005692 1024
 I am not a dtrace or zfs expert, but to me it looks
 like in many cases, 
 an uberblock update is followed by a write of 1024
 bytes to four 
 different disk blocks. I also found that the four
 block numbers are 
 always incremented by even numbers (256, 258, 260, ...) 127 times and 
 then the first block is written again. Which would
 mean that for a txg 
 of 5, the four uberblock copies have been written
 5/127=393 
 times (Correct?).
 
 What I would like to find out is how to access fields
 from arg1 (this is 
 the data of type vdev in:
 
 int uberblock_update(uberblock_t *ub, vdev_t *rvd,
 uint64_t txg)
 
 ). When using the fbt:zfs:uberblock_update:entry
 probe, its elements are 
 always 0, as you can see in the above output. When
 using the 
 fbt:zfs:uberblock_update:return probe, I am getting
 an error message 
 like the following:
 
 dtrace: failed to compile script
 zfs-uberblock-report-04.d: line 14: 
 operator -> must be applied to a pointer
 
 Any idea how to access the fields of vdev, or how to
 print out the pool 
 name associated to an uberblock_update event?
 
 Regards,
 
 Bernd
 

[zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
My OpenSolaris 2008/11 PC seems to attain better throughput with one big 
sixteen-device RAIDZ2 than with four stripes of 4-device RAIDZ.  I know it's by 
no means an exhaustive test, but catting /dev/zero to a file in the pool now 
frequently exceeds 600 Megabytes per second, whereas before with the striped 
RAIDZ I was only occasionally peaking around 400MB/s.  The kit is SuperMicro 
Intel 64 bit, 2-socket by 4 thread, 3 GHz with two AOC MV8 boards and 800 MHz 
(iirc) fsb connecting 16 GB RAM that runs at equal speed to fsb.  Cheap 7200 
RPM Seagate SATA half-TB disks with 32MB cache.

Is this increase explicable / expected?  The throughput calculator sheet output 
I saw seemed to forecast better iops with the striped raidz vdevs and I'd read 
that, generally, throughput is augmented by keeping the number of vdevs in the 
single digits.  Is my superlative result perhaps related to the large cpu and 
memory bandwidth?

Just throwing this out for sake of discussion/sanity check..

thx
jake
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Keith Bierman

On Jan 6, 2009, at 9:44 AM, Jacob Ritorto wrote:

 but catting /dev/zero to a file in the pool now frequently exceeds 600 Megabytes per second

Do you get the same sort of results from /dev/random?

I wouldn't be surprised if /dev/zero turns out to be a special case.

Indeed, using any of the special files is probably not ideal.


-- 
Keith H. Bierman   khb...@gmail.com  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

 My OpenSolaris 2008/11 PC seems to attain better throughput with one 
 big sixteen-device RAIDZ2 than with four stripes of 4-device RAIDZ. 
 I know it's by no means an exhaustive test, but catting /dev/zero to 
 a file in the pool now frequently exceeds 600 Megabytes per second, 
 whereas before with the striped RAIDZ I was only occasionally 
 peaking around 400MB/s.  The kit is SuperMicro Intel 64 bit,

 Is this increase explicable / expected?  The throughput calculator

This is not surprising.  However, your test is only testing the write 
performance using a single process.  With multiple writers and 
readers, the throughput will be better when using the configuration 
with more vdevs.

It is not recommended to use such a large RAIDZ2 due to the multi-user 
performance concern, and because a single slow/failing disk drive can 
destroy the performance until it is identified and fixed.  Maybe a 
balky (but still functioning) drive won't be replaced under warranty 
and so you have to pay for a replacement out of your own pocket.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Keith Bierman wrote:

 Do you get the same sort of results from /dev/random?

/dev/random is very slow and should not be used for benchmarking.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
Is urandom nonblocking?



On Tue, Jan 6, 2009 at 1:12 PM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Tue, 6 Jan 2009, Keith Bierman wrote:

 Do you get the same sort of results from /dev/random?

 /dev/random is very slow and should not be used for benchmarking.

 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Keith Bierman

On Jan 6, 2009, at 11:12 AM, Bob Friesenhahn wrote:

 On Tue, 6 Jan 2009, Keith Bierman wrote:

 Do you get the same sort of results from /dev/random?

 /dev/random is very slow and should not be used for benchmarking.

Not directly, no. But copying from /dev/random to a real file and  
using that should provide better insight than all zeros or all ones  
(I have seen clever devices optimize things away).
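
A minimal sketch of that approach (the sizes, paths, and pool name are hypothetical, 
and for a real run the file should be scaled well past the machine's RAM): generate 
a random file once, then time the copy into the pool, so the source data is neither 
compressible nor synthesized on the fly.

# pre-generate incompressible test data, then time the sequential write into the pool
dd if=/dev/urandom of=/var/tmp/testdata bs=1024k count=2048
ptime cp /var/tmp/testdata /tank/testfile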

Tests like bonnie are probably a better bet than rolling one's own, 
although the latter is good for building intuition ;-)

-- 
Keith H. Bierman   khb...@gmail.com  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

 Is urandom nonblocking?

The OS-provided random devices need to be secure, and so they depend on 
collecting entropy from the system so that the random values are truly 
random.  They also execute complex code to produce the random numbers. 
As a result, both of the random device interfaces are much slower than 
a disk drive.
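
A quick way to see how slow the random devices themselves are, independent of any 
pool (a rough sketch; the sizes are arbitrary):

ptime dd if=/dev/urandom of=/dev/null bs=1024k count=256
ptime dd if=/dev/zero of=/dev/null bs=1024k count=256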

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
OK, so use a real io test program or at least pre-generate files large
enough to exceed RAM caching?



On Tue, Jan 6, 2009 at 1:19 PM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Tue, 6 Jan 2009, Jacob Ritorto wrote:

 Is urandom nonblocking?

 The OS provided random devices need to be secure and so they depend on
 collecting entropy from the system so the random values are truely random.
  They also execute complex code to produce the random numbers. As a result,
 both of the random device interfaces are much slower than a disk drive.

 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500, snv_101a, hd and zfs

2009-01-06 Thread Elaine Ashton
Ok, it gets a bit more specific

hdadm and write_cache run 'format -e -d $disk' 

On this system, format will produce the list of devices in short order - format 
-e, however, takes much, much longer and would explain why it takes hours to 
iterate over 48 drives.

It's very curious and I'm not sure at this point if it's related to ZFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Hi all,

I did an install of OpenSolaris in which I specified that the whole disk should 
be used for the installation. Here is what format verify produces for that 
disk:

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 60797      465.73GB    (60797/0/0)  976703805
  1 unassigned    wm       0                  0       (0/0/0)              0
  2     backup    wu       0 - 60797      465.74GB    (60798/0/0)  976719870
  3 unassigned    wm       0                  0       (0/0/0)              0
  4 unassigned    wm       0                  0       (0/0/0)              0
  5 unassigned    wm       0                  0       (0/0/0)              0
  6 unassigned    wm       0                  0       (0/0/0)              0
  7 unassigned    wm       0                  0       (0/0/0)              0
  8       boot    wu       0 - 0            7.84MB    (1/0/0)          16065
  9 unassigned    wm       0                  0       (0/0/0)              0

I have several questions. First, what is the purpose of partitions 2 and 8 
here? Why not simply have partition 0, the root partition, be the only 
partition, and start at cylinder 0 as opposed to 1?

My second question concerns the disk I have used to mirror the first root zpool 
disk. After I set up the second disk to mirror the first one with zpool attach 
-f rpool c3t0d0s0 c4t0d0s0, I got the response

Please be sure to invoke installgrub(1M) to make 'c4t0d0s0' bootable.

Is that correct? Or do I want to make c4t0d0s8 bootable, given that the label 
of that partition is boot? I cannot help finding this a little confusing. As 
far as I can tell, c4t0d0s8 (as well as c3t0d0s8 from the original disk which I 
mirrored), cylinder 0, is not used for anything.

Finally, is the correct command to make the disk I have added to mirror the 
first disk bootable

installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0 ?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 08:44:01AM -0800, Jacob Ritorto wrote:

 Is this increase explicable / expected?  The throughput calculator
 sheet output I saw seemed to forecast better iops with the striped
 raidz vdevs and I'd read that, generally, throughput is augmented by
 keeping the number of vdevs in the single digits.  Is my superlative
 result perhaps related to the large cpu and memory bandwidth?

I'd think that for pure sequential loads, larger column setups wouldn't
have too many performance issues.

But as soon as you try to do random reads on the large setup you're
going to be much more limited.  Do you have tools to do random I/O
exercises?

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Performance issue with zfs send of a zvol

2009-01-06 Thread Brian H. Nelson
I noticed this issue yesterday when I first started playing around with 
zfs send/recv. This is on Solaris 10U6.

It seems that a zfs send of a zvol issues 'volblocksize' reads to the 
physical devices. This doesn't make any sense to me, as zfs generally 
consolidates read/write requests to improve performance. Even the dd 
case with the same snapshot does not exhibit this behavior. It seems to 
be specific to zfs send.

I checked with 8k, 64k, and 128k volblocksize, and the reads generated 
by zfs send always seem to follow that size, while the reads with dd do not.
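
(The per-vdev read/write statistics below are the kind of output an interval run of 
zpool iostat produces; a sketch of one way to capture such samples, where the 
5-second interval is only an example and not necessarily what was used here:

zpool iostat -v pool1 5
)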

The small reads seem to hurt the performance of zfs send. I tested with a 
mirror, but on another machine with a 7 disk raidz, the performance is 
MUCH worse because the 8k reads get broken up into even smaller reads 
and spread across the raidz.

Is this a bug, or can someone explain why this is happening?

Thanks
-Brian

Using 8k volblocksize:

-bash-3.00# zfs send pool1/vo...@now > /dev/null

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         4.01G   274G  1.88K      0  15.0M      0
  mirror      4.01G   274G  1.88K      0  15.0M      0
    c0t9d0        -      -    961      0  7.46M      0
    c0t11d0       -      -    968      0  7.53M      0
------------  -----  -----  -----  -----  -----  -----
== ~8k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vo...@now of=/dev/null bs=8k

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         4.01G   274G  2.25K      0  17.9M      0
  mirror      4.01G   274G  2.25K      0  17.9M      0
    c0t9d0        -      -    108      0  9.00M      0
    c0t11d0       -      -    109      0  8.92M      0
------------  -----  -----  -----  -----  -----  -----
== ~8k reads to pool, ~85k reads to drives


Using volblocksize of 64k:

-bash-3.00# zfs send pool1/vol...@now > /dev/null

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         6.01G   272G    378      0  23.5M      0
  mirror      6.01G   272G    378      0  23.5M      0
    c0t9d0        -      -    189      0  11.8M      0
    c0t11d0       -      -    189      0  11.7M      0
------------  -----  -----  -----  -----  -----  -----
== ~64k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol...@now of=/dev/null bs=64k

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         6.01G   272G    414      0  25.7M      0
  mirror      6.01G   272G    414      0  25.7M      0
    c0t9d0        -      -    107      0  12.9M      0
    c0t11d0       -      -    106      0  12.8M      0
------------  -----  -----  -----  -----  -----  -----
== ~64k reads to pool, ~124k reads to drives


Using volblocksize of 128k:

-bash-3.00# zfs send pool1/vol1...@now > /dev/null

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         4.01G   274G    188      0  23.3M      0
  mirror      4.01G   274G    188      0  23.3M      0
    c0t9d0        -      -     94      0  11.7M      0
    c0t11d0       -      -     93      0  11.7M      0
------------  -----  -----  -----  -----  -----  -----
== ~128k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol1...@now of=/dev/null bs=128k

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
pool1         4.01G   274G    247      0  30.8M      0
  mirror      4.01G   274G    247      0  30.8M      0
    c0t9d0        -      -    122      0  15.3M      0
    c0t11d0       -      -    123      0  15.5M      0
------------  -----  -----  -----  -----  -----  -----
== ~128k reads to pool and drives

-- 
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
  bnelson[at]cis.ysu.edu
---

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Cindy . Swearingen
Alex,

I think the root cause of your confusion is that the format utility and
disk labels are very unfriendly and confusing.

Partition 2 identifies the whole disk. On x86 systems, space is also
needed for boot-related information, which is currently stored in
partition 8. Neither of these partitions requires any administration, and 
they should not be used for anything else. You can read more here:

http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-20068?a=view

(To add more confusion, partitions are also referred to as slices.)

However, the system actually boots from the root file system, in 
partition 0 on your system, which is why you need to run the installgrub 
command on c4t0d0s0. Your installgrub syntax looks correct to me.
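
Putting the two steps from this thread together, the attach-plus-bootblock 
sequence is (device names are the ones from this thread; adjust for your system):

zpool attach -f rpool c3t0d0s0 c4t0d0s0
installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0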

My wish for this year is to boot from EFI-labeled disks so examining
disk labels is mostly unnecessary because ZFS pool components could be
constructed as whole disks, and the unpleasant disk
format/label/partitioning experience is just a dim memory...

Cindy



Alex Viskovatoff wrote:
 Hi all,
 
 I did an install of OpenSolaris in which I specified that the whole disk 
 should be used for the installation. Here is what format verify produces 
 for that disk:
 
 Part      Tag    Flag     Cylinders         Size            Blocks
   0       root    wm       1 - 60797      465.73GB    (60797/0/0)  976703805
   1 unassigned    wm       0                  0       (0/0/0)              0
   2     backup    wu       0 - 60797      465.74GB    (60798/0/0)  976719870
   3 unassigned    wm       0                  0       (0/0/0)              0
   4 unassigned    wm       0                  0       (0/0/0)              0
   5 unassigned    wm       0                  0       (0/0/0)              0
   6 unassigned    wm       0                  0       (0/0/0)              0
   7 unassigned    wm       0                  0       (0/0/0)              0
   8       boot    wu       0 - 0            7.84MB    (1/0/0)          16065
   9 unassigned    wm       0                  0       (0/0/0)              0
 
 I have several questions. First, what is the purpose of partitions 2 and 8 
 here? Why not simply have partition 0, the root partition, be the only 
 partition, and start at cylinder 0 as opposed to 1?
 
 My second question concerns the disk I have used to mirror the first root 
 zpool disk. After I set up the second disk to mirror the first one with 
 zpool attach -f rpool c3t0d0s0 c4t0d0s0, I got the response
 
 Please be sure to invoke installgrub(1M) to make 'c4t0d0s0' bootable.
 
 Is that correct? Or do I want to make c4t0d0s8 bootable, given that the label 
 of that partition is boot? I cannot help finding this a little confusing. 
 As far as i can tell, c4t0d0s8 (as well as c3t0d0s8 from the original disk 
 which I mirrored), cylinder 0, is not used for anything.
 
 Finally, is the correct command to make the disk I have added to mirror the 
 first disk bootable
 
 installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0 ?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
I have that iozone program loaded, but its results were rather cryptic
for me.  Is it adequate if I learn how to decipher the results?  Can
it thread out and use all of my CPUs?



 Do you have tools to do random I/O exercises?

 --
 Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send fails incremental snapshot

2009-01-06 Thread Brent Jones
On Mon, Jan 5, 2009 at 4:29 PM, Brent Jones br...@servuhome.net wrote:
 On Mon, Jan 5, 2009 at 2:50 PM, Richard Elling richard.ell...@sun.com wrote:
 Correlation question below...

 Brent Jones wrote:

 On Sun, Jan 4, 2009 at 11:33 PM, Carsten Aulbert
 carsten.aulb...@aei.mpg.de wrote:


 Hi Brent,

 Brent Jones wrote:


 I am using 2008.11 with the Timeslider automatic snapshots, and using
 it to automatically send snapshots to a remote host every 15 minutes.
 Both sides are X4540's, with the remote filesystem mounted read-only
 as I read earlier that would cause problems.
 The snapshots send fine for several days, I accumulate many snapshots
 at regular intervals, and they are sent without any problems.
 Then I will get the dreaded:
 
 cannot receive incremental stream: most recent snapshot of pdxfilu02
 does not match incremental source
 



 Which command line are you using?

 Maybe you need to do a rollback first (zfs receive -F)?

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



 I am using a command similar to this:

 zfs send -i pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:30
 pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:45 | ssh -c
 blowfish u...@host.com /sbin/zfs recv -d pdxfilu02

 It normally works, then after some time it will stop. It is still
 doing a full snapshot replication at this time (very slowly it seems;
 I'm bitten by the bug of slow zfs send/recv)

 Once I get back on my regular snapshotting, if it comes out of sync
 again, I'll try doing a -F rollback and see if that helps.


 When this gets slow, are the other snapshot-related commands also
 slow?  For example, normally I see zfs list -t snapshot completing
 in a few seconds, but sometimes it takes minutes?
 -- richard



 I'm not seeing zfs related commands any slower. On the remote side, it
 builds up thousands of snapshots, and aside from SSH scrolling as fast
 as it can over the network, no other slowness.
 But the actual send and receive is getting very very slow, almost to
 the point of needing the scrap the project and find some other way to
 ship data around!

 --
 Brent Jones
 br...@servuhome.net


Got a small update on the ZFS send, I am in fact seeing ZFS list
taking several minutes to complete. I must have timed it correctly
during the send, and both sides are not completing the ZFS list, and
its been about 5 minutes already. There is a small amount of network
traffic between the two hosts, so maybe it's comparing what needs to
be sent, not sure.
I'll update when/if it completes.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

 I have that iozone program loaded, but its results were rather cryptic
 for me.  Is it adequate if I learn how to decipher the results?  Can
 it thread out and use all of my CPUs?

Yes, iozone does support threading.  Here is a test with a record size 
of 8KB, eight threads, synchronous writes, and a 2GB test file:

 Multi_buffer. Work area 16777216 bytes
 OPS Mode. Output is in operations per second.
 Record Size 8 KB
 SYNC Mode.
 File size set to 2097152 KB
 Command line used: iozone -m -t 8 -T -O -r 8k -o -s 2G
 Time Resolution = 0.01 seconds.
 Processor cache size set to 1024 Kbytes.
 Processor cache line size set to 32 bytes.
 File stride size set to 17 * record size.
 Throughput test with 8 threads
 Each thread writes a 2097152 Kbyte file in 8 Kbyte records

When testing with iozone, you will want to make sure that the test 
file is larger than available RAM, such as 2X the size.
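
On the 16 GB machine described earlier in the thread, that guideline works out 
to roughly 32 GB of aggregate test data; one way to get there, assuming you keep 
eight threads, is simply a larger per-thread size (a sketch, not an iozone 
recommendation):

iozone -m -t 8 -T -O -r 8k -o -s 4g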

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
ZFS is the bomb. It's a great file system. What are its real world 
applications besides Solaris userspace? What I'd really like is to utilize the 
benefits of ZFS across all the platforms we use. For instance, we use Microsoft 
Windows Servers as our primary platform here. How might I utilize ZFS to 
protect that data? 

The only way I can visualize doing so would be to virtualize the Windows server 
and store its image in a ZFS pool. That would add additional overhead but 
protect the data at the disk level. It would also allow snapshots of the 
Windows machine's virtual file. However, none of these benefits would protect 
Windows from hurting its own data, if you catch my meaning.

Obviously ZFS is ideal for large databases served out via application level or 
web servers. But what other practical ways are there to integrate the use of 
ZFS into existing setups to experience its benefits?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Marcelo Leal
Hello,
 - One way is virtualization: if you use a virtualization technology that uses 
NFS, for example, you could put your virtual images on a ZFS filesystem.  NFS 
can be used without virtualization too, but as you said the machines are 
Windows, and I don't think the NFS client for Windows is production ready. 
 Maybe somebody else on the list can say...
 - Virtualization inside Solaris branded zones... IIRC, the idea is to have 
branded zones that support another OS (like GNU/Linux, MS Windows, etc.).
 - Another option is iSCSI, and you would not need virtualization.

 Leal
[http://www.eall.com.br/blog]

 ZFS is the bomb. It's a great file system. What are
 it's real world applications besides solaris
 userspace? What I'd really like is to utilize the
 benefits of ZFS across all the platforms we use. For
 instance, we use Microsoft Windows Servers as our
 primary platform here. How might I utilize ZFS to
 protect that data? 
 
 The only way I can visualize doing so would be to
 virtualize the windows server and store it's image in
 a ZFS pool. That would add additional overhead but
 protect the data at the disk level. It would also
 allow snapshots of the Windows Machine's virtual
 file. However none of these benefits would protect
 Windows from hurting it's own data, if you catch my
 meaning.
 
 Obviously ZFS is ideal for large databases served out
 via application level or web servers. But what other
 practical ways are there to integrate the use of ZFS
 into existing setups to experience it's benefits.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Rob wrote:

 The only way I can visualize doing so would be to virtualize the 
 windows server and store it's image in a ZFS pool. That would add 
 additional overhead but protect the data at the disk level. It would 
 also allow snapshots of the Windows Machine's virtual file. However 
 none of these benefits would protect Windows from hurting it's own 
 data, if you catch my meaning.

With OpenSolaris you can use its built in SMB/CIFS service so that 
files are stored natively in ZFS by the Windows client.  Since the 
files are stored natively, individual lost/damaged files can be 
retrieved from a ZFS snapshot if snapshots are configured to be taken 
periodically.

If you use iSCSI or the forthcoming COMSTAR project (iSCSI, FC target, 
FCOE) then you can create native Windows volumes and the whole volume 
could be protected via snapshots but without the ability to retrieve 
individual files.  As you say, Windows could still destroy its own 
volume.  Snapshots of iSCSI volumes will be similar to if the Windows 
system suddenly lost power at the time the snapshot was taken.

As far as ZFS portability goes, ZFS is also supported on FreeBSD, on 
Linux in an inferior mode, and soon on OS-X.  The main 
interoperability issues seem to be with the disk partitioning 
strategies used by the different operating systems.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
I am not experienced with iSCSI. I understand it's block level disk access via 
TCP/IP. However I don't see how using it eliminates the need for virtualization.

Are you saying that a Windows Server can access a ZFS drive via iSCSI and store 
NTFS files?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Rob wrote:

 Are you saying that a Windows Server can access a ZFS drive via 
 iSCSI and store NTFS files?

A volume is created under ZFS, similar to a large sequential file. 
The iSCSI protocol is used to export that volume as a LUN.  Windows 
can then format it and put NTFS on it.
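
A minimal sketch of that flow on builds of that era, using the pre-COMSTAR 
shareiscsi property (the pool and volume names are made up, and the size is 
arbitrary):

# create a 100 GB ZFS volume and export it as an iSCSI target
zfs create -V 100g tank/winvol
zfs set shareiscsi=on tank/winvol
# the Windows iSCSI initiator then connects to the LUN and formats it as NTFS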

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 10:22:20AM -0800, Alex Viskovatoff wrote:
 I did an install of OpenSolaris in which I specified that the whole disk 
 should be used for the installation. Here is what format verify produces 
 for that disk:
 
  Part      Tag    Flag     Cylinders         Size            Blocks
    0       root    wm       1 - 60797      465.73GB    (60797/0/0)  976703805
    1 unassigned    wm       0                  0       (0/0/0)              0
    2     backup    wu       0 - 60797      465.74GB    (60798/0/0)  976719870
    3 unassigned    wm       0                  0       (0/0/0)              0
    4 unassigned    wm       0                  0       (0/0/0)              0
    5 unassigned    wm       0                  0       (0/0/0)              0
    6 unassigned    wm       0                  0       (0/0/0)              0
    7 unassigned    wm       0                  0       (0/0/0)              0
    8       boot    wu       0 - 0            7.84MB    (1/0/0)          16065
    9 unassigned    wm       0                  0       (0/0/0)              0
 
 I have several questions. First, what is the purpose of partitions 2 and 8 
 here? Why not simply have partition 0, the root partition, be the only 
 partition, and start at cylinder 0 as opposed to 1?

It's traditional in the VTOC label to have slice 2 encompass all
cylinders.  You don't have to use it.

On SPARC, the boot blocks fit into the 15 free blocks before the
filesystem actually starts writing data.  On x86, the boot code requires
more space, so putting a UFS filesystem on cylinder 0 would not leave
sufficient room for boot code.  The traditional solution is that data
slices on x86 begin at cylinder 1, leaving cylinder 0 for boot data.
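
One way to see that layout, including the small boot slice, is to print the VTOC 
for the disk (a sketch; the device name is just an example):

prtvtoc /dev/rdsk/c3t0d0s2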

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Volker A. Brandt
 http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-20068?a=view

 (To add more confusion, partitions are also referred to as slices.)

Nope, at least not on x86 systems.  A partition holds the Solaris part
of the disk, and that part is subdivided into slices.  Partitions
are visible to other OSes on the box; slices aren't.  Wherever the
wrong term appears in Sun docs, it should be treated as a doc bug.

For Sparc systems, some people intermix the two terms, but it's not
really correct there either.


Regards -- Volker
-- 

Volker A. Brandt  Consulting and Support for Sun Solaris
Brandt  Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 45
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs create performance degrades dramatically with increasing number of file systems

2009-01-06 Thread Alastair Neil
On Mon, Jan 5, 2009 at 5:27 AM, Roch roch.bourbonn...@sun.com wrote:
 Alastair Neil writes:
   I am attempting to create approx 10600 zfs file systems across two
   pools.  The devices underlying the pools are mirrored iscsi volumes
   shared over a dedicated gigabit Ethernet with jumbo frames enabled
   (MTU 9000) from a Linux Openfiler 2.3 system. I have added a couple of
   4GByte  zvols from the root pool to use as zil devices for the two
   iscsi backed pools.
  
   The solaris system is running snv_101b, has 8 Gbyte RAM and dual 64
   bit Xeon processors.
  
   Initially I was able to create zfs file systems at a rate of around 1
   every 6 seconds (wall clock time),
   now three days later I have created
   9000 zfs file systems and the creation rate has dropped to approx  1
   per minute, an order of magnitude slower.
  
   I attempted an experiment on a system with half the memory and using
   looped back zvols and saw similar performance.
  
   I find it hard to believe that such a performance degradation is
   expected.  Is there some parameters I should be tuning for using large
   numbers of file systems?
  
   Regards, Alastair
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 Sounds like

 6763592 : creating zfs filesystems gets slower as the number of zfs 
 filesystems increase
 6572357 : libzfs should do more to avoid mnttab lookups

 Which just integrated in snv_105.

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6763592
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6572357

 -r



Roch

thanks for the information, I replied to you directly by accident,
sorry about that.  I am curious what is the update process for
opensolaris.  I installed my machines almost the instant 2008.11 was
available and yet so far there have been no released updates - unless
I am doing something wrong.  Will updates from the SNV releases be
periodically rolled into 2008.11?  Am I out of luck with 2008.11 and
have to wait for 2009.04 for these fixes?  I guess I could wait and
upgrade to a Developers edition release?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 11:49:27AM -0700, cindy.swearin...@sun.com wrote:
 My wish for this year is to boot from EFI-labeled disks so examining
 disk labels is mostly unnecessary because ZFS pool components could be
 constructed as whole disks, and the unpleasant disk
 format/label/partitioning experience is just a dim memory...

Is there any non-EFI hardware that would support EFI boot?

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
[ok, no one replying, my spam then...]

Open folks just care about SMART so far.
http://www.mail-archive.com/linux-s...@vger.kernel.org/msg07346.html

Enterprise folks care more about spin-down.
(not an open thing yet, unless new practical industry standard is here that 
I don't know. yeah right.)

best,
z

- Original Message - 
From: Anton B. Rang r...@acm.org
To: zfs-discuss@opensolaris.org
Sent: Tuesday, January 06, 2009 9:07 AM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


 For SCSI disks (including FC), you would use the FUA bit on the read 
 command.

 For SATA disks ... does anyone care?  ;-)
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
 Yes, iozone does support threading.  Here is a test with a record size of
 8KB, eight threads, synchronous writes, and a 2GB test file:

Multi_buffer. Work area 16777216 bytes
OPS Mode. Output is in operations per second.
Record Size 8 KB
SYNC Mode.
File size set to 2097152 KB
Command line used: iozone -m -t 8 -T -O -r 8k -o -s 2G
Time Resolution = 0.01 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 8 threads
Each thread writes a 2097152 Kbyte file in 8 Kbyte records

 When testing with iozone, you will want to make sure that the test file is
 larger than available RAM, such as 2X the size.

 Bob


OK, I ran it as suggested (using a 17GB file pre-generated from
urandom) and I'm getting what appear to be sane iozone results now.
Do we have a place to compare performance notes?

thx
jake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
Wow. I will read further into this. That seems like it could have great 
applications. I assume the same is true of FCoE?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread JZ
Hello Darren,
This one, ok, was a valid thought/question --

On Solaris, root pools cannot have EFI labels (the boot firmware doesn't 
support booting from them).
http://blog.yucas.info/2008/11/26/zfs-boot-solaris/

But again, this is a ZFS discussion, and obviously EFI is not a ZFS thing, or 
even a Sun thing.
http://en.wikipedia.org/wiki/Extensible_Firmware_Interface

Hence, on ZFS turf, I would offer the following comment, in the notion of 
innovation, not trashing EFI.
The ZFS design point is to make IT simple, so EFI or not EFI can be 
debatable.
http://kerneltrap.org/node/6884

;-)
best,
z

- Original Message - 
From: A Darren Dunham ddun...@taos.com
To: zfs-discuss@opensolaris.org
Sent: Tuesday, January 06, 2009 3:38 PM
Subject: Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme


 On Tue, Jan 06, 2009 at 11:49:27AM -0700, cindy.swearin...@sun.com wrote:
 My wish for this year is to boot from EFI-labeled disks so examining
 disk labels is mostly unnecessary because ZFS pool components could be
 constructed as whole disks, and the unpleasant disk
 format/label/partitioning experience is just a dim memory...

 Is there any non-EFI hardware that would support EFI boot?

 -- 
 Darren
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Cindy,

Well, it worked. The system can boot off c4t0d0s0 now.

But I am still a bit perplexed. Here is how the invocation of installgrub went:

a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
/dev/rdsk/c4t0d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
a...@diotima:~# 

So installgrub writes to partition 0. How does one know that those sectors have 
not already been used by zfs, in its mirroring of the first drive by this 
second drive? And why is writing to partition 0 even necessary? Since c3t0d0 
must contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 already have 
stage1 and stage 2 in its partition 0 through the silvering process?

I don't find the present disk format/label/partitioning experience particularly 
unpleasant (except for grubinstall writing directly into a partition which 
belongs to a zpool). I just wish I understood what it involves.

Thank you for that link to the System Administration Guide. I just looked at it 
again, and it says partition 8 Contains GRUB boot information. So partition 8 
is the master boot sector and contains GRUB stage1?

Alex
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Peter Skovgaard Nielsen
I am running a test system with Solaris 10u6 and I am somewhat confused as to 
how ACE inheritance works. I've read through 
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf but it doesn't seem 
to cover what I am experiencing.

The ZFS file system that I am working on has both aclmode and aclinherit set to 
passthrough, which I thought would result in the ACEs being just that - passed 
through without modification.

In my test scenario, I am creating a folder, removing all ACEs and adding a 
single full access allow ACE with file and directory inheritance for one user:

 mkdir test
 chmod A=user:root:rwxpdDaARWcCos:fd:allow test

Permission check:
 ls -dV test
d---------+  2 root  root  2 Jan  6 21:17 test
  user:root:rwxpdDaARWcCos:fd:allow

Ok, that seems to be as I intended. Now I cd into the folder and create a file:

 cd test
 touch file

Permission check:

 ls -V file
-rw-r--r--+  1 root root   0 Jan  6 21:42 file
 user:root:rwxpdDaARWcCos:--:allow
owner@:--x---:--:deny
owner@:rw-p---A-W-Co-:--:allow
group@:-wxp--:--:deny
group@:r-:--:allow
 everyone@:-wxp---A-W-Co-:--:deny
 everyone@:r-a-R-c--s:--:allow

Can anyone explain to me what just happened? Why are owner/group/everyone ACEs 
(corresponding to old fashioned POSIX permission bits) created and even more 
strange, why are deny entries created? Is there something mandating the 
creation of these ACEs? I can understand that umask might affect this, but I 
wouldn't have thought that it would cause ACEs to appear out of the blue.

While writing this, I stumbled into this thread: http://tinyurl.com/7ofxfj. Ok, 
so it seems that this is intended behavior to comply with POSIX. As the 
author of the thread mentioned, I would like to see an inheritance mode that 
completely ignores POSIX. The thread ends with Mark Shellenbaum commenting that 
he will fasttrack the behavior that many people want. It is not clear to me 
what exactly he means by this.

Then I found http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=zha=view and 
much to my confusion, the deny ACEs aren't created in example 8-10. How could 
this be? Following some playing around, I came to the conclusion that as long 
as at least one ACE corresponding to owner/group/everyone exists, the extra 
ACEs aren't created:

 mkdir test
 chmod A=user:root:rwxpdDaARWcCos:fd:allow,everyone@::fd:allow test
 ls -dV test
d---------+  3 root root  15 Jan  6 22:11 test
 user:root:rwxpdDaARWcCos:fd:allow
 everyone@:--:fd:allow

 cd test
 touch file
 ls -V file
----------+  1 root root   0 Jan  6 22:15 file
 user:root:rwxpdDaARWcCos:--:allow
 everyone@:--:--:allow

Not bad at all. However, I contend that this shouldn't be necessary - and I 
don't understand why the inclusion of just one POSIX ACE (empty to boot) 
makes things work as expected.

/Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Cindy . Swearingen
Hi Alex,

The fact that you have to install the boot blocks manually on the
second disk that you added with zpool attach is a bug! I should have
mentioned this bug previously.

If you had used the initial installation method to create a mirrored
root pool, the boot blocks would have been applied automatically.

I don't think a way exists to discern whether the boot blocks are
already applied. I can't comment on why resilvering can't do this step.

Cindy

Alex Viskovatoff wrote:
 Cindy,
 
 Well, it worked. The system can boot off c4t0d0s0 now.
 
 But I am still a bit perplexed. Here is how the invocation of installgrub 
 went:
 
 a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
 /dev/rdsk/c4t0d0s0
 Updating master boot sector destroys existing boot managers (if any).
 continue (y/n)?y
 stage1 written to partition 0 sector 0 (abs 16065)
 stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
 stage1 written to master boot sector
 a...@diotima:~# 
 
 So installgrub writes to partition 0. How does one know that those sectors 
 have not already been used by zfs, in its mirroring of the first drive by 
 this second drive? And why is writing to partition 0 even necessary? Since 
 c3t0d0 must contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 
 already have stage1 and stage 2 in its partition 0 through the silvering 
 process?
 
 I don't find the present disk format/label/partitioning experience 
 particularly unpleasant (except for grubinstall writing directly into a 
 partition which belongs to a zpool). I just wish I understood what it 
 involves.
 
 Thank you for that link to the System Administration Guide. I just looked at 
 it again, and it says partition 8 Contains GRUB boot information. So 
 partition 8 is the master boot sector and contains GRUB stage1?
 
 Alex
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Mark Shellenbaum
   ls -V file
 -rw-r--r--+  1 root root   0 Jan  6 21:42 d
  user:root:rwxpdDaARWcCos:--:allow
 owner@:--x---:--:deny
 owner@:rw-p---A-W-Co-:--:allow
 group@:-wxp--:--:deny
 group@:r-:--:allow
  everyone@:-wxp---A-W-Co-:--:deny
  everyone@:r-a-R-c--s:--:allow
 
 Can anyone explain to me what just happened? Why are owner/group/everyone 
 ACEs (corresponding to old fashioned POSIX permission bits) created and even 
 more strange, why are deny entries created? Is there something mandating the 
 creation of these ACEs? I can understand that umask might affect this, but I 
 wouldn't have though that it would be causing ACEs to appear out of the blue.
 
 While writing this, I stumbled into this thread: http://tinyurl.com/7ofxfj. 
 Ok, so it seems that this is intended behavior to comply with POSIX. As the 
 author of the thread mentioned, I would like to see an inheritance mode that 
 completely ignores POSIX. The thread ends with Mark Shellenbaum commenting 
 that he will fasttrack the behavior that many people want. It is not clear 
 to me what exactly he means by this.
 
 Then I found http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=zha=view and 
 much to my confusion, the deny ACEs aren't created in example 8-10. How could 
 this be? Following some playing around, I came to the conclusion that as long 
 as at least one ACE corresponding to owner/group/everyone exists, the extra 
 ACEs aren't created:
 

The requested mode from an application is only ignored if the directory 
has inheritable ACEs that would affect the mode when aclinherit is set 
to passthrough.  Otherwise the mode is honored and you get the 
owner@, group@, and everyone@ ACEs.


The way a file create works is something like this:

1. Build up an ACL based on the inherited ACEs from the parent.  This ACL will
often be empty when no inheritable ACEs exist.

2. Next, the chmod algorithm is applied to the ACL in order to make it
reflect the requested mode.  This step is bypassed if the ACL created
in step 1 had any ACEs that affect the mode and the aclinherit property
is set to passthrough.

 mkdir test
 chmod A=user:root:rwxpdDaARWcCos:fd:allow,everyone@::fd:allow test
 ls -dV test
 d-+  3 root root  15 Jan  6 22:11 test
  user:root:rwxpdDaARWcCos:fd:allow
  everyone@:--:fd:allow
 
 cd test
 touch file
 ls -V file
 --+  1 root root   0 Jan  6 22:15 file
  user:root:rwxpdDaARWcCos:--:allow
  everyone@:--:--:allow
 
 Not bad at all. However, I contend that this shouldn't be necessary - and I 
 don't understand why the inclusion of just one POSIX ACE (empty to boot) 
 makes things work as expected.
 
 /Peter

We don't have a mode that says "completely disregard POSIX."  We try to 
honor the application's mode request except in situations where the 
inherited ACEs would conflict with the requested mode from the application.


-Mark

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Nicolas Williams
On Tue, Jan 06, 2009 at 01:27:41PM -0800, Peter Skovgaard Nielsen wrote:
  ls -V file
 ----------+  1 root root   0 Jan  6 22:15 file
      user:root:rwxpdDaARWcCos:------:allow
      everyone@:--------------:------:allow
 
 Not bad at all. However, I contend that this shouldn't be necessary -
 and I don't understand why the inclusion of just one POSIX ACE
 (empty to boot) makes things work as expected.

Because, IIRC, NFSv4 ACLs have an ambiguity as to what happens if no
ACE matches the subject that's trying to access the resource.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Sam
I've run into this problem twice now: before with 10x500GB drives in a ZFS+ 
setup, and now again with a 12x500GB ZFS+ setup.

The problem is that when the pool reaches ~85% capacity I get random read 
failures, and around ~90% capacity I get read failures AND zpool corruption.  
For example:

-I open a directory that I know for a fact has files and folders in it but it 
either shows 0 items or hangs on a directory listing
-I try to copy a file from the zpool volume to another volume and it hangs then 
fails

In both these situations, if I do a 'zpool status' after the fact it claims that 
the volume has experienced an unrecoverable error and that I should find the 
faulty drive and replace it, blah blah.  If I do a 'zpool scrub' it eventually 
reports 0 faults or errors, and if I restart the machine it will usually work 
just fine again (i.e. I can read the directory and copy files again).

Is this a systemic problem at 90% capacity, or do I perhaps have a faulty drive 
in the array that only gets hit at 90%?  If it is a faulty drive, why does 
'zpool status' report fully good health?  That makes it hard to find the 
problem drive.

Thanks,
Sam
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Thanks for clearing that up. That all makes sense.

I was wondering why ZFS doesn't use the whole disk in the standard OpenSolaris 
install. That explains it.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 01:24:17PM -0800, Alex Viskovatoff wrote:

 a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
 /dev/rdsk/c4t0d0s0
 Updating master boot sector destroys existing boot managers (if any).
 continue (y/n)?y
 stage1 written to partition 0 sector 0 (abs 16065)
 stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
 stage1 written to master boot sector
 a...@diotima:~# 

 So installgrub writes to partition 0. How does one know that those
 sectors have not already been used by zfs, in its mirroring of the
 first drive by this second drive?

Because this is a VTOC label partition (necessary for Solaris boot), I
believe ZFS lives only within slice 0 (I need to verify this).  So VTOC
cylinder 0 is free.

 And why is writing to partition 0 even necessary? Since c3t0d0 must
 contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 already
 have stage1 and stage 2 in its partition 0 through the silvering
 process?

I doubt that all of partition 0 is mirrored.  Only the data under ZFS
control (not the boot blocks) are copied.

Do not confuse the MBR partition 0 (the Solaris partition) with the
VTOC slice 0 (the one that has all the disk cylinders other than
cylinder 0 in it and that appeared in your earlier label output).
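
One way to see the two layers side by side on the second disk (device 
names as used in this thread; the interpretation of the output here is 
only approximate):

 # fdisk -W - /dev/rdsk/c4t0d0p0    # dump the MBR partition table
 # prtvtoc /dev/rdsk/c4t0d0s2       # print the VTOC slices inside partition 0
 # zdb -l /dev/rdsk/c4t0d0s0        # ZFS labels, which live in slice 0 only

The fdisk output should show the Solaris partition starting at the 
absolute sector quoted by installgrub above (abs 16065), and prtvtoc 
should show slice 0 plus the small slice 8 boot slice in cylinder 0.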

 I don't find the present disk format/label/partitioning experience
 particularly unpleasant (except for grubinstall writing directly into
 a partition which belongs to a zpool). I just wish I understood what
 it involves.

Partition 0 contains all of Solaris.  So the OS just needs to keep
things straight.  It does this with the VTOC slicing within that
partition.

 Thank you for that link to the System Administration Guide. I just
 looked at it again, and it says partition 8 Contains GRUB boot
 information. So partition 8 is the master boot sector and contains
 GRUB stage1?

It should probably refer to slice 8 to reduce confusion.  Boot loaders
(GRUB since that's what is in use here) are simultaneously in partition
0 and slice 8.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Hi Cindy,

I now suspect that the boot blocks are located outside of the space in 
partition 0 that actually belongs to the zpool, in which case it is not 
necessarily a bug that zpool attach does not write those blocks, IMO. Indeed, 
that must be the case, since GRUB needs to get to stage2 in order to be able to 
read ZFS file systems. I'm just glad zpool attach warned me that I need to 
invoke installgrub manually!

Thank you for making things less mysterious.

Alex
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 04:10:10PM -0500, JZ wrote:
 Hello Darren,
 This one, ok, was a valid thought/question --

Darn, I was hoping...

 On Solaris, root pools cannot have EFI labels (the boot firmware doesn't  
 support booting from them).
 http://blog.yucas.info/2008/11/26/zfs-boot-solaris/

Yup.  If that were to change, it would make this much simpler.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 2:58 PM, Rob rdyl...@yahoo.com wrote:

 Wow. I will read further into this. That seems like it could have great
 applications. I assume the same is true of FCoE?
 --


Yes, iSCSI, FC, and FCoE all present a LUN to Windows.  For the layman: from
the Windows system, the disk will look identical to a SCSI disk plugged
directly into the motherboard.  That's not entirely accurate, but close
enough for you to get an idea.
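
As a rough sketch of how that looks from the ZFS side on a 2008.11-era 
box (the pool and volume names are invented, and newer builds would use 
COMSTAR rather than the shareiscsi shortcut):

 # zfs create -V 100g tank/winlun      # carve a 100 GB zvol out of the pool
 # zfs set shareiscsi=on tank/winlun   # export it as an iSCSI target
 # iscsitadm list target -v            # note the IQN for the Windows initiator

Windows formats that LUN with NTFS and never knows that checksums, 
snapshots and mirroring are happening underneath it.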

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
Ok, folks, new news -  [feel free to comment in any fashion, since I don't 
know how yet.]


EMC ACQUIRES OPEN-SOURCE ASSETS FROM SOURCELABS 
http://go.techtarget.com/r/5490612/6109175






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Brent Jones
On Sat, Dec 6, 2008 at 11:40 AM, Ian Collins i...@ianshome.com wrote:
 Richard Elling wrote:
 Ian Collins wrote:
 Ian Collins wrote:
 Andrew Gabriel wrote:
 Ian Collins wrote:
 I've just finished a small application to couple zfs_send and
 zfs_receive through a socket to remove ssh from the equation and the
 speed up is better than 2x.  I have a small (140K) buffer on the
 sending
 side to ensure the minimum number of sent packets

 The times I get for 3.1GB of data (b101 ISO and some smaller
 files) to a
 modest mirror at the receive end are:

 1m36s for cp over NFS,
 2m48s for zfs send though ssh and
 1m14s through a socket.

 So the best speed is equivalent to 42MB/s.
 It would be interesting to try putting a buffer (5 x 42MB = 210MB
 initial stab) at the recv side and see if you get any improvement.

 It took a while...

 I was able to get about 47MB/s with a 256MB circular input buffer. I
 think that's about as fast it can go, the buffer fills so receive
 processing is the bottleneck.  Bonnie++ shows the pool (a mirror) block
 write speed is 58MB/s.

 When I reverse the transfer to the faster box, the rate drops to 35MB/s
 with neither the send nor receive buffer filling.  So send processing
 appears to be the limit in this case.
 Those rates are what I would expect writing to a single disk.
 How is the pool configured?

 The slow system has a single mirror pool of two SATA drives, the
 faster one a stripe of 4 mirrors and an IDE SD boot drive.

 ZFS send though ssh from the slow to the fast box takes 189 seconds, the
 direct socket connection send takes 82 seconds.

 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Reviving an old discussion, but has the core issue behind the zfs send/recv
performance problems been addressed? I'm not able to find any new bug
reports on bugs.opensolaris.org related to this, but my search kung-fu may
be weak.

Using mbuffer can speed it up dramatically, but this seems like a hack
that doesn't address the real problem with zfs send/recv.
Trying to send any meaningfully sized snapshot from, say, an X4540 takes
up to 24 hours, for as little as 300GB of change.
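
For reference, the mbuffer pipeline people are using looks roughly like 
this (host name, port, snapshot name and buffer sizes are all illustrative):

 on the receiving host:
 # mbuffer -s 128k -m 1G -I 9090 | zfs recv -d tank

 on the sending host:
 # zfs send tank/fs@snap | mbuffer -s 128k -m 1G -O recvhost:9090

The big memory buffer on each end keeps zfs send streaming while zfs recv 
stalls, which is exactly the decoupling one would hope send/recv did on 
its own.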



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Orvar Korvar
It is not recommended to fill any file system beyond 90%, I think. For 
instance, NTFS can behave very badly when it runs out of space. It is similar 
to filling up your RAM when you have no swap space: the computer starts to 
thrash badly. Not recommended. Stay below 90% and you have eliminated a 
possible source of problems.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Sam
I was hoping that this was the problem (because just buying more disks is the 
cheapest solution given time=$$), but when I ran it by somebody at work they 
said going over 90% can cause decreased performance but is unlikely to cause 
the strange errors I'm seeing.  However, I think I'll stick a 1TB drive in as a 
new volume and pull some data onto it to bring the zpool down to 75% capacity 
and see if that helps anyway.  I'll probably update the OS to 2008.11 as well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Carson Gaspar
On 1/6/2009 4:19 PM, Sam wrote:
 I was hoping that this was the problem (because just buying more
 discs is the cheapest solution given time=$$) but running it by
 somebody at work they said going over 90% can cause decreased
 performance but is unlikely to cause the strange errors I'm seeing.
 However, I think I'll stick a 1TB drive in as a new volume and pull
 some data onto it to bring the zpool down to 75% capacity and see if
 that helps though anyway.  Probably update the OS to 2008.11 as
 well.

Pool corruption is _always_ a bug. It may be ZFS, or your block devices,
but something is broken.
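
A few commands that usually help tell those two apart the next time it 
happens (pool name is hypothetical):

 # zpool status -v tank    # per-vdev READ/WRITE/CKSUM counters, damaged files
 # fmdump -eV | more       # FMA error telemetry, including transport errors
 # iostat -En              # per-disk soft/hard/transport error counts

If a drive or controller is flaking out under load, it will usually show 
up in the last two even when zpool status looks clean.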

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread David Magda
On Jan 6, 2009, at 14:21, Rob wrote:

 Obviously ZFS is ideal for large databases served out via  
 application level or web servers. But what other practical ways are  
 there to integrate the use of ZFS into existing setups to experience  
 its benefits.

Remember that ZFS is made up of the ZPL and the DMU (amongst other 
things). The ZPL is the POSIX compatibility layer that most of us use. 
The DMU is the underlying transactional object model that stores the 
actual data objects (e.g. files).

It would technically be possible for (say) MySQL to create a database  
engine on top of that transactional store. I believe that the Lustre  
people are using the DMU for their future data store back end. The DMU  
runs in userland so anyone can use it for any object store system.

People keep talking about ZFS in the context of replacing UFS/FFS,  
ext3, WAFL, etc., but few are utilizing (or realize the availability  
of) the transactional store.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 6:19 PM, Sam s...@smugmug.com wrote:

 I was hoping that this was the problem (because just buying more discs is
 the cheapest solution given time=$$) but running it by somebody at work they
 said going over 90% can cause decreased performance but is unlikely to cause
 the strange errors I'm seeing.  However, I think I'll stick a 1TB drive in
 as a new volume and pull some data onto it to bring the zpool down to 75%
 capacity and see if that helps though anyway.  Probably update the OS to
 2008.11 as well.
 --



Uhh, I would never accept that one as a solution.  90% full or not, a READ
should never, ever, ever corrupt a pool.  Heck, a write shouldn't either.  I
could see the system falling over and puking on itself performance-wise, but
corruption?  No way.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Nicholas Lee
Since zfs is so smart in other areas, is there a particular reason why a high
water mark is not calculated and the available space not reset to it?
I'd far rather have a zpool of 1000GB that said it only had 900GB but did
not have corruption as it ran out of space.
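
For what it's worth, something close to that can be had today with the 
existing properties (names and sizes below are only an example):

 # zfs set quota=900g tank                  # cap the top-level dataset

or, to similar effect:

 # zfs create tank/headroom
 # zfs set reservation=100g tank/headroom   # park ~10% of the pool out of reach

Neither is enforced by the pool itself, but either keeps ordinary users 
from pushing the pool to 100%.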

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 10:25 PM, Nicholas Lee emptysa...@gmail.com wrote:

 Since zfs is so smart is other areas is there a particular reason why a
 high water mark is not calculated and the available space not reset to this?
 I'd far rather have a zpool of 1000GB that said it only had 900GB but did
 not have corruption as it ran out of space.

 Nicholas



WHAT??!?  Put artificial limits in place to prevent users from killing
themselves?  How did that go Jeff?

I suggest that you retire to the safety of the rubber room while the rest
of us enjoy these zfs features. By the same measures, you would advocate
that people should never be allowed to go outside due to the wide open
spaces.  Perhaps people will wander outside their homes and forget how to
make it back.  Or perhaps there will be gravity failure and some of the
people outside will be lost in space.

It's NEVER a good idea to put a default limitation in place to protect a
*regular user*.  If they can't RTFM from front cover to back they don't
deserve to use a computer.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread JZ
BTW, the high water mark method is not perfect; here is some material from 
Novell support on water mark tuning...
best,
z

http://www.novell.com/coolsolutions/tools/16991.html

Based on my own belief that there had to be a better way and the number of 
issues I'd seen reported in the Support Forums, I spent a lot of time 
researching how different memory settings affect the memory management and 
stability of the server. Based on that research I've made memory tuning 
recommendations to a large number of forum posters who were having memory 
tuning issues, and most of them have found their servers to be significantly 
more stable since applying the changes I recommended.

What follows are the formulas I developed for recommending memory tuning 
changes to a server. The formulas take a number of the values available from 
SEG.NLM (available from: http://www.novell.com/coolsolutions/tools/14445.html). 
To get the required values, load SEG.NLM, then from the main screen do '/', 
then 'Info', then 'Write SEGSTATS.TXT'. The SEGSTATS.TXT file will be created 
in SYS:SYSTEM.

SEG monitors the server and records a number of key memory statistics, my 
formulae take those statistics and recommend manual memory tuning parameters.

Note that as these are manual settings, auto tuning is disabled, and if the 
memory usage of the server changes significantly, then the server will need to 
be retuned to reflect the change in memory usage.

Also, after making the changes to use manual rather than auto tuning, the 
server may still recommend that the FCMS and -u memory settings be changed. 
These recommendations can be ignored. Following them will have the same effect 
as auto tuning, except you're doing it rather than the server doing it 
automatically - the same problems will still occur.

  - Original Message - 
  From: Tim 
  To: Nicholas Lee 
  Cc: zfs-discuss@opensolaris.org ; Sam 
  Sent: Wednesday, January 07, 2009 12:02 AM
  Subject: Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05





  On Tue, Jan 6, 2009 at 10:25 PM, Nicholas Lee emptysa...@gmail.com wrote:

Since zfs is so smart is other areas is there a particular reason why a 
high water mark is not calculated and the available space not reset to this?


I'd far rather have a zpool of 1000GB that said it only had 900GB but did 
not have corruption as it ran out of space.


Nicholas


  WHAT??!?  Put artificial limits in place to prevent users from killing 
themselves?  How did that go Jeff?

  I suggest that you retire to the safety of the rubber room while the rest of 
us enjoy these zfs features. By the same measures, you would advocate that 
people should never be allowed to go outside due to the wide open spaces.  
Perhaps people will wander outside their homes and forget how to make it back.  
Or perhaps there will be gravity failure and some of the people outside will be 
lost in space.

  It's NEVER a good idea to put a default limitation in place to protect a 
*regular user*.  If they can't RTFM from front cover to back they don't deserve 
to use a computer.

  --Tim



--


  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Neil Perrin


On 01/06/09 21:25, Nicholas Lee wrote:
 Since zfs is so smart is other areas is there a particular reason why a 
 high water mark is not calculated and the available space not reset to this?
 
 I'd far rather have a zpool of 1000GB that said it only had 900GB but 
 did not have corruption as it ran out of space.
 
 Nicholas

Is there any evidence of corruption at high capacity or just
a lack of performance? All file systems will slow down when
near capacity, as they struggle to find space and then have to
spread writes over the disk. Our priorities are integrity first,
followed somewhere by performance.

I vaguely remember a time when UFS had limits to prevent
ordinary users from consuming past a certain limit, allowing
only the super-user to use it. Not that I'm advocating that
approach for ZFS.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Carsten Aulbert
Hi,

Brent Jones wrote:
 
 Using mbuffer can speed it up dramatically, but this seems like a hack
 without addressing a real problem with zfs send/recv.
 Trying to send any meaningful sized snapshots from say an X4540 takes
 up to 24 hours, for as little as 300GB changerate.

I have not found a solution yet either. But it seems to depend highly on
the distribution of file sizes, the number of files per directory, or
whatever. The last tests I made still showed more than 50 hours for 700
GB and ~45 hours for 5 TB (both tests were null tests where zfs send
wrote to /dev/null).
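
In case it helps to compare numbers, the null test is easy to reproduce 
(the snapshot name is just an example), and piping through mbuffer is 
handy here because it prints the achieved rate:

 # ptime zfs send -R tank/fs@backup > /dev/null
 # zfs send -R tank/fs@backup | mbuffer -s 128k -m 512M > /dev/null

The second form shows the instantaneous and average throughput of the 
send stream with no receive side involved at all.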

Cheers from a still puzzled Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss