Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> ~= 5.1E-57

Bah. My math is wrong. I was never very good at P&S. I'll ask someone at work tomorrow to look at it and show me the folly. Wikipedia has it right, but I can't evaluate numbers to the few-hundredth power in any calculator that I have handy.
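For what it's worth, bc(1) does arbitrary-precision arithmetic and can evaluate this directly. A quick sketch, using the M*(M+1)/(2N) approximation with M = 2^35 and N = 2^256 (the rough magnitude, not the exact digits, is what matters here):

    # Evaluate 2^35 * (2^35 + 1) / (2 * 2^256) to 70 decimal places
    echo 'scale=70; (2^35 * (2^35 + 1)) / (2 * 2^256)' | bc -l
    # Prints roughly 5.1E-57, i.e. 56 zeros after the decimal point
    # before the first significant digit.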
Re: [zfs-discuss] Size of incremental stream
No compression, no dedup. I also forgot to mention it's on snv_134.
Re: [zfs-discuss] Size of incremental stream
On 01/11/11 11:40 AM, fred wrote:
> Hello,
>
> I'm having a weird issue with my incremental setup. Here is the filesystem as it shows up with zfs list:
>
> NAME             USED  AVAIL  REFER  MOUNTPOINT
> Data/FS1         771M  16.1T   116M  /Data/FS1
> Data/f...@05    10.3G      -  1.93T  -
> Data/f...@06    14.7G      -  1.93T  -
> Data/f...@07        0      -  1.93T  -
>
> Every day, I sync this filesystem remotely with:
>
>   zfs send -I X Y | ssh b...@blah zfs receive Z
>
> Now I'm having a hard time transferring @06 to @07, so I tried to copy the stream directly on the local filesystem, only to find that the size of the stream was more than 50G! Does anyone know why my stream is so much bigger than the actual snapshot size (14.7G)? I don't have this problem on my other filesystems.

Compression?

--
Ian.
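One quick sanity check (a sketch; the snapshot names below are reconstructed from the listing above and may need adjusting) is to measure the incremental stream locally and see how well it compresses, which also hints at whether compressing the transport (e.g. ssh -C) would help:

    # Size of the incremental stream, without writing it anywhere
    zfs send -I Data/FS1@06 Data/FS1@07 | wc -c

    # Same stream through gzip, to see how compressible it is on the wire
    zfs send -I Data/FS1@06 Data/FS1@07 | gzip -c | wc -c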
[zfs-discuss] Size of incremental stream
Hello,

I'm having a weird issue with my incremental setup. Here is the filesystem as it shows up with zfs list:

NAME             USED  AVAIL  REFER  MOUNTPOINT
Data/FS1         771M  16.1T   116M  /Data/FS1
Data/f...@05    10.3G      -  1.93T  -
Data/f...@06    14.7G      -  1.93T  -
Data/f...@07        0      -  1.93T  -

Every day, I sync this filesystem remotely with:

  zfs send -I X Y | ssh b...@blah zfs receive Z

Now I'm having a hard time transferring @06 to @07, so I tried to copy the stream directly on the local filesystem, only to find that the size of the stream was more than 50G! Does anyone know why my stream is so much bigger than the actual snapshot size (14.7G)? I don't have this problem on my other filesystems.

Thanks
Re: [zfs-discuss] pool metadata corrupted - any options?
- Original Message -
> Running "zpool status -x" gives the results below. Do I have any
> options besides restoring from tape?
>
> David
>
> $ zpool status -x
...

This may be a little off-topic, but using 20 drives in a single VDEV - isn't that a little more than recommended?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for every pedagogue to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
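For illustration only (a sketch of the usual guidance, not a recovery step; the pool name is a placeholder and the device names are simply reused from David's output), the same 20 drives would more commonly be arranged as two narrower raidz2 vdevs in one pool:

    # Two 10-disk raidz2 vdevs rather than one 20-disk vdev
    zpool create tank \
        raidz2 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t10d0 c0t11d0 c0t12d0 c0t13d0 \
        raidz2 c0t14d0 c0t15d0 c0t16d0 c0t17d0 c0t18d0 c0t19d0 c0t20d0 c0t21d0 c0t22d0 c0t23d0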
Re: [zfs-discuss] ZFS root backup/"disaster" recovery, and moving root pool
Hi Karl,

I would keep your mirrored root pool separate on the smaller disks, as you have it set up now.

You can move your root pool; it's easy enough. You can even attach larger disks to the root pool and detach the smaller disks, or replace them outright.

You can't currently boot from snapshots; you must boot from a BE. Root pool recovery is generally a matter of restoring root pool snapshots, so if you store those remotely, you should be covered. This process is described in the ZFS Admin Guide and the ZFS troubleshooting wiki.

Combining your root pool with a ZIL and L2ARC on faster disks is not worth the headaches that can occur when trying to manage all 3 on the same disk. For example, you might decide to reinstall and accidentally clobber the contents of the ZIL for your data pool. Don't share disks for pool components or across pools; it keeps management and recovery simple.

Thanks,

Cindy

On 01/10/11 10:58, Karl Wagner wrote:
> Hi everyone
>
> I am currently testing Solaris 11 Express. I currently have a root pool on a mirrored pair of small disks, and a data pool consisting of 2 mirrored pairs of 1.5TB drives.
>
> I have enabled auto snapshots on my root pool, and plan to archive the daily snapshots onto my data pool. I was wondering how easy it would be, in the case of a root pool failure (i.e. both disks giving up the ghost), to restore these backups to a new disk? Or even if it would be possible to boot from the latest snapshot, somehow?
>
> In a related topic, how easy is it to move a root pool? I am considering getting a pair of SSDs to use for ZIL, L2ARC and root pool, but am rather worried it will be quite a painful process to move the root pool onto them. The plan is to use 16GB or so for rpool, mirrored, then divide the rest between L2ARC and a mirrored ZIL, on 64GB SSDs.
>
> Cheers in advance
> Karl
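As a rough sketch of the attach/detach approach (device names and slices here are placeholders; on x86 the new disk also needs boot blocks via installgrub, on SPARC via installboot), moving the root pool to a larger disk might look like:

    # Attach the new, larger disk to the existing root pool mirror
    zpool attach rpool c0t0d0s0 c0t2d0s0

    # Wait for the resilver to complete, then install the boot blocks (x86/GRUB)
    zpool status rpool
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t2d0s0

    # Detach the old, smaller disk
    zpool detach rpool c0t0d0s0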
[zfs-discuss] ZFS root backup/"disaster" recovery, and moving root pool
Hi everyone

I am currently testing Solaris 11 Express. I currently have a root pool on a mirrored pair of small disks, and a data pool consisting of 2 mirrored pairs of 1.5TB drives.

I have enabled auto snapshots on my root pool, and plan to archive the daily snapshots onto my data pool. I was wondering how easy it would be, in the case of a root pool failure (i.e. both disks giving up the ghost), to restore these backups to a new disk? Or even if it would be possible to boot from the latest snapshot, somehow?

In a related topic, how easy is it to move a root pool? I am considering getting a pair of SSDs to use for ZIL, L2ARC and root pool, but am rather worried it will be quite a painful process to move the root pool onto them. The plan is to use 16GB or so for rpool, mirrored, then divide the rest between L2ARC and a mirrored ZIL, on 64GB SSDs.

Cheers in advance

Karl
Re: [zfs-discuss] pool metadata corrupted - any options?
Hi David,

You might try importing this pool on an Oracle Solaris Express system, where a pool recovery feature is available that might be able to bring this pool back (it rolls back to a previous transaction). If that fails, you could import this pool using the read-only option to at least recover your data.

What events led up to this corruption?

Thanks,

Cindy

On 01/08/11 11:57, David Stein wrote:
> Running "zpool status -x" gives the results below. Do I have any options besides restoring from tape?
>
> David
>
> $ zpool status -x
>   pool: home
>  state: FAULTED
> status: The pool metadata is corrupted and the pool cannot be opened.
> action: Destroy and re-create the pool from a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-72
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         home        FAULTED      0     0     1  corrupted data
>           raidz2    ONLINE       0     0     6
>             c0t10d0 ONLINE       0     0     0
>             c0t11d0 ONLINE       0     0     0
>             c0t12d0 ONLINE       0     0     0
>             c0t13d0 ONLINE       0     0     0
>             c0t14d0 ONLINE       0     0     0
>             c0t15d0 ONLINE       0     0     0
>             c0t16d0 ONLINE       0     0     0
>             c0t17d0 ONLINE       0     0     0
>             c0t18d0 ONLINE       0     0     0
>             c0t19d0 ONLINE       0     0     0
>             c0t20d0 ONLINE       0     0     0
>             c0t21d0 ONLINE       0     0     1
>             c0t22d0 ONLINE       0     0     0
>             c0t23d0 ONLINE       0     0     0
>             c0t2d0  ONLINE       0     0     0
>             c0t3d0  ONLINE       0     0     0
>             c0t4d0  ONLINE       0     0     0
>             c0t5d0  ONLINE       0     0     0
>             c0t6d0  ONLINE       0     0     0
>             c0t7d0  ONLINE       0     0     0
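A minimal sketch of those two suggestions, assuming the pool is not currently imported and that the system running the commands is new enough to have both the recovery and read-only import options:

    # Attempt recovery by discarding the last few transactions
    zpool import -F home

    # If that fails, import read-only just to copy the data off
    zpool import -o readonly=on home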
Re: [zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e
As a follow-up, I tried a SuperMicro enclosure (SC847E26-RJBOD1). I have 3 sets of 15 drives. I got the same results when I loaded the second set of drives (15 to 30).

Then I tried changing the LSI 9200's BIOS setting for max INT 13 drives from 24 (the default) to 15. From then on, the SuperMicro enclosure worked fine, even with all 45 drives, and no kernel hangs.

I suspect that the BIOS setting would have worked with >1 MD1000 enclosure, but I never tested the MD1000s after I had the SuperMicro enclosure running. I'm not sure if the kernel hang with max INT 13 = 24 was a hardware problem or a Solaris bug.

- Rob

> I have 15x SAS drives in a Dell MD1000 enclosure, attached to an LSI 9200-16e. This has been working well. The system is booting off of internal drives, on a Dell SAS 6ir.
>
> I just tried to add a second storage enclosure, with 15 more SAS drives, and I got a lockup during Loading Kernel. I got the same results whether I daisy-chained the enclosures or plugged them both directly into the LSI 9200. When I removed the second enclosure, it booted up fine.
>
> I also have an LSI MegaRAID 9280-8e I could use, but I don't know if there is a way to pass the drives through without creating RAID0 virtual drives for each drive, which would complicate replacing disks. The 9280 boots up fine, and the system can see new virtual drives.
>
> Any suggestions? Is there some sort of boot procedure to get the system to recognize the second enclosure without locking up? Is there a special way to configure one of these LSI boards?
Re: [zfs-discuss] ZFS on emcpower0a and labels
Hi David,

I don't know whether my info is still helpful, but here it is anyway. I had the same problem and solved it using the "format -e" command. When you then enter the label option, you will get two choices:

    format> label
    [0] SMI Label
    [1] EFI Label
    Specify Label type[0]:

Choose zero and your disk will be a "SUN" disk again.

Grtz,

Philip.
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Magda
>
> Knowing exactly how the math (?) works is not necessary, but understanding

Understanding the math is not necessary, but it is pretty easy. And unfortunately it becomes kind of necessary, because even when you tell somebody the odds of a collision are a zillion times smaller than the odds of our Sun exploding and destroying Earth, they still don't believe you.

The explanation of the math, again, is described in the Wikipedia article "Birthday Problem," or stated a little more simply here:

Given a finite pool of N items, pick one at random and return it to the pool. Pick another one. The odds of it being the same as the first are 1/N. Pick another one. The odds of it being the same as the first are 1/N, and the odds of it being the same as the 2nd are 1/N, so the odds of it matching any of the prior picks are (to a first approximation) 2/N. Pick another one. The odds of it being the same as any previous pick are roughly 3/N.

If you repeatedly draw M items out of the pool (plus the first draw), returning them each time, then the odds of any draw matching any other draw are approximately:

    P = 1/N + 2/N + 3/N + ... + M/N
    P = ( sum(1 to M) ) / N

Note: if you google for "sum of positive integers," you'll find sum(1 to M) = M * (M+1) / 2, so:

    P = M * (M+1) / (2N)

In the context of hash collisions in a zpool, M would be the number of data blocks in your zpool, and N would be all the possible hashes. A SHA-256 hash has 256 bits, so N = 2^256.

I described an excessively large worst-case zpool in my other email, which had 2^35 data blocks in it. So M = 2^35, and the probability of any block hash colliding with any other hash in that case is:

    2^35 * (2^35 + 1) / (2 * 2^256) = ( 2^70 + 2^35 ) * 2^-257 = 2^-187 + 2^-222 ~= 5.1E-57

There are an estimated 8.87E49 atoms in planet Earth ( http://pages.prodigy.net/jhonig/bignum/qaearth.html ). The probability of a collision in your worst-case unrealistic dataset as described is over a million times less likely than randomly finding a single specific atom in the whole planet Earth by pure luck.
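For reference, the same estimate written compactly (this is just the sum-bound approximation used above, not the exact birthday-problem expression):

    P \approx \frac{M(M+1)}{2N}
      = \frac{2^{35}\,(2^{35}+1)}{2 \cdot 2^{256}}
      = 2^{-187} + 2^{-222}
      \approx 5.1 \times 10^{-57},
    \qquad M = 2^{35},\ N = 2^{256}.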
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: Pawel Jakub Dawidek [mailto:p...@freebsd.org]
>
> Well, I find it quite reasonable. If your block is referenced 100 times,
> it is probably quite important.

If your block is referenced 1 time, it is probably quite important. Hence redundancy in the pool.

> There are many corruption possibilities
> that can destroy your data. Imagine memory error, which corrupts
> io_offset in write zio structure and corrupted io_offset points at your
> deduped block referenced 100 times. It will be overwritten and
> redundancy won't help you.

All of the corruption scenarios which allow you to fail despite pool redundancy also allow you to fail despite copies+N.

> Note, that deduped data is not alone
> here. Pool-wide metadata are stored 'copies+2' times (but no more than
> three) and dataset-wide metadata are stored 'copies+1' times (but no
> more than three), so by default pool metadata have three copies and
> dataset metadata have two copies, AFAIR. When you lose root node of a
> tree, you lose all your data, are you really, really sure only one copy
> is enough?

Interesting. But no. There is not only one copy as long as you have pool redundancy.
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Peter Taps
>
> I haven't looked at the link that talks about the probability of collision.
> Intuitively, I still wonder how the chances of collision can be so low. We are
> reducing a 4K block to just 256 bits. If the chances of collision are so low,
> *theoretically* it is possible to reconstruct the original block from the 256-bit
> signature by using a simple lookup. Essentially, we would now have world's
> best compression algorithm irrespective of whether the data is text or
> binary. This is hard to digest.

BTW, at work we do a lot of theoretical mathematics, and one day a few months ago I was given the challenge to explore the concept of using a hashing algorithm as a form of compression, exactly as you said. The conclusion was: you can't reverse-hash in order to reconstruct unknown original data, but you can do it (theoretically) if you have enough additional information about what constitutes valid original data.

If you have a huge lookup table of all the possible original data blocks, then the hash can only be used to narrow the field to 2^(N-M) of them as possible candidates, and some additional technique is necessary to figure out precisely which one of those is the original data block. (N is the length of the data block in bits, and M is the length of the hash, in bits.) Hashing discards some of the original data. In fact, random data is generally uncompressible, so if you try to compress random data and end up with something smaller than the original, you can rest assured you're not able to reconstruct it.

However, if you know something about the original... For example, if you know the original is a valid text document written in English, then in all likelihood there is only one possible original block fitting that description and yielding the end hash result. Even if there is more than one original block which looks like valid English text and produces the same end hash, it is easy to choose which one is correct based on context: since you presumably know the previous block and the subsequent block, you just choose the intermediate block which seamlessly continues to produce valid English grammar at the junctions with adjacent blocks. This technique can be applied to most types of clearly structured original data, but it cannot be applied to unstructured or unknown original data. So at best, hashing could be a special-case form of compression.

To decompress would require near-infinite compute hours, or a large lookup table to scan all the possible sets of inputs and find one which produces the end hash. So besides the fact that hashing is at best a specific form of compression requiring additional auxiliary information, it's also impractical. To get this down to something reasonable, I considered using a 48MB lookup table for a 24-bit block of data (that's 2^24 entries of 24 bits each), or a 16GB lookup table for a 32-bit block of data (2^32 entries of 32 bits each). Well, in order to get a compression ratio worth talking about, the hash size would have to be 3 bits or smaller. That's a pretty big lookup table to decompress 3 bits into 24 or 32... And let's face it: 9:1 compression isn't stellar for a text document.

And the final nail in the coffin was: in order for this technique to be viable, as mentioned, the original data must be structured.
For any set of structured original data, all the information which is necessary for the reverse-hash to identify valid data from the lookup table could instead have been used to create a specialized compression algorithm which is equal to or better than the reverse-hash. So reverse-hash decompression is actually the worst-case algorithm for all the data types it's capable of working on.

But yes, you're right, it's theoretically possible for specific cases, just not for the general case.
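A simple counting argument makes the "2^(N-M) candidates" point above concrete. For a 4 KB block (N = 32768 bits) and a 256-bit hash (M = 256), each hash value corresponds on average to

    \frac{2^{N}}{2^{M}} = 2^{\,N-M} = 2^{32768-256} = 2^{32512}

possible original blocks, so the hash alone can never single out the original without a great deal of side information about what valid data looks like.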
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Mon, January 10, 2011 02:41, Eric D. Mudama wrote:
> On Sun, Jan 9 at 22:54, Peter Taps wrote:
>> Thank you all for your help. I am the OP.
>>
>> I haven't looked at the link that talks about the probability of
>> collision. Intuitively, I still wonder how the chances of collision
>> can be so low. We are reducing a 4K block to just 256 bits. If the
>> chances of collision are so low, *theoretically* it is possible to
>> reconstruct the original block from the 256-bit signature by using a
>> simple lookup. Essentially, we would now have world's best
>> compression algorithm irrespective of whether the data is text or
>> binary. This is hard to digest.
>
> "simple" lookup isn't so simple when there are 2^256 records to
> search, however, fundamentally your understanding of hashes is
> correct.

[...]

It should also be noted that ZFS itself can "only" address 2^128 bytes (not even 4K 'records'), and supposedly to fill those 2^128 bytes it would take as much energy as it would take to boil the Earth's oceans:

http://blogs.sun.com/bonwick/entry/128_bit_storage_are_you

So recording and looking up 2^256 records would be quite an accomplishment. It's a lot of data.

If the OP wants to know why the chances are so low, he'll have to learn a bit about hash functions (which is what SHA-256 is):

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Cryptographic_hash_function

Knowing exactly how the math (?) works is not necessary, but understanding the principles would be useful if one wants to have a general picture as to why SHA-256 doesn't need a verification step, and why it was chosen as one of the ZFS (dedupe) checksum options.
[zfs-discuss] cannot iterate filesystems: I/O error
Hi,

After a node panic I have an issue importing one of my zpools:

# zpool import dmysqlb2
cannot iterate filesystems: I/O error

So I tried to list the zfs filesystems:

# zfs list -r dmysqlb2
cannot iterate filesystems: I/O error
NAME                 USED  AVAIL  REFER  MOUNTPOINT
dmysqlb2            15.5G  43.0G    18K  none
dmysq...@20101130       0      -    18K  -

There should also be dmysqlb2/etc and dmysqlb2/var and their snapshots.

# zpool status -xv
  pool: dmysqlb2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                                   STATE     READ WRITE CKSUM
        dmysqlb2                               ONLINE       0     0     5
          mirror                               ONLINE       0     0    20
            c4t600A0B80002ACF5A0D584AD5EEFFd0  ONLINE       0     0    20
            c4t600A0B80002ACF32157F4ACDF206d0  ONLINE       0     0    20

errors: Permanent errors have been detected in the following files:

        dmysqlb2/var:<0x0>

Output from zdb:

# zdb dmysqlb2
    version=15
    name='dmysqlb2'
    state=0
    txg=1350126
    pool_guid=258476501669044711
    hostid=2207000451
    hostname='wega'
    vdev_tree
        type='root'
        id=0
        guid=258476501669044711
        children[0]
            type='mirror'
            id=0
            guid=1727969291773901682
            metaslab_array=14
            metaslab_shift=29
            ashift=9
            asize=64411140096
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=3314848442807482804
                path='/dev/dsk/c4t600A0B80002ACF5A0D584AD5EEFFd0s0'
                devid='id1,s...@n600a0b80002acf5a0d584ad5eeff/a'
                phys_path='/scsi_vhci/s...@g600a0b80002acf5a0d584ad5eeff:a'
                whole_disk=1
                DTL=35
            children[1]
                type='disk'
                id=1
                guid=15321902971336355296
                path='/dev/dsk/c4t600A0B80002ACF32157F4ACDF206d0s0'
                devid='id1,s...@n600a0b80002acf32157f4acdf206/a'
                phys_path='/scsi_vhci/s...@g600a0b80002acf32157f4acdf206:a'
                whole_disk=1
                DTL=36
WARNING: can't open objset for dmysqlb2/var

Uberblock

        magic = 00bab10c
        version = 15
        txg = 1350153
        guid_sum = 2176453133877232877
        timestamp = 1294658859 UTC = Mon Jan 10 12:27:39 2011

Dataset mos [META], ID 0, cr_txg 4, 10.7M, 92 objects

Metaslabs:
        vdev    offset    spacemap    free
        ----    ------    --------    ----
        vdev 0  offset 0      spacemap 17  free 237M
        vdev 0  offset 2000   spacemap 19  free 139M
        vdev 0  offset 4000   spacemap 23  free 87.8M
        vdev 0  offset 6000   spacemap 32  free 264M
        vdev 0  offset 8000   spacemap 34  free 262M
        vdev 0  offset a000   spacemap 37  free 258M
        vdev 0  offset c000   spacemap 40  free 273M
        vdev 0  offset e000   spacemap 64  free 238M
        vdev 0  offset 1      spacemap 65  free 274M
        vdev 0  offset 12000  spacemap 66  free 55.1M
        vdev 0  offset 14000  spacemap 67  free 212K
        vdev 0  offset 16000  spacemap 69  free 155K
        vdev 0  offset 18000  spacemap 71  free 270M
        vdev 0  offset 1a000  spacemap 75  free 91.1M
        vdev 0  offset 1c000  spacemap 77  free 158M
        vdev 0  offset 1e000  spacemap 80  free 251M
        vdev 0  offset 2      spacemap 82  free 260M
        vdev 0  offset 22000  spacemap 92  free 283M
        vdev 0  offset 24000  spacemap 93  free 279M
        vdev 0  offset 26000  spacemap 94  free 50.4M
        vdev 0  offset 28000  spacemap 95  free 136M
        vdev 0  offset 2a000  spacemap 0   free 512M
        vdev 0  offset 2c000  spacemap 0   free 512M
        vdev 0  offset 2e000  spacemap 16  free 59.3M
        vdev 0  offset 3
Re: [zfs-discuss] Migrating zpool to new drives with 4K Sectors
Actually, it is not my blog ;)

To answer your question: you first need to create a new vdev that is 4K-aligned, unfortunately. I am not aware of any other means to accomplish what you seek.
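A rough sketch of the migration itself, once a 4K-aligned destination pool exists (the pool and snapshot names here are made up, and how you get ashift=12 on the new vdevs depends on your platform and ZFS build):

    # Snapshot the old pool recursively, then replicate the whole hierarchy,
    # properties and snapshots included, into the new 4K-aligned pool
    zfs snapshot -r oldtank@migrate
    zfs send -R oldtank@migrate | zfs receive -F newtank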
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sat, Jan 08, 2011 at 12:59:17PM -0500, Edward Ned Harvey wrote:
> Has anybody measured the cost of enabling or disabling verification?

Of course there is no easy answer. :)

Let me explain how verification works exactly, first. You try to write a block. You see that the block is already in the dedup table (it is already referenced). You read the block (maybe it is in the ARC or in the L2ARC). You compare the read block with what you want to write.

Based on the above:

1. If you have dedup on, but your blocks are not deduplicable at all, you will pay no price for verification, as there will be no need to compare anything.

2. If your data is highly deduplicable, you will verify often. Now it depends whether the data you need to read fits into your ARC/L2ARC or not. If it can be found in the ARC, the impact will be small. If your pool is very large and you can't count on ARC help, each write will be turned into a read.

Also note an interesting property of dedup: if your data is highly deduplicable, you can actually improve performance by avoiding data writes (and just increasing the reference count).

Let me show you three degenerate tests to compare the options. I'm writing 64GB of zeros to a pool with dedup turned off, with dedup turned on, and with dedup+verification turned on (I use the SHA256 checksum everywhere):

# zpool create -O checksum=sha256 tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      254,11 real         0,07 user        40,80 sys

# zpool create -O checksum=sha256 -O dedup=on tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      154,60 real         0,05 user        37,10 sys

# zpool create -O checksum=sha256 -O dedup=sha256,verify tank ada{0,1,2,3}
# time sh -c 'dd if=/dev/zero of=/tank/zero bs=1m count=65536; sync; zpool export tank'
      173,43 real         0,02 user        38,41 sys

As you can see, in the second and third tests the data is of course in the ARC, so the difference here is only because of data comparison (no extra reads are needed), and verification is 12% slower. This is of course a silly test, but as you can see, dedup (even with verification) is much faster than the no-dedup case; this data is highly deduplicable, though :)

# zpool list
NAME   SIZE  ALLOC  FREE   CAP       DEDUP  HEALTH  ALTROOT
tank   149G  8,58M  149G    0%  524288.00x  ONLINE  -

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
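If you want to flip verification on or off for an existing dataset rather than at pool creation time, a minimal sketch (the dataset name is made up; only blocks written after the change are affected):

    # Dedup with SHA-256 plus byte-for-byte verification
    zfs set dedup=sha256,verify tank/data

    # Dedup on SHA-256 alone, trusting the hash
    zfs set dedup=sha256 tank/data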
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On 01/08/11 05:59 PM, Edward Ned Harvey wrote:
> Has anybody measured the cost of enabling or disabling verification?

The cost of disabling verification is an infinitesimally small number multiplied by possibly all your data. Basically lim->0 times lim->infinity. This can only be evaluated on a case-by-case basis, and there's no use in making any more generalizations for or against it.

> The benefit of disabling verification would presumably be faster performance. Has anybody got any measurements, or even calculations or vague estimates or clueless guesses, to indicate how significant this is? How much is there to gain by disabling verification?

Exactly my point, and there isn't one answer which fits all environments. In the testing I'm doing, so far enabling/disabling verification doesn't make any noticeable difference, so I'm sticking with verify. But I have enough memory, and such a workload, that I see few physical reads going on.

--
Robert Milkowski
http://milek.blogspot.com
Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)
On Sun, Jan 09, 2011 at 07:27:52PM -0500, Edward Ned Harvey wrote:
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Pawel Jakub Dawidek
> >
> > Dedupditto doesn't work exactly that way. You can have at most 3 copies
> > of your block. Dedupditto minimal value is 100. The first copy is
> > created on first write, the second copy is created on dedupditto
> > references and the third copy is created on 'dedupditto * dedupditto'
> > references. So once you reach 10,000 references of your block ZFS will
> > create three physical copies, not earlier and never more than three.
>
> What is the point of dedupditto? If there is a block on disk, especially on
> a pool with redundancy so it can safely be assumed good now and for the
> future... Why store the multiples? Even if it is a maximum of 3, I
> presently only see the sense in a maximum of 1.

Well, I find it quite reasonable. If your block is referenced 100 times, it is probably quite important. There are many corruption possibilities that can destroy your data. Imagine a memory error which corrupts io_offset in a write zio structure, and the corrupted io_offset points at your deduped block referenced 100 times. It will be overwritten and redundancy won't help you. You will be able to detect the corruption on read, but it will be too late.

Note that deduped data is not alone here. Pool-wide metadata are stored 'copies+2' times (but no more than three) and dataset-wide metadata are stored 'copies+1' times (but no more than three), so by default pool metadata have three copies and dataset metadata have two copies, AFAIR. When you lose the root node of a tree, you lose all your data; are you really, really sure only one copy is enough?

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
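For reference, dedupditto is a pool-level property; a minimal sketch of setting the threshold described above (the pool name is a placeholder):

    # Keep an extra copy of a deduped block once it reaches 100 references,
    # and a third copy at 100 * 100 = 10,000 references
    zpool set dedupditto=100 tank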