Re: [zfs-discuss] Zpool Import Hanging

2011-01-17 Thread Ian Collins

 On 01/18/11 05:22 PM, Repetski, Stephen wrote:


On Mon, Jan 17, 2011 at 22:08, Ian Collins wrote:


 On 01/18/11 04:00 PM, Repetski, Stephen wrote:


Hi All,

I believe this has been asked before, but I wasn’t able to
find much information about the subject. Long story short, I
was moving data around on a storage zpool of mine and a zfs
destroy <dataset> hung (or so I thought). This pool has also
had dedup turned on at times while imported; it’s running
on a Nexenta Core 3.0.1 box (snv_134f).


The first time the machine was rebooted, it hung at the
“Loading ZFS filesystems” line after loading the kernel; I
booted the box with all drives unplugged and exported the
pool. The machine was rebooted, and now the pool is hanging on
import (zpool import -Fn Nalgene). I’m using
"0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try
and view what the import process is doing. I’m not a
hard-core ZFS/Solaris dev, so I don’t know if I’m reading the
output correctly, but it appears that ZFS is continuing to
delete a snapshot/FS from before (reading from the top down):

What does "zpool iostat  10" show?

If you have a lot of deduped data and not a lot of RAM (or a cache
device), it can take a very long time to destroy a filesystem.
 You will see a lot of reads and not many writes if this is happening.

-- 
Ian.



Zpool iostat itself hangs,


If you are running it as root, try another user.  I don't know about 
recent builds, but zpool commands are way slower as root on Solaris 10.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool Import Hanging

2011-01-17 Thread Repetski, Stephen
On Mon, Jan 17, 2011 at 22:08, Ian Collins wrote:

>  On 01/18/11 04:00 PM, Repetski, Stephen wrote:
>
>>
>> Hi All,
>>
>> I believe this has been asked before, but I wasn’t able to find much
>> information about the subject. Long story short, I was moving data around on
>> a storage zpool of mine and a zfs destroy <dataset> hung (or so I
>> thought). This pool has also had dedup turned on at times while imported;
>> it’s running on a Nexenta Core 3.0.1 box (snv_134f).
>>
>>
>> The first time the machine was rebooted, it hung at the “Loading ZFS
>> filesystems” line after loading the kernel; I booted the box with all
>> drives unplugged and exported the pool. The machine was rebooted, and now
>> the pool is hanging on import (zpool import -Fn Nalgene). I’m using
>> "0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try and view what
>> the import process is doing. I’m not a hard-core ZFS/Solaris dev, so I
>> don’t know if I’m reading the output correctly, but it appears that ZFS is
>> continuing to delete a snapshot/FS from before (reading from the top down):
>
>  What does "zpool iostat <pool> 10" show?
>
> If you have a lot of deduped data and not a lot of RAM (or a cache device),
> it can take a very long time to destroy a filesystem.  You will see a lot of
> reads and not many writes if this is happening.
>
> --
> Ian.
>
>
Zpool iostat itself hangs, but iostat does show me one drive in particular
causing some issues - http://pastebin.com/6rJG3qV9 - %w and %b drop to ~50
and ~90, respectively, when mdb shows ZFS doing some deduplication work (
http://pastebin.com/EMPYy5Rr). As you said, the pool is mostly reading data
and not writing much. I should be able to switch up that drive to another
controller (currently on a PCI SATA adapter) and see what iostat reports
then.
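
For reference, the iostat invocation I'm watching is roughly this (extended
device statistics every 10 seconds; the exact flags may vary a bit):

    # watch the %w (wait) and %b (busy) columns per device
    iostat -xn 10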

Until then, I'll keep the zpool import running and see what the box does...

Thanks,
Trey

--

*Stephen Repetski*
BS Applied Networking and Systems Administration, 2013
Rochester Institute of Technology, Thomas Jefferson HS S&T
skr3...@rit.edu | srepe...@srepetsk.net
http://srepetsk.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool Import Hanging

2011-01-17 Thread Ian Collins

 On 01/18/11 04:00 PM, Repetski, Stephen wrote:


Hi All,

I believe this has been asked before, but I wasn’t able to find 
much information about the subject. Long story short, I was moving 
data around on a storage zpool of mine and a zfs destroy <dataset> 
hung (or so I thought). This pool has also had dedup turned on at 
times while imported; it’s running on a Nexenta Core 3.0.1 box (snv_134f).




The first time the machine was rebooted, it hung at the “Loading ZFS 
filesystems” line after loading the kernel; I booted the box with all 
drives unplugged and exported the pool. The machine was rebooted, and 
now the pool is hanging on import (zpool import -Fn Nalgene). I’m 
using "0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try 
and view what the import process is doing. I’m not a hard-core 
ZFS/Solaris dev, so I don’t know if I’m reading the output correctly, 
but it appears that ZFS is continuing to delete a snapshot/FS from 
before (reading from the top down):



What does "zpool iostat  10" show?

If you have a lot of deduped data and not a lot of RAM (or a cache 
device), it can take a very long time to destroy a filesystem.  You will 
see a lot of reads and not many writes if this is happening.
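
For example, something like this will show it (a sketch, using your pool
name; -v breaks the numbers out per vdev):

    # watch pool I/O every 10 seconds; a long deduped destroy typically
    # shows lots of read ops and almost no writes
    zpool iostat -v Nalgene 10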


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zpool Import Hanging

2011-01-17 Thread Repetski, Stephen
Hi All,



I believe this has been asked before, but I wasn’t able to find much
information about the subject. Long story short, I was moving data around on
a storage zpool of mine and a zfs destroy <dataset> hung (or so I
thought). This pool has also had dedup turned on at times while imported;
it’s running on a Nexenta Core 3.0.1 box (snv_134f).



The first time the machine was rebooted, it hung at the “Loading ZFS
filesystems” line after loading the kernel; I booted the box with all
drives unplugged and exported the pool. The machine was rebooted, and now
the pool is hanging on import (zpool import -Fn Nalgene). I’m using
"0t2761::pid2proc|::walk thread|::findstack" | mdb -k to try and view what
the import process is doing. I’m not a hard-core ZFS/Solaris dev, so I
don’t know if I’m reading the output correctly, but it appears that ZFS is
continuing to delete a snapshot/FS from before (reading from the top down):



stack pointer for thread ff01ce408e00: ff0008f2b1f0
[ ff0008f2b1f0 _resume_from_idle+0xf1() ]
  ff0008f2b220 swtch+0x145()
  ff0008f2b250 cv_wait+0x61()
  ff0008f2b2a0 txg_wait_open+0x7a()
  ff0008f2b2e0 dmu_tx_wait+0xb3()
  ff0008f2b320 dmu_tx_assign+0x4b()
  ff0008f2b3b0 dmu_free_long_range_impl+0x12b()
  ff0008f2b400 dmu_free_object+0xe6()
  ff0008f2b710 dsl_dataset_destroy+0x122()
  ff0008f2b740 dsl_destroy_inconsistent+0x5f()
  ff0008f2b770 findfunc+0x23()
  ff0008f2b850 dmu_objset_find_spa+0x38c()
  ff0008f2b930 dmu_objset_find_spa+0x153()
  ff0008f2b970 dmu_objset_find+0x40()
  ff0008f2ba40 spa_load_impl+0xb23()
  ff0008f2bad0 spa_load+0x117()
  ff0008f2bb50 spa_load_best+0x78()
  ff0008f2bbf0 spa_import+0xee()
  ff0008f2bc40 zfs_ioc_pool_import+0xc0()
  ff0008f2bcc0 zfsdev_ioctl+0x177()
  ff0008f2bd00 cdev_ioctl+0x45()
  ff0008f2bd40 spec_ioctl+0x5a()
  ff0008f2bdc0 fop_ioctl+0x7b()
  ff0008f2bec0 ioctl+0x18e()
  ff0008f2bf10 sys_syscall32+0xff()



I have this in a loop running every 15 seconds, and I’ll occasionally see some
ddt_* lines as well (the current dedup ratio is 1.05). The ratio was originally
about 1.09 when I started the import (from zdb -e Nalgene); is the system
doing something special, or is this just ZFS destroying the pending-deletion
data and causing the ratio to change? As for the import, is there any
estimate I can make as to how long the process will take? I’ve had it
running since Saturday morning (~36 hours now) through a couple of system
lockups.
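
For completeness, the loop is essentially the following (just a sketch; 0t2761
is the PID of the zpool import process noted above):

    # dump the import thread's kernel stack every 15 seconds
    while true; do
        echo "0t2761::pid2proc|::walk thread|::findstack" | mdb -k
        sleep 15
    done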



The zpool is a 7-disk raidz2 (5 TB usable, 2 TB used) with 4 GB of RAM (8 GB
arriving tomorrow, which I’ll put to use), running on an AMD Phenom II X4
processor.
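
As a rough back-of-the-envelope check (assuming the default 128K recordsize
and the commonly quoted figure of ~320 bytes of core per DDT entry; both are
assumptions, not measured values):

    \frac{2\ \mathrm{TB}}{128\ \mathrm{KB/block}} \approx 1.6 \times 10^{7}\ \text{blocks},
    \qquad 1.6 \times 10^{7} \times 320\ \mathrm{B} \approx 5\ \mathrm{GB\ of\ DDT}

which is more than the 4 GB of RAM currently in the box, so most DDT lookups
during the destroy would have to go to disk.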



Thanks in advance!



--

*Stephen Repetski*

BS Applied Networking and Systems Administration, 2013

Rochester Institute of Technology, Thomas Jefferson HS S&T

skr3...@rit.edu | srepe...@srepetsk.net

http://srepetsk.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-17 Thread Nicolas Williams
On Sat, Jan 15, 2011 at 10:19:23AM -0600, Bob Friesenhahn wrote:
> On Fri, 14 Jan 2011, Peter Taps wrote:
> 
> >Thank you for sharing the calculations. In lay terms, for Sha256,
> >how many blocks of data would be needed to have one collision?
> 
> Two.

Pretty funny.

In this thread some of you are treating SHA-256 as an idealized hash
function.  The odds of accidentally finding collisions in an idealized
256-bit hash function are minute because the distribution of hash
function outputs over inputs is random (or, rather, pseudo-random).
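
To put "minute" in numbers: for an idealized b-bit hash, the birthday bound
gives, for n random blocks, approximately

    P(\text{collision}) \approx \frac{n^{2}}{2^{b+1}},
    \qquad P \approx 0.5 \ \text{when}\ n \approx 2^{b/2} = 2^{128}\ \text{for}\ b = 256

so even with 2^64 unique blocks the odds are on the order of 2^-129.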

But cryptographic hash functions are generally only approximations of
idealized hash functions.  There's nothing to say that there aren't
pathological corner cases where a given hash function produces lots of
collisions that would be semantically meaningful to people -- i.e., a
set of inputs over which the outputs are not randomly distributed.  Now,
of course, we don't know of such pathological corner cases for SHA-256,
but not that long ago we didn't know of any for SHA-1 or MD5 either.

The question of whether disabling verification would improve performance
is pretty simple: if you have highly deduplicatious, _synchronous_ (or
nearly so, due to frequent fsync()s or NFS close operations) writes, and
the "working set" does not fit in the ARC or L2ARC, then yes, disabling
verification will help significantly, by removing an average of at least
half a disk rotation from the write latency.  The same holds if you have
that workload but with asynchronous writes that might as well be synchronous
due to an undersized cache (relative to the workload).  Otherwise the
cost of verification should be hidden by caching.

Another way to put this would be that you should first determine that
verification is actually affecting performance, and only _then_ should
you consider disabling it.  But if you want to have the freedom to
disable verification, then you should be using SHA-256 (or switch to it
when disabling verification).
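
In ZFS terms that is just the dedup property (a sketch; the pool name is
illustrative):

    # dedup keyed on SHA-256, with byte-for-byte verification on a match
    zfs set dedup=sha256,verify tank
    # dedup trusting the SHA-256 match alone
    zfs set dedup=sha256 tank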

Safety features that cost nothing are not worth turning off,
so make sure their cost is significant before even thinking
of turning them off.

Similarly, the cost of SHA-256 vs. Fletcher should also be lost in the
noise if the system has enough CPU, but if the choice of hash function
could make the system CPU-bound instead of I/O-bound, then the choice of
hash function would make an impact on performance.  The choice of hash
functions will have a different performance impact than verification: a
slower hash function will affect non-deduplicatious workloads more than
highly deduplicatious workloads (since the latter will require more I/O
for verification, which will overwhelm the cost of the hash function).
Again, measure first.
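
A rough way to check which regime you are in (a sketch, not a benchmark; the
pool name is illustrative):

    # in one terminal: per-CPU utilization every 10 seconds
    mpstat 10
    # in another: pool throughput every 10 seconds
    zpool iostat tank 10

If idle CPU stays high while the disks are saturated, you are I/O-bound and
the hash choice is in the noise; if idle drops to ~0 with the disks mostly
quiet, the checksum may be the bottleneck.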

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss