OK, now I know it's not related to I/O performance, but to ZFS itself.
At some point all 3 pools were locked up like this:
                            extended device statistics       ---- errors ----
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   1   0   1 c8t0d0
    0.0    0.0    0.0    0.0  0.0  8.0    0.0    0.0   0 100   0   0   0   0 c7t0d0
    0.0    0.0    0.0    0.0  0.0  8.0    0.0    0.0   0 100   0   0   0   0 c7t1d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t2d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t3d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t4d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t5d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t10d0
    0.0    0.0    0.0    0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c7t11d0
^C
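For reference, statistics in that format typically come from Solaris iostat with extended statistics, descriptive device names, and error counters enabled; the one-second interval here is an assumption:

```shell
# Extended per-device statistics (-x), logical device names (-n),
# and device error counters (-e), sampled every second until ^C:
iostat -xne 1
```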
# zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        data          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c7t0d0s0  ONLINE       0     0     0
            c7t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tmp_data
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.74% done, 2h21m to go
config:

        NAME          STATE     READ WRITE CKSUM
        tmp_data      ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c7t11d0   ONLINE       0     0     0
            c7t10d0   ONLINE       0     0     0  2.07G resilvered

errors: No known data errors
The resilver of tmp_data is unrelated; I ran zpool attach manually.
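For completeness, the manual attach was along these lines (pool and device names taken from the tmp_data status above; the exact invocation is a sketch):

```shell
# Attach c7t10d0 as a mirror of the existing c7t11d0 in the tmp_data
# pool; ZFS then starts resilvering the newly attached side on its own.
zpool attach tmp_data c7t11d0 c7t10d0
```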
On Tue, Sep 7, 2010 at 12:39 PM, Piotr Jasiukajtis <[email protected]> wrote:
> This is snv_128 x86.
>
>> ::arc
> hits = 39811943
> misses = 630634
> demand_data_hits = 29398113
> demand_data_misses = 490754
> demand_metadata_hits = 10413660
> demand_metadata_misses = 133461
> prefetch_data_hits = 0
> prefetch_data_misses = 0
> prefetch_metadata_hits = 170
> prefetch_metadata_misses = 6419
> mru_hits = 2933011
> mru_ghost_hits = 43202
> mfu_hits = 36878818
> mfu_ghost_hits = 45361
> deleted = 1299527
> recycle_miss = 46526
> mutex_miss = 355
> evict_skip = 25539
> evict_l2_cached = 0
> evict_l2_eligible = 77011188736
> evict_l2_ineligible = 76253184
> hash_elements = 278135
> hash_elements_max = 279843
> hash_collisions = 1653518
> hash_chains = 75135
> hash_chain_max = 9
> p = 4787 MB
> c = 5722 MB
> c_min = 715 MB
> c_max = 5722 MB
> size = 5428 MB
> hdr_size = 56535840
> data_size = 5158287360
> other_size = 477726560
> l2_hits = 0
> l2_misses = 0
> l2_feeds = 0
> l2_rw_clash = 0
> l2_read_bytes = 0
> l2_write_bytes = 0
> l2_writes_sent = 0
> l2_writes_done = 0
> l2_writes_error = 0
> l2_writes_hdr_miss = 0
> l2_evict_lock_retry = 0
> l2_evict_reading = 0
> l2_free_on_write = 0
> l2_abort_lowmem = 0
> l2_cksum_bad = 0
> l2_io_error = 0
> l2_size = 0
> l2_hdr_size = 0
> memory_throttle_count = 0
> arc_no_grow = 0
> arc_tempreserve = 0 MB
> arc_meta_used = 1288 MB
> arc_meta_limit = 1430 MB
> arc_meta_max = 1288 MB
>
>> ::memstat
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     789865              3085   19%
> ZFS File Data             1406055              5492   34%
> Anon                       396297              1548    9%
> Exec and libs                7178                28    0%
> Page cache                   8428                32    0%
> Free (cachelist)           117928               460    3%
> Free (freelist)           1464224              5719   35%
>
> Total                     4189975             16367
> Physical                  4189974             16367
>
>
>> ::spa -ev
> ADDR STATE NAME
> ffffff04f0eb4500 ACTIVE data
>
> ADDR STATE AUX DESCRIPTION
> ffffff04f2f52940 HEALTHY - root
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0 0 0 0 0
> BYTES 0 0 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fd980 HEALTHY - raidz
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x57090 0x37436a 0 0 0
> BYTES 0x8207f3c00 0x22345d0800 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fa0c0 HEALTHY - /dev/dsk/c7t2d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x4416e 0x105640 0 0 0x74326
> BYTES 0x10909da00 0x45089d600 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fa700 HEALTHY - /dev/dsk/c7t3d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x43fca 0x1055fa 0 0 0x74326
> BYTES 0x108e14400 0x45087a400 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fad40 HEALTHY - /dev/dsk/c7t4d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x44221 0x105533 0 0 0x74326
> BYTES 0x108a56c00 0x4508c8a00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fb380 HEALTHY - /dev/dsk/c7t5d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x44195 0x105528 0 0 0x74325
> BYTES 0x108b8c200 0x4508cfe00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fb9c0 HEALTHY - /dev/dsk/c7t6d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x441f3 0x10552c 0 0 0x74326
> BYTES 0x108e84800 0x4508c7a00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fc080 HEALTHY - /dev/dsk/c7t7d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x43f34 0x105529 0 0 0x74326
> BYTES 0x1080fc000 0x450869c00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fc6c0 HEALTHY - /dev/dsk/c7t8d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x43e8d 0x10559d 0 0 0x74326
> BYTES 0x10833d000 0x4508a9200 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fcd00 HEALTHY - /dev/dsk/c7t9d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x43aee 0x105671 0 0 0x74325
> BYTES 0x10714f000 0x45089b600 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050a2fd340 HEALTHY - /dev/dsk/c7t10d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x442cf 0x105693 0 0 0x74325
> BYTES 0x109338800 0x45086c200 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff04e6fcf080 ACTIVE rpool
>
> ffffff04e1c6dcc0 HEALTHY - root
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0 0 0 0 0
> BYTES 0 0 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff04e1c6d680 HEALTHY - mirror
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x3fc1d 0x169378 0 0 0
> BYTES 0x2c0409e00 0x214e47c00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff04e1c6d040 HEALTHY - /dev/dsk/c7t0d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x1cc01 0xe8749 0 0 0x4915
> BYTES 0x1cb5a6a00 0x215b96c00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff04e92c2980 HEALTHY - /dev/dsk/c7t1d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x1e3c5 0xe8556 0 0 0x4915
> BYTES 0x1cfa84c00 0x215b96c00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff04f0eb3a80 ACTIVE tmp_data
>
> ffffff050bb12d40 HEALTHY - root
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0 0 0 0 0
> BYTES 0 0 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
> ffffff050bb12700 HEALTHY - /dev/dsk/c7t11d0s0
>
> READ WRITE FREE CLAIM IOCTL
> OPS 0x2dce 0x2d3c1 0 0 0x15a
> BYTES 0x2b37c400 0x4dcc81e00 0 0 0
> EREAD 0
> EWRITE 0
> ECKSUM 0
>
>
>> ::walk zio_root
> ffffff05664b0328
> ffffff04eb660060
> ffffff04e96f9c88
> ffffff05207bd658
> ffffff05667fccb8
> ffffff05678449a0
> ffffff05678b6018
> ffffff0568aef640
> ffffff0566ece678
> ffffff050afa09a0
> ffffff055afef658
>
>> ::walk zio_root | ::zio -r
> ADDRESS TYPE STAGE WAITER
> ffffff05664b0328 NULL CHECKSUM_VERIFY ffffff051bb13b00
> ffffff05628fa680 WRITE VDEV_IO_START -
> ffffff0567d15370 WRITE VDEV_IO_START -
> ffffff0567409ce0 WRITE VDEV_IO_START -
> ffffff0566cbf968 WRITE VDEV_IO_START -
> ffffff056748cca8 WRITE VDEV_IO_START -
> ffffff055b184028 WRITE VDEV_IO_START -
> ffffff0567482328 WRITE VDEV_IO_START -
> ffffff0562f73658 WRITE VDEV_IO_START -
> ffffff04eb660060 NULL OPEN -
> ffffff04e96f9c88 NULL OPEN -
> ffffff05207bd658 NULL CHECKSUM_VERIFY ffffff001fe7fc60
> ffffff055bc67060 WRITE VDEV_IO_START -
> ffffff0568160048 WRITE VDEV_IO_START -
> ffffff05661fbca8 WRITE VDEV_IO_START -
> ffffff0566edacc0 WRITE VDEV_IO_START -
> ffffff05665d5018 WRITE VDEV_IO_START -
> ffffff05667c3008 WRITE VDEV_IO_START -
> ffffff05664b39c0 WRITE VDEV_IO_START -
> ffffff051cea6010 WRITE VDEV_IO_START -
> ffffff051d333370 WRITE VDEV_IO_START -
> ffffff0521255048 WRITE VDEV_IO_START -
>
> This is not the full output.
>
>> ::walk zio_root | ::zio -r ! wc -l
> 7099
>
> I am hitting this issue on 2 machines, both running snv_128.
> The system is not responsive (ping still works), so I bet there is some
> kind of deadlock within ZFS.
>
> Were there any known ZFS-related bugs similar to this one in b128?
>
> On Mon, Sep 6, 2010 at 12:13 PM, Jason Banham <[email protected]> wrote:
>> On 06/09/2010 10:56, Piotr Jasiukajtis wrote:
>>>
>>> Hi,
>>>
>>> I am looking for ideas on how to check if the machine was under
>>> high I/O pressure before it panicked (caused manually by an NMI).
>>> By I/O I mean disks and ZFS stack.
>>>
>>
>> Do you believe ZFS was a key component in the I/O pressure?
>> I've CC'd [email protected] on my reply.
>>
>> If you think there was a lot of I/O happening, you could run:
>>
>> ::walk zio_root | ::zio -r
>>
>> This should give you an idea of the amount of ZIO going through ZFS.
>> I would also be curious to look at the state of the pool(s) and the
>> ZFS memory usage:
>>
>> ::spa -ev
>> ::arc
>>
>>
>>
>>
>> Kind regards,
>>
>> Jason
>> _______________________________________________
>> mdb-discuss mailing list
>> [email protected]
>>
>
>
>
> --
> Piotr Jasiukajtis | estibi | SCA OS0072
> http://estseg.blogspot.com
>
--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
_______________________________________________
zfs-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/zfs-code