Ok, now I know it's not related to I/O performance, but to ZFS itself.
At some point all 3 pools were locked up like this:

                    extended device statistics       ---- errors ---
    r/s  w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   1   0   1 c8t0d0
    0.0  0.0   0.0   0.0  0.0  8.0    0.0    0.0   0 100   0   0   0   0 c7t0d0
    0.0  0.0   0.0   0.0  0.0  8.0    0.0    0.0   0 100   0   0   0   0 c7t1d0
    0.0  0.0   0.0   0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t2d0
    0.0  0.0   0.0   0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t3d0
    0.0  0.0   0.0   0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t4d0
    0.0  0.0   0.0   0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t5d0
    0.0  0.0   0.0   0.0  0.0  4.0    0.0    0.0   0 100   0   0   0   0 c7t10d0
    0.0  0.0   0.0   0.0  0.0  3.0    0.0    0.0   0 100   0   0   0   0 c7t11d0
    ^C

    # zpool status
      pool: data
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            data        ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                c7t2d0  ONLINE       0     0     0
                c7t3d0  ONLINE       0     0     0
              mirror-1  ONLINE       0     0     0
                c7t4d0  ONLINE       0     0     0
                c7t5d0  ONLINE       0     0     0

    errors: No known data errors

      pool: rpool
     state: ONLINE
    status: The pool is formatted using an older on-disk format. The pool can
            still be used, but some features are unavailable.
    action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
            pool will no longer be accessible on older software versions.
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror-0    ONLINE       0     0     0
                c7t0d0s0  ONLINE       0     0     0
                c7t1d0s0  ONLINE       0     0     0

    errors: No known data errors

      pool: tmp_data
     state: ONLINE
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h1m, 0.74% done, 2h21m to go
    config:

            NAME         STATE     READ WRITE CKSUM
            tmp_data     ONLINE       0     0     0
              mirror-0   ONLINE       0     0     0
                c7t11d0  ONLINE       0     0     0
                c7t10d0  ONLINE       0     0     0  2.07G resilvered

    errors: No known data errors

Resilvering tmp_data is not related; I ran the zpool attach manually.

On Tue, Sep 7, 2010 at 12:39 PM, Piotr Jasiukajtis <est...@gmail.com> wrote:
> This is snv_128 x86.
>
> >> ::arc
> hits                      = 39811943
> misses                    = 630634
> demand_data_hits          = 29398113
> demand_data_misses        = 490754
> demand_metadata_hits      = 10413660
> demand_metadata_misses    = 133461
> prefetch_data_hits        = 0
> prefetch_data_misses      = 0
> prefetch_metadata_hits    = 170
> prefetch_metadata_misses  = 6419
> mru_hits                  = 2933011
> mru_ghost_hits            = 43202
> mfu_hits                  = 36878818
> mfu_ghost_hits            = 45361
> deleted                   = 1299527
> recycle_miss              = 46526
> mutex_miss                = 355
> evict_skip                = 25539
> evict_l2_cached           = 0
> evict_l2_eligible         = 77011188736
> evict_l2_ineligible       = 76253184
> hash_elements             = 278135
> hash_elements_max         = 279843
> hash_collisions           = 1653518
> hash_chains               = 75135
> hash_chain_max            = 9
> p                         = 4787 MB
> c                         = 5722 MB
> c_min                     = 715 MB
> c_max                     = 5722 MB
> size                      = 5428 MB
> hdr_size                  = 56535840
> data_size                 = 5158287360
> other_size                = 477726560
> l2_hits                   = 0
> l2_misses                 = 0
> l2_feeds                  = 0
> l2_rw_clash               = 0
> l2_read_bytes             = 0
> l2_write_bytes            = 0
> l2_writes_sent            = 0
> l2_writes_done            = 0
> l2_writes_error           = 0
> l2_writes_hdr_miss        = 0
> l2_evict_lock_retry       = 0
> l2_evict_reading          = 0
> l2_free_on_write          = 0
> l2_abort_lowmem           = 0
> l2_cksum_bad              = 0
> l2_io_error               = 0
> l2_size                   = 0
> l2_hdr_size               = 0
> memory_throttle_count     = 0
> arc_no_grow               = 0
> arc_tempreserve           = 0 MB
> arc_meta_used             = 1288 MB
> arc_meta_limit            = 1430 MB
> arc_meta_max              = 1288 MB
>
> >> ::memstat
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     789865              3085   19%
> ZFS File Data             1406055              5492   34%
> Anon                       396297              1548    9%
> Exec and libs                7178                28    0%
> Page cache                   8428                32    0%
> Free (cachelist)           117928               460    3%
> Free (freelist)           1464224              5719   35%
>
> Total                     4189975             16367
> Physical                  4189974             16367
>
> >> ::spa -ev
> ADDR                 STATE NAME
> ffffff04f0eb4500    ACTIVE data
>
>     ADDR             STATE     AUX          DESCRIPTION
>     ffffff04f2f52940 HEALTHY   -            root
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0            0             0      0      0
>     BYTES        0            0             0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fd980 HEALTHY   -            raidz
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x57090      0x37436a      0      0      0
>     BYTES        0x8207f3c00  0x22345d0800  0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fa0c0 HEALTHY   -            /dev/dsk/c7t2d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x4416e      0x105640      0      0      0x74326
>     BYTES        0x10909da00  0x45089d600   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fa700 HEALTHY   -            /dev/dsk/c7t3d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x43fca      0x1055fa      0      0      0x74326
>     BYTES        0x108e14400  0x45087a400   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fad40 HEALTHY   -            /dev/dsk/c7t4d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x44221      0x105533      0      0      0x74326
>     BYTES        0x108a56c00  0x4508c8a00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fb380 HEALTHY   -            /dev/dsk/c7t5d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x44195      0x105528      0      0      0x74325
>     BYTES        0x108b8c200  0x4508cfe00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fb9c0 HEALTHY   -            /dev/dsk/c7t6d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x441f3      0x10552c      0      0      0x74326
>     BYTES        0x108e84800  0x4508c7a00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fc080 HEALTHY   -            /dev/dsk/c7t7d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x43f34      0x105529      0      0      0x74326
>     BYTES        0x1080fc000  0x450869c00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fc6c0 HEALTHY   -            /dev/dsk/c7t8d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x43e8d      0x10559d      0      0      0x74326
>     BYTES        0x10833d000  0x4508a9200   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fcd00 HEALTHY   -            /dev/dsk/c7t9d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x43aee      0x105671      0      0      0x74325
>     BYTES        0x10714f000  0x45089b600   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050a2fd340 HEALTHY   -            /dev/dsk/c7t10d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x442cf      0x105693      0      0      0x74325
>     BYTES        0x109338800  0x45086c200   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
> ffffff04e6fcf080    ACTIVE rpool
>
>     ffffff04e1c6dcc0 HEALTHY   -            root
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0            0             0      0      0
>     BYTES        0            0             0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff04e1c6d680 HEALTHY   -            mirror
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x3fc1d      0x169378      0      0      0
>     BYTES        0x2c0409e00  0x214e47c00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff04e1c6d040 HEALTHY   -            /dev/dsk/c7t0d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x1cc01      0xe8749       0      0      0x4915
>     BYTES        0x1cb5a6a00  0x215b96c00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff04e92c2980 HEALTHY   -            /dev/dsk/c7t1d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x1e3c5      0xe8556       0      0      0x4915
>     BYTES        0x1cfa84c00  0x215b96c00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
> ffffff04f0eb3a80    ACTIVE tmp_data
>
>     ffffff050bb12d40 HEALTHY   -            root
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0            0             0      0      0
>     BYTES        0            0             0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
>     ffffff050bb12700 HEALTHY   -            /dev/dsk/c7t11d0s0
>
>                  READ         WRITE         FREE   CLAIM  IOCTL
>     OPS          0x2dce       0x2d3c1       0      0      0x15a
>     BYTES        0x2b37c400   0x4dcc81e00   0      0      0
>     EREAD        0
>     EWRITE       0
>     ECKSUM       0
>
> >> ::walk zio_root
> ffffff05664b0328
> ffffff04eb660060
> ffffff04e96f9c88
> ffffff05207bd658
> ffffff05667fccb8
> ffffff05678449a0
> ffffff05678b6018
> ffffff0568aef640
> ffffff0566ece678
> ffffff050afa09a0
> ffffff055afef658
>
> >> ::walk zio_root | ::zio -r
> ADDRESS           TYPE  STAGE            WAITER
> ffffff05664b0328  NULL  CHECKSUM_VERIFY  ffffff051bb13b00
>  ffffff05628fa680 WRITE VDEV_IO_START    -
>  ffffff0567d15370 WRITE VDEV_IO_START    -
>  ffffff0567409ce0 WRITE VDEV_IO_START    -
>  ffffff0566cbf968 WRITE VDEV_IO_START    -
>  ffffff056748cca8 WRITE VDEV_IO_START    -
>  ffffff055b184028 WRITE VDEV_IO_START    -
>  ffffff0567482328 WRITE VDEV_IO_START    -
>  ffffff0562f73658 WRITE VDEV_IO_START    -
> ffffff04eb660060  NULL  OPEN             -
> ffffff04e96f9c88  NULL  OPEN             -
> ffffff05207bd658  NULL  CHECKSUM_VERIFY  ffffff001fe7fc60
>  ffffff055bc67060 WRITE VDEV_IO_START    -
>  ffffff0568160048 WRITE VDEV_IO_START    -
>  ffffff05661fbca8 WRITE VDEV_IO_START    -
>  ffffff0566edacc0 WRITE VDEV_IO_START    -
>  ffffff05665d5018 WRITE VDEV_IO_START    -
>  ffffff05667c3008 WRITE VDEV_IO_START    -
>  ffffff05664b39c0 WRITE VDEV_IO_START    -
>  ffffff051cea6010 WRITE VDEV_IO_START    -
>  ffffff051d333370 WRITE VDEV_IO_START    -
>  ffffff0521255048 WRITE VDEV_IO_START    -
>
> This is not all of the output:
>
> >> ::walk zio_root | ::zio -r ! wc -l
> 7099
>
> I am hitting this issue on 2 machines, both snv_128.
> The system is not responsive (ping still works), so I bet there is some
> kind of deadlock within ZFS.
>
> Were there any known ZFS-related bugs similar to this one in b128?
>
> On Mon, Sep 6, 2010 at 12:13 PM, Jason Banham <jason.ban...@oracle.com> wrote:
>> On 06/09/2010 10:56, Piotr Jasiukajtis wrote:
>>>
>>> Hi,
>>>
>>> I am looking for ideas on how to check whether the machine was under
>>> high I/O pressure before it panicked (caused manually by an NMI).
>>> By I/O I mean the disks and the ZFS stack.
>>>
>>
>> Do you believe ZFS was a key component in the I/O pressure?
>> I've CC'd zfs-discuss@opensolaris.org on my reply.
>>
>> If you think there was a lot of I/O happening, you could run:
>>
>>   ::walk zio_root | ::zio -r
>>
>> This should give you an idea of the amount of ZIO going through ZFS.
>> I would also be curious to look at the state of the pool(s) and the
>> ZFS memory usage:
>>
>>   ::spa -ev
>>   ::arc
>>
>> Kind regards,
>>
>> Jason
>> _______________________________________________
>> mdb-discuss mailing list
>> mdb-disc...@opensolaris.org
>
> --
> Piotr Jasiukajtis | estibi | SCA OS0072
> http://estseg.blogspot.com

--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
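[Editor's note] For anyone triaging a similar hang from saved mdb output: with
~7099 zios in flight, eyeballing the `::zio -r` listing is impractical, but it
can be tallied by pipeline stage to quantify where I/O is parked. The sketch
below is a hypothetical helper, not part of the original thread; the regex
assumes the ADDRESS/TYPE/STAGE column layout shown in the messages above.

```python
import re
from collections import Counter

def tally_zio_stages(mdb_output):
    """Count zios per pipeline stage in saved '::walk zio_root | ::zio -r' output.

    Each data line looks like 'ADDRESS TYPE STAGE [WAITER]', e.g.
    'ffffff05628fa680 WRITE VDEV_IO_START -' (children may be indented).
    The header line and blank lines are skipped because they don't start
    with a lowercase-hex address.
    """
    stages = Counter()
    for line in mdb_output.splitlines():
        m = re.match(r'\s*[0-9a-f]+\s+(\w+)\s+(\w+)', line)
        if m:
            stages[m.group(2)] += 1  # group(2) is the STAGE column
    return stages

# Small excerpt in the same shape as the output quoted above.
sample = """\
ADDRESS           TYPE  STAGE            WAITER
ffffff05664b0328  NULL  CHECKSUM_VERIFY  ffffff051bb13b00
 ffffff05628fa680 WRITE VDEV_IO_START    -
 ffffff0567d15370 WRITE VDEV_IO_START    -
ffffff04eb660060  NULL  OPEN             -
"""

print(tally_zio_stages(sample))
```

A large pile of zios stuck in VDEV_IO_START, as in this thread, suggests I/O
that was issued toward the vdevs but never completed, which is consistent with
the iostat picture of 100% busy devices doing zero throughput.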