Re: [zfs-discuss] zfs hangs with B141 when filebench runs

2010-07-15 Thread zhihui Chen
 zap_increment_int+0x68(ff05028c8940, , 0, fffeffef7e00, ff0511d9bc80)
  ff002035c9f0 do_userquota_update+0x69(ff05028c8940, 100108000, 3, 0, 0, 1, ff0511d9bc80)
  ff002035ca50 dmu_objset_do_userquota_updates+0xde(ff05028c8940, ff0511d9bc80)
  ff002035cad0 dsl_pool_sync+0x112(ff0502ceac00, f34)
  ff002035cb80 spa_sync+0x37b(ff0501269580, f34)
  ff002035cc20 txg_sync_thread+0x247(ff0502ceac00)
  ff002035cc30 thread_start+8()
 ff05123ce048::zio -r
ADDRESS          TYPE  STAGE            WAITER
ff05123ce048     NULL  CHECKSUM_VERIFY  ff002035cc40
 ff051a9a9338    READ  VDEV_IO_START    -
  ff050e3a4050   READ  VDEV_IO_DONE     -
   ff0519173c90  READ  VDEV_IO_START    -

ff0519173c90::print zio_t io_done
io_done = vdev_cache_fill

The zio ff0519173c90 is a vdev cache read request that never completes,
so txg_sync_thread is blocked. I don't know why this zio cannot be
satisfied and enter the done stage. I have tried to dd from the raw
device that backs the pool while zfs hangs, and that works fine.
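
(For reference, a minimal sketch of that raw-device check; c10t1d0 is
the disk behind tpool on my box, and reading through slice s0 is an
assumption about where the pool data lives:)

# read a few MB straight off the disk backing the pool; if this returns
# promptly, the raw I/O path is probably healthy
dd if=/dev/rdsk/c10t1d0s0 of=/dev/null bs=1024k count=16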

Thanks
Zhihui

On Mon, Jul 5, 2010 at 7:56 PM, zhihui Chen zhch...@gmail.com wrote:
 I tried to run "zfs list" on my system, but it looks like this command
 hangs. It does not return even if I press Ctrl+C, as follows:
 r...@intel7:/export/bench/io/filebench/results# zfs list
 ^C^C^C^C
 ^C^C^C^C
 ...
 When this happens, I am running the filebench benchmark with the oltp
 workload. But zpool status shows that all pools are in good state,
 as follows:
 r...@intel7:~# zpool status
  pool: rpool
  state: ONLINE
 status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
 action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: none requested
 config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c8t0d0s0  ONLINE       0     0     0

 errors: No known data errors

  pool: tpool
  state: ONLINE
  scan: none requested
 config:

        NAME        STATE     READ WRITE CKSUM
        tpool       ONLINE       0     0     0
          c10t1d0   ONLINE       0     0     0

 errors: No known data errors


 My system is running B141 and tpool is using the latest version, 26.
 I tried the command truss -p `pgrep zfs`, but it fails as follows:

 r...@intel7:~# truss -p `pgrep zfs`
 truss: unanticipated system error: 5060

 It looks like zfs is in a deadlock state, but I don't know the cause.
 I have run the filebench/oltp workload several times, and each time it
 leads to this state. But if I run filebench with another workload such
 as fileserver or webserver, this issue does not happen.
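
 (A quick way to survey the blocked zfs threads, as a sketch: mdb's
 ::stacks dcmd groups kernel threads by their stack, and -m filters
 by module:)

 echo "::stacks -m zfs" | mdb -k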

 Thanks
 Zhihui



Re: [zfs-discuss] zfs hangs with B141 when filebench runs

2010-07-15 Thread zhihui Chen
Thanks, I have filed the bug, but I don't know how to provide the crash
dump. If the bug is accepted, the RE can get the crash dump file from me.

On Thu, Jul 15, 2010 at 10:33 PM, George Wilson
george.r.wil...@oracle.com wrote:
 I don't recall seeing this issue before. Best thing to do is file a bug and
 include a pointer to the crash dump.

 - George

 zhihui Chen wrote:

 It looks like the txg_sync_thread for this pool has been blocked and
 never returns, which causes many other threads to block. I tried
 changing the zfs_vdev_max_pending value from 10 to 35 and retested the
 workload several times; the issue does not happen. But if I change it
 back to 10, it happens very easily. Any known bug on this, or any
 suggestion for solving this issue?
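
 (For reference, a sketch of one way to change that tunable; the mdb
 write takes effect on the running kernel and reverts at reboot, while
 the /etc/system line is the persistent form:)

 # read the current value (decimal)
 echo "zfs_vdev_max_pending/D" | mdb -k
 # set it to 35 on the live kernel
 echo "zfs_vdev_max_pending/W 0t35" | mdb -kw
 # or persist across reboots by adding this line to /etc/system:
 set zfs:zfs_vdev_max_pending = 35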

 ff0502c3378c::wchaninfo -v

 ADDR             TYPE NWAITERS   THREAD           PROC
 ff0502c3378c cond     1730:  ff051cc6b500 go_filebench
                                 ff051ce61020 go_filebench
                                 ff051cc4e4e0 go_filebench
                                 ff051d115120 go_filebench
                                 ff051e9ed000 go_filebench
                                 ff051bf644c0 go_filebench
                                 ff051c65b000 go_filebench
                                 ff051c728500 go_filebench
                                 ff050d83a8c0 go_filebench
                                 ff051c528c00 go_filebench
                                 ff051b750800 go_filebench
                                 ff051cdd7520 go_filebench
                                 ff051ce71bc0 go_filebench
                                 ff051cb5e840 go_filebench
                                 ff051cbdec60 go_filebench
                                 ff0516473c60 go_filebench
                                 ff051d132820 go_filebench
                                 ff051d13a400 go_filebench
                                 ff050fbf0b40 go_filebench
                                 ff051ce7a400 go_filebench
                                 ff051b781820 go_filebench
                                 ff051ce603e0 go_filebench
                                 ff051d1bf840 go_filebench
                                 ff051c6c24c0 go_filebench
                                 ff051d204100 go_filebench
                                 ff051cbdf160 go_filebench
                                 ff051ce52c00 go_filebench
                                 ...

 ff051cc6b500::findstack -v

 stack pointer for thread ff051cc6b500: ff0020a76ac0
 [ ff0020a76ac0 _resume_from_idle+0xf1() ]
  ff0020a76af0 swtch+0x145()
  ff0020a76b20 cv_wait+0x61(ff0502c3378c, ff0502c33700)
  ff0020a76b70 zil_commit+0x67(ff0502c33700, 6b255, 14)
  ff0020a76d80 zfs_write+0xaaf(ff050b5c9140, ff0020a76e40, 40, ff0502dab258, 0)
  ff0020a76df0 fop_write+0x6b(ff050b5c9140, ff0020a76e40, 40, ff0502dab258, 0)
  ff0020a76ec0 pwrite64+0x244(1a, b6f2a000, 800, b841a800, 0)
  ff0020a76f10 sys_syscall32+0xff()

 From the zil_commit code, I tried to find the thread whose stack
 contains a call to zil_commit_writer. That thread has not returned
 from zil_commit_writer, so it never calls cv_broadcast to wake up the
 waiting threads.
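
 (A sketch of a quick way to locate such a thread: ::stacks can filter
 on a function appearing anywhere in the stack:)

 echo "::stacks -c zil_commit_writer" | mdb -k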

 ff051d10fba0::findstack -v

 stack pointer for thread ff051d10fba0: ff0021ab9a10
 [ ff0021ab9a10 _resume_from_idle+0xf1() ]
  ff0021ab9a40 swtch+0x145()
  ff0021ab9a70 cv_wait+0x61(ff051ae1b988, ff051ae1b980)
  ff0021ab9ab0 zio_wait+0x5d(ff051ae1b680)
  ff0021ab9b20 zil_commit_writer+0x249(ff0502c33700, 6b250, e)
  ff0021ab9b70 zil_commit+0x91(ff0502c33700, 6b250, e)
  ff0021ab9d80 zfs_write+0xaaf(ff050b5c9540, ff0021ab9e40, 40, ff0502dab258, 0)
  ff0021ab9df0 fop_write+0x6b(ff050b5c9540, ff0021ab9e40, 40, ff0502dab258, 0)
  ff0021ab9ec0 pwrite64+0x244(14, bfbfb800, 800, 88f3f000, 0)
  ff0021ab9f10 sys_syscall32+0xff()

 ff051ae1b680::zio -r

 ADDRESS          TYPE  STAGE            WAITER
 ff051ae1b680     NULL  CHECKSUM_VERIFY  ff051d10fba0
  ff051a9c1978    WRITE VDEV_IO_START    -
  ff052454d348    WRITE VDEV_IO_START    -
  ff051572b960    WRITE VDEV_IO_START    -
  ff050accb330    WRITE VDEV_IO_START    -
  ff0514453c80    WRITE VDEV_IO_START    -
  ff0524537648    WRITE VDEV_IO_START    -
  ff05090e9660    WRITE VDEV_IO_START    -
  ff05151cb698    WRITE VDEV_IO_START    -
  ff0514668658    WRITE VDEV_IO_START    -
  ff0514835690    WRITE
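
 (All the child writes are parked in VDEV_IO_START, which points at the
 vdev I/O queue; as a sketch, the queue can be chased from any child zio
 in mdb. The address is the first child above, and the member names are
 from that era's zio_t/vdev_t:)

 ff051a9c1978::print zio_t io_vd
 <io_vd value>::print vdev_t vdev_queue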

[zfs-discuss] zfs hangs with B141 when filebench runs

2010-07-05 Thread zhihui Chen
I tried to run "zfs list" on my system, but it looks like this command
hangs. It does not return even if I press Ctrl+C, as follows:
r...@intel7:/export/bench/io/filebench/results# zfs list
^C^C^C^C
^C^C^C^C
...
When this happens, I am running the filebench benchmark with the oltp
workload. But zpool status shows that all pools are in good state,
as follows:
r...@intel7:~# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  c8t0d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tpool
 state: ONLINE
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
tpool       ONLINE       0     0     0
  c10t1d0   ONLINE       0     0     0

errors: No known data errors


My system is running B141 and tpool is using the latest version, 26.
I tried the command truss -p `pgrep zfs`, but it fails as follows:

r...@intel7:~# truss -p `pgrep zfs`
truss: unanticipated system error: 5060

It looks like zfs is in a deadlock state, but I don't know the cause.
I have run the filebench/oltp workload several times, and each time it
leads to this state. But if I run filebench with another workload such
as fileserver or webserver, this issue does not happen.

Thanks
Zhihui


Re: [zfs-discuss] how to convert zio->io_offset to disk block number?

2009-06-26 Thread zhihui Chen
Thanks. After fixing the following two issues, I get the right value:
(1) divide the offset 0x657800 (6649856) by 512 and use that as the
iseek value; (2) run the dd command on device c2t0d0s0, not c2t0d0.
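
(Putting both fixes together, a sketch of the working command;
6649856 / 512 = 12988, and s0 is assumed to be the slice that maps the
pool data:)

dd if=/dev/dsk/c2t0d0s0 of=text iseek=12988 bs=512 count=1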

Zhihui

2009/6/26 m...@bruningsystems.com

 Hi Zhihui Chen,

 zhihui Chen wrote:

 I found that zio->io_offset is the absolute offset into the device, not
 in sector units. And if we need to use zdb -R to dump the block, we
 should use the offset (zio->io_offset-0x40).

2009/6/25 zhihui Chen zhch...@gmail.com


I use the following dtrace script to trace the position of one file on zfs:
#!/usr/sbin/dtrace -qs

/* fire on every completed zio that has a backing vdev */
zio_done:entry
/((zio_t *)(arg0))->io_vd/
{
	zio = (zio_t *)arg0;
	printf("Offset:%x and Size:%x\n", zio->io_offset, zio->io_size);
	printf("vd:%x\n", (unsigned long)(zio->io_vd));
	printf("process name:%s\n", execname);
	/* dump the first 40 bytes of the I/O buffer */
	tracemem(zio->io_data, 40);
	stack();
}
and I run the dd command: dd if=/export/dsk1/test1 bs=512 count=1;
the dtrace script generates the following output:

Offset:657800 and Size:200
vd:ff02d6a1a700
process name:sched
  
  zfs`zio_execute+0xa0
  genunix`taskq_thread+0x193
  unix`thread_start+0x8
^C
The tracemem output is the right content of file test1, which is a
512-byte text file. zpool status has the following output:
pool: tpool
state: ONLINE
scrub: none requested
config:
NAME        STATE     READ WRITE CKSUM
tpool       ONLINE       0     0     0
  c2t0d0    ONLINE       0     0     0
errors: No known data errors
My question is: how do I translate the zio->io_offset (0x657800, equal
to decimal 6649856) output by dtrace into a block number on disk
c2t0d0?
I tried dd if=/dev/dsk/c2t0d0 of=text iseek=6650112 bs=512 count=1
as a check, but the result is not right.
  I assume that 6650112 is the offset plus 0x40?  Try dividing 6650112
 by 512 and using that as the iseek value.
 max

 Thanks
Zhihui


 







[zfs-discuss] how to convert zio->io_offset to disk block number?

2009-06-25 Thread zhihui Chen
I use the following dtrace script to trace the position of one file on zfs:

#!/usr/sbin/dtrace -qs

/* fire on every completed zio that has a backing vdev */
zio_done:entry
/((zio_t *)(arg0))->io_vd/
{
	zio = (zio_t *)arg0;
	printf("Offset:%x and Size:%x\n", zio->io_offset, zio->io_size);
	printf("vd:%x\n", (unsigned long)(zio->io_vd));
	printf("process name:%s\n", execname);
	/* dump the first 40 bytes of the I/O buffer */
	tracemem(zio->io_data, 40);
	stack();
}

and I run the dd command: dd if=/export/dsk1/test1 bs=512 count=1; the
dtrace script generates the following output:

Offset:657800 and Size:200
vd:ff02d6a1a700
process name:sched
  
  zfs`zio_execute+0xa0
  genunix`taskq_thread+0x193
  unix`thread_start+0x8
^C

The tracemem output is the right content of file test1, which is a
512-byte text file. zpool status has the following output:

pool: tpool
state: ONLINE
scrub: none requested
config:
NAME        STATE     READ WRITE CKSUM
tpool       ONLINE       0     0     0
  c2t0d0    ONLINE       0     0     0
errors: No known data errors

My question is: how do I translate the zio->io_offset (0x657800, equal
to decimal 6649856) output by dtrace into a block number on disk c2t0d0?
I tried dd if=/dev/dsk/c2t0d0 of=text iseek=6650112 bs=512 count=1
as a check, but the result is not right.

Thanks
Zhihui


Re: [zfs-discuss] how to convert zio->io_offset to disk block number?

2009-06-25 Thread zhihui Chen
I found that zio->io_offset is the absolute offset into the device, not
in sector units. And if we need to use zdb -R to dump the block, we
should use the offset (zio->io_offset-0x40).
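
(As a sketch of the zdb invocation: I'm assuming the
pool:vdev:offset:size argument form that zdb -R took in builds of that
era, with offset and size in hex; the exact offset adjustment is the
open question above, so treat the offset below as illustrative:)

zdb -R tpool:0:657800:200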
2009/6/25 zhihui Chen zhch...@gmail.com

 I use the following dtrace script to trace the position of one file on zfs:

 #!/usr/sbin/dtrace -qs

 /* fire on every completed zio that has a backing vdev */
 zio_done:entry
 /((zio_t *)(arg0))->io_vd/
 {
 	zio = (zio_t *)arg0;
 	printf("Offset:%x and Size:%x\n", zio->io_offset, zio->io_size);
 	printf("vd:%x\n", (unsigned long)(zio->io_vd));
 	printf("process name:%s\n", execname);
 	/* dump the first 40 bytes of the I/O buffer */
 	tracemem(zio->io_data, 40);
 	stack();
 }

 and I run the dd command: dd if=/export/dsk1/test1 bs=512 count=1;
 the dtrace script generates the following output:

 Offset:657800 and Size:200
 vd:ff02d6a1a700
 process name:sched
   
   zfs`zio_execute+0xa0
   genunix`taskq_thread+0x193
   unix`thread_start+0x8
 ^C

 The tracemem output is the right content of file test1, which is a
 512-byte text file. zpool status has the following output:

 pool: tpool
 state: ONLINE
 scrub: none requested
 config:
 NAME        STATE     READ WRITE CKSUM
 tpool       ONLINE       0     0     0
   c2t0d0    ONLINE       0     0     0
 errors: No known data errors

 My question is: how do I translate the zio->io_offset (0x657800, equal
 to decimal 6649856) output by dtrace into a block number on disk c2t0d0?
 I tried dd if=/dev/dsk/c2t0d0 of=text iseek=6650112 bs=512 count=1
 as a check, but the result is not right.

 Thanks
 Zhihui




[zfs-discuss] About ZFS compatibility

2009-05-20 Thread zhihui Chen
I created a pool on external storage with B114, then exported the pool
and tried to import it on another system running B110. The import fails
with the error: cannot import 'tpool': pool is formatted using a newer
ZFS version. Did some change in ZFS in B114 lead to this compatibility
issue?
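
(For reference, a sketch of checking and working around this; the
version numbers are from memory, and my assumption is that B114 bumped
the pool to version 15 for user/group quota accounting while B110 tops
out at 14:)

# on each host, list the pool versions this build supports
zpool upgrade -v
# recreate the pool pinned at a version the B110 host understands
zpool create -o version=14 tpool <device>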

Thanks
Zhihui