If you're chasing CPU utilization, specifically %sys (time in the kernel), I would start with a time-based kernel profile:

# dtrace -n 'profile-997hz /arg0/ { @[stack()] = count(); } tick-60sec { trunc(@, 20); printa(@); }'
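Written out as a D script with comments, the same profile looks roughly like this (equivalent to the one-liner above; save it to a file and run it with dtrace -s):

#!/usr/sbin/dtrace -s

/*
 * Sample every CPU at 997 Hz. arg0 is the kernel program counter and is
 * non-zero only when the CPU was executing in the kernel, so the predicate
 * restricts the samples to %sys time.
 */
profile-997hz
/arg0/
{
        @[stack()] = count();
}

/*
 * Every 60 seconds, keep only the 20 hottest kernel stacks and print them.
 */
tick-60sec
{
        trunc(@, 20);
        printa(@);
}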
I would be curious to see where the CPU cycles are being consumed first, before going down the lock path… This assumes that most or all of the CPU utilization is %sys. If it's %usr, we take a different approach.
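A minimal sketch of that user-land counterpart would be to sample user stacks instead of kernel stacks, keyed by process name (arg1 is the user-mode program counter, so the predicate keeps only samples taken in user code):

# dtrace -n 'profile-997hz /arg1/ { @[execname, ustack()] = count(); } tick-60sec { trunc(@, 20); printa(@); }'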
Thanks
/jim

On Mar 25, 2012, at 1:29 AM, Aubrey Li wrote:

> Hi,
>
> I'm migrating a webserver (apache+php) from RHEL to Solaris. During the
> stress-testing comparison, I found that with the same number of client
> sessions, CPU% is ~70% on RHEL while the CPU is fully utilized on Solaris.
>
> After some investigation, the zfs root lock emerges as a major suspect.
>
> Firstly, there are a bunch of mutex events.
> ============================================
> # lockstat -D 10 sleep 2
>
> Adaptive mutex spin: 1862701 events in 3.678 seconds (506499 events/sec)
>
> Count   indv cuml rcnt     nsec Lock                       Caller
> -------------------------------------------------------------------------------
> 829064   45%  45% 0.00    33280 0xffffff117624a5d0         rrw_enter_read+0x1b
> 705001   38%  82% 0.00    30983 0xffffff117624a5d0         rrw_exit+0x1d
> 140678    8%  90% 0.00    10546 0xffffff117624a6e0         zfs_zget+0x46
>  37208    2%  92% 0.00     5403 0xffffff114b136840         vn_rele+0x1e
>  33926    2%  94% 0.00     5417 0xffffff114b136840         lookuppnatcred+0xc5
>  27188    1%  95% 0.00     1155 vn_vfslocks_buckets+0xd980 vn_vfslocks_getlock+0x3b
>  11073    1%  96% 0.00     1639 vn_vfslocks_buckets+0x4600 vn_vfslocks_getlock+0x3b
>   9321    1%  96% 0.00     1961 0xffffff114b82a680         dnlc_lookup+0x83
>   6929    0%  97% 0.00     1590 0xffffff11573b8f28         zfs_fastaccesschk_execute+0x6a
>   5746    0%  97% 0.00     5935 0xffffff114b136840         lookuppnvp+0x566
> -------------------------------------------------------------------------------
>
> If we look at the lock callers' stacks, the following two are the hotspots.
> ==============================================================
> # lockstat -D 10 -l 0xffffff117624a5d0 -s 15 sleep 2
>
> Adaptive mutex spin: 865609 events in 3.237 seconds (267444 events/sec)
> -------------------------------------------------------------------------------
> Count   indv cuml rcnt     nsec Lock                       Caller
> 51215     6%   6% 0.00   100023 0xffffff117624a5d0         rrw_exit+0x1d
>
>   nsec ------ Time Distribution ------ count     Stack
>    256 |                               61        zfs_root+0x78
>    512 |@                              1746      fsop_root+0x2d
>   1024 |@@@                            5210      traverse+0x65
>   2048 |@@@@@@@                        13464     lookuppnvp+0x446
>   4096 |@@@@@@@                        13553     lookuppnatcred+0x119
>   8192 |@@@                            5355      lookupnameatcred+0x97
>  16384 |@                              2403      lookupnameat+0x6b
>  32768 |                               1331      cstatat_getvp+0x11e
>  65536 |                               673       cstatat64_32+0x5d
> 131072 |                               961       fstatat64_32+0x4c
> 262144 |                               1259      _sys_sysenter_post_swapgs+0x149
> -------------------------------------------------------------------------------
> Count   indv cuml rcnt     nsec Lock                       Caller
> 48494     6%  17% 0.00   145263 0xffffff117624a5d0         rrw_enter_read+0x1b
>
>   nsec ------ Time Distribution ------ count     Stack
>    256 |                               17        rrw_enter+0x2c
>    512 |                               1120      zfs_root+0x3b
>   1024 |@                              1718      fsop_root+0x2d
>   2048 |@@                             4834      traverse+0x65
>   4096 |@@@@@@@@@@@                    18569     lookuppnvp+0x446
>   8192 |@@@@                           6620      lookuppnatcred+0x119
>  16384 |@                              2929      lookupnameatcred+0x97
>  32768 |@                              1635      lookupnameat+0x6b
>  65536 |                               894       cstatat_getvp+0x11e
> 131072 |                               1249      cstatat64_32+0x5d
> 262144 |@                              1620      fstatat64_32+0x4c
> 524288 |@                              2474      _sys_sysenter_post_swapgs+0x149
> ============================================================
>
> The Apache document root is apparently in a zfs dataset. It looks like each
> file-system operation runs into contention on the zfs root mutex.
>
> Is this an expected behavior? If so, is there any zfs tunable to
> address this issue?
>
> Thanks,
> -Aubrey

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss