Some answers below...

On Fri, Oct 30, 2009 at 12:55:03AM -0400, Dot Yet wrote:
> Hi everyone,
>
> I need some help with regard to I/O performance on a xen based environment.
> Following is the setup.
>
> Host Server Configuration:
>
> CPU: 2 x 5430 Xeon CPUs
> Motherboard: Tempest i5400PW S5397
> RAM: 24GB RAM
> Disks: 8 x 500 GB 7200 RPM disks - for VMs
> SATA controller on 4 disks: Supermicro 8-Port SATA Card - (AOC-SAT2-MV8)
> SATA controller on other 4 disks: onboard controller (Intel 6321ESB)
> 1 x 250 GB 7200 RPM disk - for OS
> OS: OpenSolaris 2009.06 with Xen 3.3.2
>
> The relevant zfs pool 'vmdisk' has been laid out as follows:
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> vmdisk       188G  1.18T      0      0      0      0
>   mirror    62.7G   401G      0      0      0      0
>     c8t2d0      -      -      0      0      0      0
>     c9t0d0      -      -      0      0      0      0
>   mirror    62.7G   401G      0      0      0      0
>     c8t3d0      -      -      0      0      0      0
>     c9t1d0      -      -      0      0      0      0
>   mirror    62.7G   401G      0      0      0      0
>     c8t4d0      -      -      0      0      0      0
>     c9t4d0      -      -      0      0      0      0
>   c8t5d0        0   464G      0      0      0      0
> cache           -      -      -      -      -      -
>   c9t5d0    76.6G   389G      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
>
> Where c8t5d0 is a slog device and c9t5d0 is a cache device. The recordsize
> is default 128k and compression is set to OFF.
>
> Xen dom0 has been limited to 2 GB RAM through following configuration
> in /rpool/boot/grub/menu.lst file:
>
> kernel$ /boot/$ISADIR/xen.gz dom0_mem=2048M
>
> domU has been configured as follows:
>
> VCPUs - 4
> RAM - 8GB
> HDD - 100GB zvol on ZFS filesystem 'vmdisk'. The record size of the zvol is
> default of 8k and compression is set to OFF. Also used --nonsparse command
> line option while creating the VM through virt-install.
> Paravirtualized
> OS: CentOS release 5.4 (Final) X86_64
> Kernel: 2.6.18-164.el5xen
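
(For reference, a pool with the layout described above could be created along the following lines. The device names are taken from the zpool iostat output; the zvol name "vmdisk/centos01" is purely illustrative, not the name actually used in this setup.)

  # zpool create vmdisk \
        mirror c8t2d0 c9t0d0 \
        mirror c8t3d0 c9t1d0 \
        mirror c8t4d0 c9t4d0 \
        log c8t5d0 \
        cache c9t5d0

  # zfs create -V 100G vmdisk/centos01     (zvol backing the domU; volblocksize defaults to 8k)
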
> Now, the problem. I am running the following dd command inside the domU:
>
> [us...@db2db01 ~]$ dd if=/dev/zero of=j1 bs=8k count=100000
> 100000+0 records in
> 100000+0 records out
> 819200000 bytes (819 MB) copied, 2.57249 seconds, 318 MB/s
>
> The above command returns under 2.5 seconds, however iostat on domU AND
> zpool iostat on dom0, both continue to show write IO activity for upto 30 to
> 40 seconds:
>
> domU iostat:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.00   24.81    0.00   75.19
>
> Device:  rrqm/s   wrqm/s   r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> xvda       0.00  4989.00  0.00  1133.00   0.00  23.06     41.68    146.08  107.49   0.88 100.00
> xvda1      0.00     0.00  0.00     0.00   0.00   0.00      0.00      0.00    0.00   0.00   0.00
> xvda2      0.00  4989.00  0.00  1133.00   0.00  23.06     41.68    146.08  107.49   0.88 100.00
> dm-0       0.00     0.00  0.00  6113.00   0.00  23.88      8.00    759.28  100.24   0.16 100.00
> dm-1       0.00     0.00  0.00     0.00   0.00   0.00      0.00      0.00    0.00   0.00   0.00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00    0.00   24.94    0.00   75.06
>
> Device:  rrqm/s   wrqm/s   r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> xvda       0.00  4989.00  0.00  1153.00   0.00  23.91     42.47    146.32  146.37   0.87 100.40
> xvda1      0.00     0.00  0.00     0.00   0.00   0.00      0.00      0.00    0.00   0.00   0.00
> xvda2      0.00  4989.00  0.00  1153.00   0.00  23.91     42.47    146.32  146.37   0.87 100.40
> dm-0       0.00     0.00  0.00  6143.00   0.00  24.00      8.00    751.75  143.43   0.16 100.40
> dm-1       0.00     0.00  0.00     0.00   0.00   0.00      0.00      0.00    0.00   0.00   0.00
>
> dom0 zpool iostat:
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> vmdisk       188G  1.18T    287  2.57K  2.24M  19.7M
>   mirror    62.7G   401G    142    890  1.12M  6.65M
>     c8t2d0      -      -     66    352   530K  6.49M
>     c9t0d0      -      -     76    302   612K  6.65M
>   mirror    62.7G   401G     83    856   670K  6.39M
>     c8t3d0      -      -     43    307   345K  6.40M
>     c9t1d0      -      -     40    293   325K  6.40M
>   mirror    62.7G   401G     60    886   485K  6.68M
>     c8t4d0      -      -     50    373   402K  6.68M
>     c9t4d0      -      -     10    307  82.9K  6.68M
>   c8t5d0        0   464G      0      0      0      0
> cache           -      -      -      -      -      -
>   c9t5d0    77.1G   389G    472     38  3.86M  3.50M
> ----------  -----  -----  -----  -----  -----  -----
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> vmdisk       188G  1.18T     75  3.52K   594K  27.1M
>   mirror    62.7G   401G     30  1.16K   239K  8.89M
>     c8t2d0      -      -     10    464  86.6K  8.89M
>     c9t0d0      -      -     19    350   209K  8.89M
>   mirror    62.7G   401G      0  1.18K      0  9.10M
>     c8t3d0      -      -      0    510      0  9.10M
>     c9t1d0      -      -      0    385      0  9.10M
>   mirror    62.7G   401G     45  1.18K   355K  9.11M
>     c8t4d0      -      -     37    469   354K  9.11M
>     c9t4d0      -      -      7    391  57.7K  9.11M
>   c8t5d0        0   464G      0      0      0      0
> cache           -      -      -      -      -      -
>   c9t5d0    77.1G   389G    514    157  4.14M  17.4M
> ----------  -----  -----  -----  -----  -----  -----
>
> Can you tell me why this happens? Is this behavior coming from Linux or Xen
> or ZFS? I do notice that iostat reports an iowait of 25%, but I don't know
> who is causing this bottleneck amongst them.
>
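
(A quick way to make the dd figure include the flush, rather than just how fast the writes land in the guest's page cache, is to ask dd to fsync before it exits. conv=fsync is a GNU coreutils option, so this assumes the CentOS dd supports it; alternatively, time an explicit sync run immediately after the dd.)

  [us...@db2db01 ~]$ dd if=/dev/zero of=j1 bs=8k count=100000 conv=fsync

The throughput reported this way should be much closer to the rate at which the data actually reaches the backing pool, and so should track the 30-40 seconds of write activity visible in iostat and zpool iostat.
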
It's coming from ZFS (largely) in the dom0; however, I can't comment on
why it's an issue in the CentOS domU. When you write the data to the
zvol, most of the writes are cached in memory and the writes to disk are
performed later, when ZFS flushes the memory cache to disk. It seems
that your logging device is capable of writing about 40MB/s (see your
figure below), so it takes at least 800/40 (20) seconds to write this to
disk. I'm not sure why the CentOS filesystem would continue to show
activity.

> Instead of writing a 800 meg file, if I write an 8gb file, the performance
> is very poor (40 MB/s or so) and again, despite there is a long iowait after
> the dd command returns.
>

Same explanation as above for the wait in dom0. The reason performance
drops off is that you hit the ZFS cache limit and writes start to happen
at the speed of the disk, rather than at memory speed.

If you wish to confirm this explanation, configure your dom0 with more
memory (say 8GB, or any value >= 8GB) and repeat your test of writing an
8GB file. You should find that this mostly writes to memory and should be
(proportionately) as fast as your test writing your 800MB file with a 2GB
dom0... Expect it to take at least 8192/40 (204) seconds to write the
data to disk after the test has completed.

Gary

> Any help would be really appreciated.
>
> Kind regards,
> dot.yet

--
Gary Pennington
Solaris Core OS
Sun Microsystems
[email protected]

_______________________________________________
xen-discuss mailing list
[email protected]
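
(Two concrete ways to try the experiment Gary suggests. The dom0 memory cap is the same menu.lst entry quoted earlier, just with a larger value; 8192M below is only an example. If you would rather pin the size of the ZFS cache explicitly, the ARC ceiling can be set in /etc/system; the 4 GB value shown is likewise illustrative, not a recommendation for this machine.)

  kernel$ /boot/$ISADIR/xen.gz dom0_mem=8192M          (in /rpool/boot/grub/menu.lst)

  set zfs:zfs_arc_max = 0x100000000                    (in /etc/system; takes effect after a reboot)

While the test runs, "kstat -m zfs -n arcstats" in dom0 reports the current and target ARC sizes, which should make it clear when the write cache fills up and writes fall back to disk speed.
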
