Hi All,

I've just built an 8-disk ZFS storage box, and I'm in the testing phase before I put it into production. I've run into some unusual results, and I was hoping the community could offer some suggestions. I've basically made the switch to Solaris on the promise of ZFS alone (yes, I'm that excited about it!), so naturally I'm looking forward to some great performance - but it appears I'm going to need some help finding all of it.
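For reference, the pool layouts tested below were created along these lines (a sketch only - the c2t*d0 device names are placeholders, substitute whatever format(1M) reports on your system, and destroy the pool between runs):

```shell
# 8-disk dynamic stripe ("raid0"):
zpool create tank c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

# 8-way mirror:
zpool create tank mirror c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

# 8-disk raid-z (the 7/6/4 disk runs just drop devices off the end):
zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
```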
I was getting even lower numbers with filebench, so I decided to dial back to a really simple app for testing - bonnie. The system is nevada_41 on an EM64T 3GHz Xeon with 1GB RAM, 8x Seagate SATA II 300GB disks on a Supermicro SAT2-MV8 8-port SATA controller, running on a 133MHz 64-bit PCI-X bus. The bottleneck here, by my thinking, should be the disks themselves. It's not the disk interfaces (300MB/sec), the disk bus (300MB/sec EACH), or the PCI-X bus (1.1GB/sec), and I'd hope a 64-bit 3GHz CPU would be sufficient.

Tests were run on a fresh, clean zpool, on an idle system. Rogue results were dropped, and as you can see below, all tests were run more than once. An 8GB working set should be far more than the 1GB of RAM that the system has, eliminating caching issues. If I've still managed to overlook something in my testing setup, please let me know - I sure did try! Sorry about the formatting - this is bound to end up ugly.

Bonnie
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raid0     MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
8 disk  8196  78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0  2.0
8 disk  8196  79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9  2.1

So ~270MB/sec writes - awesome! 240MB/sec reads though - why would this be LOWER than writes??

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
mirror    MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
8 disk  8196  33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5  1.3
8 disk  8196  34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4  1.8

46MB/sec writes: each disk individually can do better, but I guess keeping 8 disks in sync is hurting performance. The 94MB/sec read number is interesting. On the one hand, that's greater than 1 disk's worth, so I'm getting striping performance out of a mirror - GO ZFS.
On the other hand, if I can get striping performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not CPU bound.

Now for the important test, raid-z:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raidz     MB  K/sec %CPU  K/sec %CPU  K/sec %CPU K/sec %CPU  K/sec %CPU  /sec %CPU
8 disk  8196  61785 70.9 142797 29.3  89342 19.9 64197 85.7 320554 32.6 131.3  1.0
8 disk  8196  62869 72.4 131801 26.7  90692 20.7 63986 85.7 306152 33.4 127.3  1.0
8 disk  8196  63103 72.9 128164 25.9  86175 19.4 64126 85.7 320410 32.7 124.5  0.9
7 disk  8196  51103 58.8  93815 19.1  74093 16.1 64705 86.5 331865 32.8 124.9  1.0
7 disk  8196  49446 56.8  93946 18.7  73092 15.8 64708 86.7 331458 32.7 127.1  1.0
7 disk  8196  49831 57.1  81305 16.2  78101 16.9 64698 86.4 331577 32.7 132.4  1.0
6 disk  8196  62360 72.3 157280 33.4  99511 21.9 65360 87.3 288159 27.1 132.7  0.9
6 disk  8196  63291 72.8 152598 29.1  97085 21.4 65546 87.2 292923 26.7 133.4  0.8
4 disk  8196  57965 67.9 123268 27.6  78712 17.1 66635 89.3 189482 15.9 134.1  0.9

I'm getting distinctly non-linear scaling here.

Writes: 4 disks gives me 123MB/sec. Raid0 was giving me 270/8 = 33MB/sec per disk with CPU to spare (roughly half of what each individual disk should be capable of). Here I'm getting 123/4 = 30MB/sec - or should that be 123/3 = 41MB/sec, since one disk's worth of bandwidth goes to parity? Using 30 as a baseline, I'd be expecting to see twice that with 8 disks (240ish?). What I end up with is ~135. Clearly not good scaling at all. The really interesting numbers happen at 7 disks - it's slower than with 4, in all tests. I ran it 3x to be sure. Note this was a native 7-disk raid-z; it wasn't 8 running in degraded mode with 7. Something is really wrong with my write performance here, across the board.

Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8 disks should scale to 380 then. Well, 320 isn't all that far off - no biggie. Looking at the 6-disk raidz is interesting though: 290MB/sec.
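The per-disk write arithmetic above, sketched as a quick helper (assuming raidz spreads each block across N-1 data disks plus one disk's worth of parity):

```shell
# Divide aggregate bonnie write throughput (MB/sec) by the number of
# data disks - raidz dedicates roughly one disk's bandwidth to parity.
per_disk() {
    awk -v mb="$1" -v disks="$2" 'BEGIN { printf "%.1f\n", mb / disks }'
}

per_disk 123 3   # 4-disk raidz, 3 data disks -> 41.0 MB/sec/disk
per_disk 135 7   # 8-disk raidz, 7 data disks -> 19.3 MB/sec/disk
```

Doubling the data disks roughly halves the per-disk rate, which is the non-linear scaling complained about above.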
The disks are good for 60+MB/sec individually. 290 is 48/disk - note also that this is better than my raid0 performance?! Adding another 2 disks to my raidz gives me a mere 30MB/sec of extra performance? Something is going very wrong here too. The 7-disk raidz read test is about what I'd expect (330/7 = 47/disk), but it shows that the 8 disk is actually going backwards. Hmm...

I understand that going for an 8-disk-wide raidz isn't optimal in terms of redundancy and IOPS - but my workload shouldn't involve large amounts of sustained random IO, so I'm happy to take the loss in favour of absolute capacity. My issue here is the scaling on sequential block transfers, not optimal design.

All three raid levels have had unexpected results, and I'd really appreciate some suggestions on how I can troubleshoot this. I know how to run iostat while bonnie is running, but that's about it. Incidentally, iostat is telling me that the disks are at best hitting around 70% busy (%b). With the 8-disk tests, it was often below 50%...

Is my issue perhaps with the SATA card that I'm using? Maybe it's just not able to handle that much throughput, despite being advertised to do so. With raid0 (aka dynamic stripes), I know that each disk can read at 60-70MB/sec. Why am I not getting 65*8 (500MB/sec+) performance? Maybe it's the marvell driver at fault here? My thinking is that I need to get raid0 performing as expected before looking at raidz, but I'm afraid I really don't know where to begin.

All thoughts & suggestions welcome. I'm not using the disks yet, so I can blow the zpool away as needed.

Many thanks,
Jonathan Wheeler

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
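P.S. One way to take ZFS out of the picture entirely and test the controller/driver/bus path on its own is to read the raw devices in parallel with dd. This is only a sketch - the c2t*d0 device names are placeholders for whatever your disks are actually called:

```shell
# Sequential-read 1GB from every raw disk at once, bypassing ZFS.
# Watch 'iostat -xn 5' in another terminal while this runs: if the
# aggregate also tops out around 300MB/sec, the limit is the card,
# driver, or bus - not the filesystem.
for d in c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0; do
    dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=1024 &
done
wait
```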