Hi Bob ... as Richard has mentioned, allocation to vdevs is done in fixed-size chunks (Richard quoted 1MB; I remember a 512KB number from the original spec, but the exact figure is not very important), and the allocation algorithm is basically doing load balancing across the vdevs.
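Conceptually it looks something like the sketch below. To be clear, this is a toy illustration of the load-balancing idea, not the actual ZFS metaslab allocator, and the 1MB chunk size is just the number Richard quoted:

```python
# Toy sketch of load-balanced chunk allocation across top-level vdevs.
# NOT the real ZFS allocator -- it only illustrates that writes go out
# in fixed-size chunks, biased toward the vdev with the most free space.

CHUNK = 1 << 20  # 1MB allocation chunk (512KB per the original spec)

def allocate(vdev_free, nbytes):
    """Assign nbytes of data to vdevs one chunk at a time,
    always choosing the vdev with the most free space."""
    placement = []
    while nbytes > 0:
        # pick the least-full vdev (simple load balancing)
        v = max(range(len(vdev_free)), key=lambda i: vdev_free[i])
        size = min(CHUNK, nbytes)
        vdev_free[v] -= size
        placement.append((v, size))
        nbytes -= size
    return placement

# Two equally empty vdevs: the chunks alternate between them.
free = [10 * CHUNK, 10 * CHUNK]
print(allocate(free, 3 * CHUNK))
```

Note that the chunk size here is independent of any file system block size, which is the point below.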
For your non-raid pool, this chunk size stays fixed regardless of the block size you choose when creating the file system or the IO unit size your application(s) use. (The stripe size can change dynamically in a raidz pool, but not in your non-raid pool.)

Measuring bandwidth for your application load is tricky with ZFS, since there are many hidden IO operations (besides the ones your application is requesting) that ZFS must perform. If you collect iostats on bytes transferred to the hard drives and compare those numbers to the amount of data your application(s) transferred, you can find potentially large differences. The differences in these scenarios are largely driven by the IO size your application(s) use. For example, when I run the following test here are my observations:

- dual Xeon server with a QLogic 2Gb FC interface
- a pool of 5 10Krpm FC 146GB drives
- sequentially rewriting 4 previously written 15GB files in one file system in the pool (the file system uses a 128KB block size), with a separate thread writing each file concurrently, for a total of 60GB written

  block size   written   actual disk IO   observed BW MB/s   %CPU
  4KB          60GB      227.3GB          34.2               20.4
  32KB         60GB      216.5GB          36.1               13.9
  128KB        60GB       63.6GB          69.6               31.0

You can see that a small application IO size causes much meta-data based IO (more than 3 times the actual application IO requirements), while the 128KB application writes induce only marginally more disk IO than the application actually uses. The BW numbers here are for just the application data; when you consider all the IO from the disks over the test times, the physical BW is obviously greater in all cases. All my drives were uniformly busy in these tests, but the small application IO sizes forced much more total IO against the drives. In your case the application IO rate would be even further degraded by the mirror configuration.
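To put a number on that overhead, here is the ratio of physical disk IO to application data from the figures above:

```python
# Overhead factor: physical bytes hitting the disks divided by the
# 60GB the application actually wrote, per application IO size.
app_written_gb = 60.0

disk_io_gb = {"4KB": 227.3, "32KB": 216.5, "128KB": 63.6}

for bs, disk_gb in disk_io_gb.items():
    factor = disk_gb / app_written_gb
    print(f"{bs:>6} writes: {factor:.2f}x the application data written to disk")
# 4KB writes push ~3.79x the application data to disk;
# 128KB writes push only ~1.06x.
```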
The extra load of reading and writing meta-data (including ditto blocks) and mirror devices conspires to reduce the application IO rate, even though the disk device IO rates may be quite good. Reducing the file system block size only exacerbates the problem by requiring more meta-data to support the same quantity of application data, and for sequential IO this is a loser. In any case, for a non-raid pool the per-drive allocation chunk size (the stripe size) is not influenced by the file system block size. When application IO sizes get small, the overhead in ZFS goes up dramatically.

regards,
Bill

> The application is spending almost all the time blocked on I/O. I see
> that the number of device writes per second seems pretty high. The
> application is doing I/O in 128K blocks. How many IOPS does a modern
> 300GB 15K RPM SAS drive typically deliver? Of course the IOPS
> capacity depends on if the access is random or sequential. At the
> application level, the access is completely sequential but ZFS is
> likely doing some extra seeks.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss