Bob Friesenhahn wrote:
On Wed, 17 Jun 2009, Haudy Kazemi wrote:
usable with very little CPU consumed.
If the system is dedicated to serving files rather than also being used interactively, it should not matter much what the CPU usage is. CPU cycles can't be stored for later use. Ultimately, it (mostly*) does not matter if

Clearly you have not heard of the software flywheel:

  http://www.simplesystems.org/users/bfriesen/software_flywheel.html
I had not heard of such a device; however, from the description it appears to be made from virtual unobtanium... :)

My line of reasoning is that unused CPU cycles are to some extent a wasted resource, paralleling the idea that having system RAM sitting empty/unused is also a waste and should be used for caching until the system needs that RAM for other purposes (how the ZFS cache is supposed to work). This isn't a perfect parallel, as CPU power consumption and heat output vary with load much more than RAM's do. I'm sure someone could come up with a formula for the optimal CPU loading to maximize energy efficiency. There has been work on this; see the paper 'Dynamic Data Compression in Multi-hop Wireless Networks' at http://enl.usc.edu/~abhishek/sigmpf03-sharma.pdf .

If I understand the blog entry correctly, for text data the task took up to 3.5X longer to complete, and for media data, the task took about 2.2X longer to complete with a maximum storage compression ratio of 2.52X.

For my backup drive using lzjb compression I see a compression ratio of only 1.53x.

I linked to several blog posts. It sounds like you are referring to http://blogs.sun.com/dap/entry/zfs_compression#comments ? This blog's test results show that on their quad-core platform (the Sun 7410 has quad-core 2.3 GHz AMD Opteron CPUs*):

* http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/7410/spec

For text data, LZJB compression had a negligible performance impact (task times were unchanged or marginally better) and less storage space was consumed (1.47:1).
For media data, LZJB compression had a negligible performance impact (task times were unchanged or marginally worse) and storage space consumed was unchanged (1:1).
Take-away message: as currently configured, their system has nothing to lose from enabling LZJB.

For text data, GZIP compression at any setting had a significant negative impact on write times (CPU-bound), no performance impact on read times, and significant improvements in compression ratio.
For media data, GZIP compression at any setting had a significant negative impact on write times (CPU-bound), no performance impact on read times, and marginal improvements in compression ratio.
Take-away message: with GZIP, as their system is currently configured, write performance would suffer in exchange for a higher compression ratio. This may be acceptable if the system fulfills a role that has a read-heavy usage profile of compressible content. (An archive.org backend would be such an example.) This is similar to the tradeoff made when comparing RAID1 or RAID10 vs RAID5.

Automatic benchmarks could be used to detect and select the optimal compression settings for best performance, with the basic case assuming the system is a dedicated file server and more advanced cases accounting for the CPU needs of other processes run on the same platform. Another way would be to ask the administrator what the usage profile for the machine will be and preconfigure compression settings suitable for that use case.
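To make the automatic benchmark idea a bit more concrete, here is a rough sketch in Python. It is only an illustration, not how ZFS does or would do it. The Python standard library has no LZJB, so zlib levels stand in for the gzip-N settings; the function names, the sample data, and the 12.5 MB/s target (roughly a 100 Mbps link) are all my own assumptions.

```python
import time
import zlib


def measure(sample: bytes, levels=(1, 6, 9)):
    """Time zlib compression of `sample` at each level.

    Returns a list of (level, ratio, mb_per_s) tuples.
    """
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(sample, level)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
        ratio = len(sample) / len(packed)
        mb_per_s = len(sample) / 1e6 / elapsed
        results.append((level, ratio, mb_per_s))
    return results


def choose(results, required_mb_per_s):
    """Pick the best-compressing level that still meets the throughput target.

    Returns None (meaning "leave compression off") if no level is fast enough.
    """
    fast_enough = [r for r in results if r[2] >= required_mb_per_s]
    if not fast_enough:
        return None
    return max(fast_enough, key=lambda r: r[1])[0]


if __name__ == "__main__":
    # Hypothetical, highly compressible sample; real data should be used instead.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 50_000
    results = measure(sample)
    for level, ratio, rate in results:
        print(f"level {level}: {ratio:.2f}:1 at {rate:.0f} MB/s")
    print("chosen level:", choose(results, required_mb_per_s=12.5))
```

A dedicated file server would set the target from its network and disk bandwidth; a shared machine would need to lower the CPU budget to leave room for other processes.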

Single and dual core systems are more likely to become CPU bound from enabling compression than a quad core.

All systems have bottlenecks in them somewhere by virtue of design decisions. One or more of these bottlenecks will be the rate limiting factor for any given workload, such that even if you speed up the rest of the system the process will still take the same amount of time to complete. The LZJB compression benchmarks on the quad core above demonstrate that LZJB is not the rate limiter either in writes or reads. The GZIP benchmarks show that it is a rate limiter, but only during writes. On a more powerful platform (6x faster CPU), GZIP writes may no longer be the bottleneck (assuming that the network bandwidth and drive I/O bandwidth remain unchanged).

System component balancing also plays a role. If the server is connected via a 100 Mbps CAT5e link, and all I/O activity is from client computers on that link, does it make any difference if the server is actually capable of GZIP writes at 200 Mbps, 500 Mbps, or 1500 Mbps? If the network link is later upgraded to Gigabit Ethernet, then only the system capable of GZIPing at 1500 Mbps can keep up. The rate-limiting factor changes as different components are upgraded.
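The network example above boils down to taking the minimum over the stages of a serial pipeline. A small Python sketch, using the link and GZIP rates from the paragraph above (the function name and the two-stage model are my simplification; a real server has more stages, such as drive I/O):

```python
def bottleneck(stages):
    """Return (name, rate) of the slowest stage in a serial pipeline.

    `stages` maps a stage name to its sustained throughput in Mbps.
    """
    name = min(stages, key=stages.get)
    return name, stages[name]


if __name__ == "__main__":
    for link_mbps in (100, 1000):           # before and after the network upgrade
        for gzip_mbps in (200, 500, 1500):  # the three hypothetical servers
            name, rate = bottleneck({"network": link_mbps, "gzip": gzip_mbps})
            print(f"link {link_mbps:>4} Mbps, gzip {gzip_mbps:>4} Mbps -> "
                  f"{rate:>4} Mbps, limited by {name}")
```

On the 100 Mbps link all three servers are limited by the network; on the Gigabit link only the 1500 Mbps server is still network-limited.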

In many systems for many workloads, hard drive I/O bandwidth is the rate-limiting factor that has the most significant performance impact, such that a 20% boost in drive I/O is more noticeable than a 20% boost in CPU performance (or even a doubling of CPU performance). Many systems are now becoming quite unbalanced in terms of I/O bandwidth vs CPU performance. Trading CPU cycles for I/O bandwidth is one way of compensating for the imbalance, if the task is not already CPU-bound. (A CPU-bound process has the CPU as the rate-limiting factor. A common characteristic of CPU-bound processes is that they run the CPU at 100% and would benefit from a faster processor. Non-CPU-bound processes have a different rate-limiting factor, which remains unchanged even if a faster CPU is used. An example of a non-CPU-bound process is MP3 decoding for live playback.)

An example of balancing a system is to compare a recent netbook to a stock-configuration Pentium 3 laptop from 2002. They both have CPUs of similar capability, but the netbooks come with more RAM and some with flash memory rather than hard drives. The performance boost from extra RAM and flash memory storage helps compensate for what by 2009 standards are slow CPUs. As a result, the netbooks tend to have a better balance of CPU/RAM/permanent storage capacity and performance than the stock-configuration Pentium 3 laptops (an upgraded ultraportable Pentium 3 laptop can match a netbook quite well).
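The trade of CPU cycles for I/O bandwidth can be put into rough numbers. In the Python sketch below, the 1.53 ratio is the LZJB figure quoted earlier in this thread; the drive and compressor speeds are made-up values for illustration, and the model deliberately ignores caching, read paths, and everything else a real system does.

```python
def logical_write_rate(disk_mb_s, ratio, compress_mb_s):
    """Rate (MB/s) at which uncompressed data can be accepted.

    The drive only has to write 1/ratio of the incoming bytes, so it can
    absorb disk_mb_s * ratio of logical data, but never more than the CPU
    can compress.
    """
    return min(compress_mb_s, disk_mb_s * ratio)


if __name__ == "__main__":
    disk = 80.0  # assumed raw drive bandwidth in MB/s
    # Fast compressor (assumed 400 MB/s): the drive stays the limit, but the
    # effective rate rises above the raw 80 MB/s.
    print(logical_write_rate(disk, 1.53, 400.0))
    # Slow compressor (assumed 30 MB/s): the task becomes CPU-bound and ends
    # up slower than writing uncompressed.
    print(logical_write_rate(disk, 1.53, 30.0))
```

The second case is the one to watch for on single and dual core machines: once the compressor is slower than the drive, compression costs write throughput instead of buying it.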



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
