Hi Otis,

Excellent reflection. Unfortunately, I don't think anyone has benchmarked it, so I can't give a definitive answer.
One thing I'm sure of: worse than screwing up the OS cache, a major compaction also screws up the block cache! But this is the price to pay to clear out old versions and regroup all store files into one. If you're not deleting a whole lot, or updating the same fields a ton, then maybe you should explore setting a larger window between major compactions (the current default is once every 24h). I know some people just plain disable major compactions because they never overwrite values.

J-D

On Wed, Feb 16, 2011 at 4:30 AM, Otis Gospodnetic <[email protected]> wrote:
> Hi,
>
> Over on http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
> I saw this bit:
>
> "The most important factor is that HBase is not restarted frequently and
> that it performs house keeping on a regular basis. These so called
> compactions rewrite files as new data is added over time. All files in HDFS
> once written are immutable (for all sorts of reasons). Because of that, data
> is written into new files and as their number grows HBase compacts them into
> another set of new, consolidated files. And here is the kicker: HDFS is
> smart enough to put the data where it is needed!"
>
> ... and I always wondered what this does to the OS cache. In some
> applications (non-HBase stuff, say full-text search), the OS cache plays a
> crucial role in how the system performs. If you have to hit the disk too
> much, you're in trouble, so one of the things you avoid is making big
> changes to index files on disk in order to avoid invalidating data that's
> been nicely cached by the OS.
>
> However, with HBase, and especially major compactions, what happens with
> the OS cache? All gone, right?
> Do people find this problematic?
> Or does the OS cache simply not play such a significant role in systems
> running HBase simply because the data it holds and that needs to be
> accessed is much bigger than the OS cache could ever be, so even with the
> OS cache full and hot, other data would still have to be read from disk
> anyway?
>
> Thanks,
> Otis
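
P.S. For reference, the window between major compactions mentioned above is controlled by the hbase.hregion.majorcompaction property, in milliseconds. A minimal sketch of an hbase-site.xml override follows; the 7-day value is only an illustration, not a recommendation, and you should check the hbase-default.xml that ships with your version for the actual default.

```xml
<!-- hbase-site.xml: widen the interval between automatic major compactions. -->
<!-- 604800000 ms = 7 days (illustrative value, an assumption for this example). -->
<!-- The 24h interval discussed above corresponds to 86400000 ms. -->
<!-- A value of 0 disables time-based major compactions entirely. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value>
</property>
```

If you disable them entirely, you can still trigger a major compaction by hand during a quiet period, for example with major_compact from the HBase shell.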
