Hi Stack, thanks for the reply.

*HBase version* is 0.90.2. We use Ganglia to monitor our cluster. Write/read load is normal and evenly distributed all day long: 1k writes, 4k reads. It's kind of impossible to upgrade at the moment, and we don't have extra machines to cope with this situation (migrate => upgrade). Those metrics are region server metrics. The high load time was about 10:30 am ~ 11:00 am today (I'm in China). All servers in the cluster are dedicated to HBase storage, and we don't have any other jobs or programs running on those servers.

I found that one major compaction was going on during that period. Is it the main reason for the high load? And why does this "Block cache LRU eviction" happen so frequently?

#org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 634.89 MB of total=5.27 GB

*10:30's major compaction log*:

2012-09-26 10:31:36,581 DEBUG org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered on store data; time since last major compaction 81842121ms
2012-09-26 10:31:36,581 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for es_account_info_change_log,cntaobao\xE6\x99\x93\xE6\x96\x87\xE5\x8D\x96\xE9\x9E\x8B\xE5\xAD\x90+cntaobao\xE6\x99\x93\xE6\x96\x87\xE5\x8D\x96\xE9\x9E\x8B\xE5\xAD\x90+3+2012-03-12 00:42:46,1347263896089.125ca5cdc71431202022d57edaea594c. because regionserver60020.majorCompactionChecker requests major compaction use default priority; priority=18, compaction queue size=19
2012-09-26 10:31:36,583 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipping major compaction of info because one (major) compacted file only and oldestTime 4256527210ms is < ttl=9223372036854775807
2012-09-26 10:31:36,586 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipping major compaction of info because one (major) compacted file only and oldestTime 4861173609ms is < ttl=9223372036854775807
2012-09-26 10:31:36,591 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipping major compaction of F0 because one (major) compacted file only and oldestTime 34018941951ms is < ttl=9223372036854775807

Thanks, regards.

On Wed, Sep 26, 2012 at 12:16 PM, Stack <[email protected]> wrote:
> On Tue, Sep 25, 2012 at 9:02 PM, Yusup Ashrap <[email protected]> wrote:
> > Hi Otis thanks for reply,
> > servers are identical in terms of hardware, jvm.
> > right now I cannot afford to restart my any machines, it's in the
> > production environment :D.
> > I will give a shot for some other clusters some time later.
> >
>
> What about the other questions Otis asked about what your monitoring
> software shows is going on on the cluster (opentsdb, ganglia, Otis's
> suggested SPM, etc.)?
>
> Is that hbase 0.90.2 or 0.92.0?
>
> What metrics do you paste? A master or a regionserver? At what time?
>
> When it is load 30, anything else running? A mapreduce job? Any
> other process? A cron?
>
> What is the loading like? It looks like you are taking writes and
> then you need to flush a bunch of regions because you are carrying too
> many WALs.
>
> You are flushing lots of small files. Are you doing a lot of
> compacting when the load is high? Are you using all defaults?
>
> St.Ack
>
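
P.S. The millisecond values in the compaction log are hard to read at a glance, so here is a rough sketch for converting them. The helper names `humanize_ms` and `durations_in` are made up for illustration and are not part of HBase. The only assumption about the log is that durations appear as digits followed by `ms`, and that a ttl of 9223372036854775807 (Java's Long.MAX_VALUE) means the data never expires.

```python
import re

# Java's Long.MAX_VALUE; assumed here to mean "no TTL set".
LONG_MAX = 2**63 - 1


def humanize_ms(ms):
    """Turn a millisecond count into a short human-readable string."""
    if ms == LONG_MAX:
        return "forever"
    hours = ms / 3_600_000
    if hours < 48:
        return f"{hours:.1f} hours"
    return f"{hours / 24:.1f} days"


def durations_in(line):
    """Return every '<digits>ms' value found in a log line, as integers."""
    return [int(m) for m in re.findall(r"(\d+)ms", line)]


if __name__ == "__main__":
    line = ("Major compaction triggered on store data; "
            "time since last major compaction 81842121ms")
    for ms in durations_in(line):
        print(ms, "->", humanize_ms(ms))      # 81842121 -> 22.7 hours
    print(humanize_ms(4256527210))            # 49.3 days
    print(humanize_ms(9223372036854775807))   # forever
```

By this reading, the `data` store was last major-compacted about 22.7 hours before the 10:31 trigger, and the skipped stores hold files whose oldestTime values are roughly 49, 56 and 394 days.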
