Hi Stack, thanks for the reply.
The *HBase version* is 0.90.2. We use Ganglia to monitor our cluster. Write/read
load is normal and evenly distributed all day long: about 1k writes and 4k reads.
It's pretty much impossible to upgrade at the moment, and we don't have
extra machines to cope with this situation (migrate => upgrade).
The metrics I pasted are region server metrics. The high load today was from
about 10:30 am to 11:00 am (China time).
All servers in the cluster are dedicated to HBase storage, and we don't have
any other jobs or programs running on them.

I found that one major compaction was running during that period. Is it the
main reason for the high load?
And why does this "Block cache LRU eviction" happen so frequently?
#org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
started; Attempting to free 634.89 MB of total=5.27 GB
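
As a rough check on that eviction line, here is a small sketch of the arithmetic. It assumes the LruBlockCache defaults are an acceptable factor of 0.85 and a minimum factor of 0.75; I have not confirmed that our configuration uses those values, and the function name is only for illustration.

```python
# Rough check of the eviction log line. The two factors below are ASSUMED
# LruBlockCache defaults, not values read from this cluster's configuration.
ACCEPTABLE_FACTOR = 0.85  # assumed: eviction starts above this fraction of max
MIN_FACTOR = 0.75         # assumed: eviction frees down to this fraction of max


def implied_cache_max_gb(total_gb, to_free_mb):
    """Estimate the configured cache size from one eviction log line."""
    target_gb = total_gb - to_free_mb / 1024.0  # cache size after eviction
    return target_gb / MIN_FACTOR


total_gb = 5.27      # "total=5.27 GB" from the log line
to_free_mb = 634.89  # "Attempting to free 634.89 MB"

max_gb = implied_cache_max_gb(total_gb, to_free_mb)
print("implied max cache size: %.2f GB" % max_gb)
print("eviction threshold:     %.2f GB" % (max_gb * ACCEPTABLE_FACTOR))
```

If those assumptions hold, the numbers are consistent with a cache of about 6.2 GB that starts evicting whenever it fills to about 5.27 GB, so each log line would just mean the cache has filled up again.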

*Major compaction log from around 10:30*:
2012-09-26 10:31:36,581 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Major compaction triggered on store data; time since last major compaction
81842121ms
2012-09-26 10:31:36,581 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for
es_account_info_change_log,cntaobao\xE6\x99\x93\xE6\x96\x87\xE5\x8D\x96\xE9\x9E\x8B\xE5\xAD\x90+cntaobao\xE6\x99\x93\xE6\x96\x87\xE5\x8D\x96\xE9\x9E\x8B\xE5\xAD\x90+3+2012-03-12
00:42:46,1347263896089.125ca5cdc71431202022d57edaea594c. because
regionserver60020.majorCompactionChecker requests major compaction use
default priority; priority=18, compaction queue size=19
2012-09-26 10:31:36,583 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Skipping major compaction of info because one (major) compacted file only
and oldestTime 4256527210ms is < ttl=9223372036854775807
2012-09-26 10:31:36,586 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Skipping major compaction of info because one (major) compacted file only
and oldestTime 4861173609ms is < ttl=9223372036854775807
2012-09-26 10:31:36,591 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Skipping major compaction of F0 because one (major) compacted file only and
oldestTime 34018941951ms is < ttl=9223372036854775807
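
To make the millisecond values in the log above easier to read, here is a minimal conversion sketch. The helper name is hypothetical; the numbers are copied from the log lines.

```python
# Convert the millisecond durations from the compaction log into hours/days.
def describe_ms(ms):
    """Return a duration in hours (under 48 h) or in days."""
    hours = ms / 3600000.0
    if hours < 48:
        return "%.1f hours" % hours
    return "%.1f days" % (hours / 24)


since_last_major = 81842121                            # store "data"
oldest_times = [4256527210, 4861173609, 34018941951]   # info, info, F0
ttl = 9223372036854775807

print("time since last major compaction:", describe_ms(since_last_major))
for ms in oldest_times:
    print("oldestTime:", describe_ms(ms))

# The ttl value equals the largest signed 64-bit integer (Long.MAX_VALUE in
# Java), i.e. no expiry is set for these column families.
print("ttl is Long.MAX_VALUE:", ttl == 2**63 - 1)
```

So the store "data" was last major-compacted a little under 23 hours earlier, which looks like a periodic major compaction (assuming the roughly daily default interval is in effect), while the skipped stores hold single files that are weeks to over a year old.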


Thanks and regards.





On Wed, Sep 26, 2012 at 12:16 PM, Stack <[email protected]> wrote:

> On Tue, Sep 25, 2012 at 9:02 PM, Yusup Ashrap <[email protected]> wrote:
> > Hi Otis  thanks for reply,
> > servers are identical in terms of hardware, jvm.
> > right now I cannot afford to restart my any machines, it's in the
> > production environment :D.
> > I will give a shot for some other clusters some time later.
> >
>
> What about the other questions Otis asked about what your monitoring
> software shows is going on on the cluster (opentsdb, ganglia, Otis's
> suggested SPM, etc.)?
>
> Is that hbase 0.90.2 or 0.92.0?
>
> What metrics do you paste?  A master or a regionserver?  At what time?
>
> When it is load 30, anything else running?  A mapreduce job?  Any
> other process?  A cron?
>
> What is the loading like?  It looks like you are taking writes and
> then you need to flush a bunch of regions because you are carrying too
> many WALs.
>
> You are flushing lots of small files.  Are you doing a lot of
> compacting when the load is high?  Are you using all defaults?
>
> St.Ack
>
