Re: HBase Random Read latency > 100ms

Ramu M S Mon, 07 Oct 2013 02:36:22 -0700

Bharath,

I was about to report this. Yes, there is indeed too much GC time.
I just verified the GC time using Cloudera Manager statistics (updated every
minute).


For each Region Server:
 - During reads: the graph shows a constant 2s.
 - During compaction: the graph starts at 7s and goes as high as 20s towards
   the end.
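One way to read those numbers is to convert them into a share of wall-clock time. This is a minimal sketch that assumes, without confirmation from the thread, that each Cloudera Manager data point is the GC time accumulated over its one-minute sampling window:

```python
# Sketch: convert GC seconds per sampling window into a fraction of wall time.
# Assumption (not confirmed in the thread): each graph point is GC time summed
# over a 60-second window.
WINDOW_SECONDS = 60.0


def gc_fraction(gc_seconds: float, window_seconds: float = WINDOW_SECONDS) -> float:
    """Fraction of the sampling window the JVM spent in GC (0.0 to 1.0)."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return gc_seconds / window_seconds


# Values reported above for one Region Server.
for phase, seconds in [("reads", 2.0), ("compaction start", 7.0), ("compaction end", 20.0)]:
    print(f"{phase}: {gc_fraction(seconds):.1%} of wall time in GC")
```

Under that assumption, steady-state reads spend roughly 3% of wall time in GC and the end of compaction roughly a third; if the graph plots something else (for example, the longest single pause), the numbers need to be read differently.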

A few more questions:
1. For the current evaluation, since the reads are completely random and I
don't expect to read the same data again, can I set the heap to the default
1 GB?

2. Can I completely turn off the BLOCK CACHE for this table?
    http://hbase.apache.org/book/regionserver.arch.html recommends that for
    random reads.

3. In the next phase of the evaluation, we are interested in using HBase as an
in-memory KV DB by keeping the latest data in RAM (around 128 GB in each RS;
we are setting up a 50-100 node cluster). I am very curious to hear any
suggestions in this regard.
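For the planned in-memory setup, a first capacity check is simple arithmetic. The sketch below assumes decimal gigabytes and treats the full 128 GB per RegionServer as usable for data, ignoring JVM and cache overhead; the per-record size is a parameter because the real in-memory footprint is larger than the 724-byte raw record:

```python
# Sketch: upper bound on how many records could sit in RAM across a cluster.
# Assumptions (hypothetical planning inputs): decimal gigabytes, all of the
# per-RegionServer RAM budget usable for data, a fixed per-record footprint.
GB = 10**9


def records_in_ram(nodes: int, ram_per_rs_gb: int, bytes_per_record: int) -> int:
    """Upper bound on records held in RAM, ignoring JVM and cache overhead."""
    return nodes * ram_per_rs_gb * GB // bytes_per_record


for nodes in (50, 100):
    # 724 bytes is the raw YCSB record size from this thread; the stored
    # footprint per record is larger (row key repeated per cell, etc.).
    print(f"{nodes} nodes: at most {records_in_ram(nodes, 128, 724):,} records")
```

Because each region is served by exactly one RegionServer, HDFS replication does not multiply the RAM needed to cache the data.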

Regards,
Ramu


On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada
<[email protected]> wrote:

> Hi Ramu,
>
> Thanks for reporting the results back. Just curious whether you are hitting
> any big GC pauses due to block cache churn on such a large heap. Do you see
> that?
>
> - Bharath
>
>
> On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <[email protected]> wrote:
>
> > Lars,
> >
> > After changing the BLOCKSIZE to 16KB, the latency has reduced a little.
> > Now the average is around 75ms.
> > Overall throughput (I am using 40 clients to fetch records) is around 1K
> > OPS.
> >
> > After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94
> > and 97 on my 8 RSs, respectively.
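A quick summary of those locality figures, useful when following Lars's step 4 ("make sure your data is local"). This is a trivial sketch; "RS #n" is just the position in the reported list, since the thread does not name the servers:

```python
# Sketch: summarize the per-RegionServer hdfsBlocksLocalityIndex values above.
from statistics import mean

locality = [91, 88, 78, 90, 99, 82, 94, 97]  # one value per RS, in reported order

avg = mean(locality)
lowest = min(locality)
# 1-based position in the list only; real server names are not in the thread.
lowest_rs = locality.index(lowest) + 1

print(f"mean locality: {avg:.1f}%")
print(f"lowest: {lowest}% on RS #{lowest_rs}")
```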
> >
> > Thanks,
> > Ramu
> >
> >
> > On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <[email protected]> wrote:
> >
> > > Thanks Lars.
> > >
> > > I have changed the BLOCKSIZE to 16KB and triggered a major compaction.
> > > I will report my results once it is done.
> > >
> > > - Ramu
> > >
> > >
> > > On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <[email protected]>
> > > wrote:
> > >
> > >> First off: 128GB heap per RegionServer. Wow. I'd be interested to hear
> > >> about your experience with such a large heap for your RS. It's
> > >> definitely big enough.
> > >>
> > >>
> > >> It's interesting that 100GB does fit into the aggregate cache (of
> > >> 8x32GB), while 1.8TB does not.
> > >> Looks like ~70% of the read requests would need to bring in a 64KB block
> > >> in order to read 724 bytes.
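The cache-fit comparison can be checked with the figures from this thread (8 RegionServers, 128 GB heap, block cache at 0.25 of heap). This is a minimal sketch using decimal units for simplicity; it only shows what share of the data can be resident at once. The ~70% estimate above also depends on the access distribution (YCSB's Zipfian keys) and is not derived here:

```python
# Sketch: compare data set size with the aggregate block cache.
# Inputs from this thread: 8 RegionServers, 128 GB heap, block cache = 0.25 of
# heap. Decimal units are an assumption made for simplicity.
GB = 10**9
TB = 10**12


def aggregate_cache_bytes(servers: int, heap_gb: int, cache_fraction: float) -> float:
    """Total block cache across all RegionServers."""
    return servers * heap_gb * GB * cache_fraction


def cacheable_fraction(data_bytes: float, cache_bytes: float) -> float:
    """Share of the data set that could be resident in cache at once (capped at 1)."""
    return min(1.0, cache_bytes / data_bytes)


cache = aggregate_cache_bytes(8, 128, 0.25)  # 256 GB in total
for label, size in [("100 GB", 100 * GB), ("1.8 TB", 1.8 * TB)]:
    print(f"{label}: {cacheable_fraction(size, cache):.0%} of the data fits in cache")
```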
> > >>
> > >> Should that take 100ms? No. Something's still amiss.
> > >>
> > >> Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16KB to
> > >> read the small row). You would need to issue a major compaction for that
> > >> to take effect.
> > >> Maybe try 16KB blocks. If that speeds up your random gets, we know where
> > >> to look next... at the disk IO.
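To put numbers on the block-size suggestion, here is a small sketch of read amplification for an uncached get. It assumes, as the reasoning above does, that a get reads exactly one block holding the 724-byte row, and it ignores compression (the table uses COMPRESSION => 'NONE') and rows that straddle block boundaries:

```python
# Sketch: bytes pulled from disk per useful byte when a get misses the cache.
# Assumptions: one 724-byte row per get (raw size from this thread), the row
# sits inside a single block, no compression.
ROW_BYTES = 724


def read_amplification(block_size: int, row_bytes: int = ROW_BYTES) -> float:
    """Bytes read per useful byte for one uncached get."""
    return block_size / row_bytes


for kb in (4, 8, 16, 64):
    block = kb * 1024
    print(f"{kb:>2} KB block: {block} bytes read for {ROW_BYTES} -> "
          f"{read_amplification(block):.1f}x")
```

Going from 64KB to 16KB cuts the bytes transferred per uncached get by 4x; it does not remove the seek, which is why Lars points at disk IO as the next suspect.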
> > >>
> > >>
> > >> -- Lars
> > >>
> > >>
> > >>
> > >> ________________________________
> > >>  From: Ramu M S <[email protected]>
> > >> To: [email protected]; lars hofhansl <[email protected]>
> > >> Sent: Sunday, October 6, 2013 11:05 PM
> > >> Subject: Re: HBase Random Read latency > 100ms
> > >>
> > >>
> > >> Lars,
> > >>
> > >> In one of your old posts, you mentioned that lowering the BLOCKSIZE is
> > >> good for random reads (at the cost of larger block indexes, of course).
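The index-size trade-off mentioned here scales inversely with block size. A minimal sketch, assuming ~1.8 TB of uncompressed data (from this thread), decimal units, and one index entry per data block:

```python
# Sketch: how the number of block-index entries grows as BLOCKSIZE shrinks.
# Assumptions: ~1.8 TB of uncompressed data (from this thread), decimal
# units, and one index entry per data block.
DATA_BYTES = 18 * 10**11  # 1.8 TB


def index_entries(data_bytes: int, block_size: int) -> int:
    """Approximate number of data blocks, hence index entries."""
    return data_bytes // block_size


small = index_entries(DATA_BYTES, 16384)
large = index_entries(DATA_BYTES, 65536)
for block_size, n in ((65536, large), (16384, small)):
    print(f"BLOCKSIZE {block_size}: ~{n:,} index entries")
print(f"ratio: {small / large:.1f}x more entries with 16KB blocks")
```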
> > >>
> > >> The post is at
> > >> http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> > >>
> > >> Will that help in my tests? Should I give it a try? If I alter my table,
> > >> should I trigger a major compaction again for this to take effect?
> > >>
> > >> Thanks,
> > >> Ramu
> > >>
> > >>
> > >>
> > >> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <[email protected]>
> > >> wrote:
> > >>
> > >> > Sorry, the BLOCKSIZE was wrong in my earlier post; it is the default
> > >> > 64 KB.
> > >> >
> > >> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> > >> > 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
> > >> > COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> > >> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > >> > ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
> > >> >
> > >> > Thanks,
> > >> > Ramu
> > >> >
> > >> >
> > >> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> Lars,
> > >> >>
> > >> >> - Yes, short-circuit reading is enabled on both HDFS and HBase.
> > >> >> - I issued a major compaction after the table was loaded.
> > >> >> - Region Servers have max heap set to 128 GB. Block cache size is 0.25
> > >> >> of the heap (so 32 GB for each Region Server). Do we need even more?
> > >> >> - Decreasing the HFile size (default is 1GB)? Should I leave it at the
> > >> >> default?
> > >> >> - Keys are Zipfian distributed (generated by YCSB).
> > >> >>
> > >> >> Bharath,
> > >> >>
> > >> >> Bloom filters are enabled. Here are my table details:
> > >> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> > >> >> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1',
> > >> >> COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> > >> >> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384', IN_MEMORY => 'false',
> > >> >> ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
> > >> >>
> > >> >> When the data size is around 100GB (100 million records), the latency
> > >> >> is very good. I am getting a throughput of around 300K OPS.
> > >> >> In both cases (100 GB and 1.8 TB), Ganglia stats show that disk reads
> > >> >> are around 50-60 MB/s throughout the read cycle.
> > >> >>
> > >> >> Thanks,
> > >> >> Ramu
> > >> >>
> > >> >>
> > >> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >>> Have you enabled short circuit reading? See here:
> > >> >>> http://hbase.apache.org/book/perf.hdfs.html
> > >> >>>
> > >> >>> How's your data locality (shown on the RegionServer UI page)?
> > >> >>>
> > >> >>>
> > >> >>> How much memory are you giving your RegionServers?
> > >> >>> If your reads are truly random and the data set does not fit into the
> > >> >>> aggregate cache, you'll be dominated by the disk and network.
> > >> >>> Each read would need to bring in a 64KB (default) HFile block. If short
> > >> >>> circuit reading is not enabled, you'll get two or three context
> > >> >>> switches.
> > >> >>>
> > >> >>> So I would try:
> > >> >>> 1. Enable short circuit reading
> > >> >>> 2. Increase the block cache size per RegionServer
> > >> >>> 3. Decrease the HFile block size
> > >> >>> 4. Make sure your data is local (if it is not, issue a major
> > >> >>> compaction).
> > >> >>>
> > >> >>>
> > >> >>> -- Lars
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> ________________________________
> > >> >>>  From: Ramu M S <[email protected]>
> > >> >>> To: [email protected]
> > >> >>> Sent: Sunday, October 6, 2013 10:01 PM
> > >> >>> Subject: HBase Random Read latency > 100ms
> > >> >>>
> > >> >>>
> > >> >>> Hi All,
> > >> >>>
> > >> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
> > >> >>>
> > >> >>> Each Region Server has the following configuration:
> > >> >>> 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk
> > >> >>> (unfortunately configured with RAID 1; I can't change this as the
> > >> >>> machines are leased temporarily for a month).
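With a single 7200 RPM SATA spindle (mirrored) per RegionServer, a back-of-envelope estimate of random-read capacity helps frame Lars's point that uncached reads are dominated by the disk. This sketch rests on stated assumptions: an 8.5 ms average seek (a typical datasheet figure, not measured here), half a rotation of latency per read, transfer time ignored, and RAID 1 serving reads from both mirrors:

```python
# Back-of-envelope random-read capacity of a 7200 RPM SATA disk.
# Assumptions: 8.5 ms average seek (typical datasheet value, not measured),
# half a revolution of rotational latency on average, transfer time ignored,
# and RAID 1 able to serve reads from both mirrors.
RPM = 7200
AVG_SEEK_S = 0.0085  # assumed


def random_read_iops(rpm: int, avg_seek_s: float) -> float:
    """Random reads per second for one disk: 1 / (seek + half a rotation)."""
    avg_rotational_s = 0.5 * 60.0 / rpm
    return 1.0 / (avg_seek_s + avg_rotational_s)


per_disk = random_read_iops(RPM, AVG_SEEK_S)
per_rs = 2 * per_disk   # assumed: both RAID 1 mirrors serve reads
cluster = 8 * per_rs    # 8 Region Servers

print(f"per disk: ~{per_disk:.0f} IOPS, per RS: ~{per_rs:.0f}, cluster: ~{cluster:.0f}")
```

If most gets miss the cache and each costs at least one seek, the cluster-wide ceiling under these assumptions is on the order of 1K random gets per second, the same range as the ~1K OPS reported later in this thread.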
> > >> >>>
> > >> >>> I am running YCSB benchmark tests on HBase and am currently inserting
> > >> >>> around 1.8 billion records
> > >> >>> (1 key + 7 fields of 100 bytes = 724 bytes per record).
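On disk, each field is stored as its own KeyValue that repeats the row key, family and qualifier, so a record takes more than 724 bytes. This is a minimal sketch of the HBase 0.94 KeyValue layout under stated assumptions: a 24-byte row key (724 minus 7 x 100), family `cf`, YCSB-style qualifiers such as `field0` (6 bytes, assumed), no compression, and HFile block, index and bloom overhead ignored:

```python
# Sketch: estimated stored size of one YCSB record as HBase KeyValues.
# KeyValue layout: key length (4) + value length (4) + key + value, where the
# key is row length (2) + row + family length (1) + family + qualifier
# + timestamp (8) + type (1).
# Assumptions: 24-byte row key, family 'cf', 6-byte qualifiers like 'field0',
# no compression; HFile block, index and bloom overhead are ignored.
def keyvalue_bytes(row: int, family: int, qualifier: int, value: int) -> int:
    """Serialized size of one KeyValue (cell) in bytes."""
    key_len = 2 + row + 1 + family + qualifier + 8 + 1
    return 4 + 4 + key_len + value


ROW, FAMILY, QUALIFIER, VALUE, FIELDS = 24, 2, 6, 100, 7
RECORDS = 1_800_000_000

per_record = FIELDS * keyvalue_bytes(ROW, FAMILY, QUALIFIER, VALUE)
raw_total = RECORDS * 724
stored_total = RECORDS * per_record

print(f"per record: {per_record} bytes stored vs 724 raw")
print(f"1.8B records: ~{raw_total / 1e12:.2f} TB raw, ~{stored_total / 1e12:.2f} TB as KeyValues")
```

The estimate lands in the same range as the ~1.8 TB total reported in this message, which suggests that per-cell key repetition plausibly accounts for most of the gap over the raw ~1.3 TB.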
> > >> >>>
> > >> >>> Currently I am getting a write throughput of around 100K OPS, but
> > >> >>> random reads are very, very slow: all gets have 100ms or more latency.
> > >> >>>
> > >> >>> I have changed the following default configuration:
> > >> >>> 1. HFile Size: 16GB
> > >> >>> 2. HDFS Block Size: 512 MB
> > >> >>>
> > >> >>> Total data size is around 1.8 TB (excluding the replicas).
> > >> >>> My table is split into 128 regions (no pre-splitting used; it started
> > >> >>> with 1 region and grew to 128 over the insertion time).
> > >> >>>
> > >> >>> Taking some input from earlier discussions, I have made the following
> > >> >>> changes to disable Nagle's algorithm (in both client and server
> > >> >>> hbase-site.xml and hdfs-site.xml):
> > >> >>>
> > >> >>> <property>
> > >> >>>   <name>hbase.ipc.client.tcpnodelay</name>
> > >> >>>   <value>true</value>
> > >> >>> </property>
> > >> >>>
> > >> >>> <property>
> > >> >>>   <name>ipc.server.tcpnodelay</name>
> > >> >>>   <value>true</value>
> > >> >>> </property>
> > >> >>>
> > >> >>> Ganglia stats show large CPU IO wait (>30% during reads).
> > >> >>>
> > >> >>> I agree that the disk configuration is not ideal for a Hadoop cluster,
> > >> >>> but as I said earlier, it can't be changed for now.
> > >> >>> I feel the latency is way beyond any reported results so far.
> > >> >>>
> > >> >>> Any pointers on what can be wrong?
> > >> >>>
> > >> >>> Thanks,
> > >> >>> Ramu
> > >> >>>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> > >
> >
>
>
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>
>
