Lars,

After changing the BLOCKSIZE to 16 KB, the latency has come down a little;
the average is now around 75 ms. Overall throughput (I am using 40 clients
to fetch records) is around 1K OPS.
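
For reference, the change amounted to something like this in the shell (a
sketch, not the exact session; on 0.94 I believe the table has to be
disabled first unless online schema updates are enabled):

  disable 'usertable'
  alter 'usertable', {NAME => 'cf', BLOCKSIZE => '16384'}
  enable 'usertable'
  major_compact 'usertable'
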
After compaction, hdfsBlocksLocalityIndex is 91, 88, 78, 90, 99, 82, 94, 97
across my 8 RS respectively.

Thanks,
Ramu

On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <[email protected]> wrote:

> Thanks Lars.
>
> I have changed the BLOCKSIZE to 16 KB and triggered a major compaction. I
> will report my results once it is done.
>
> - Ramu
>
> On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl <[email protected]> wrote:
>
>> First off: a 128 GB heap per RegionServer. Wow. I'd be interested to
>> hear your experience with such a large heap for your RS. It's definitely
>> big enough.
>>
>> It's interesting that 100 GB fits into the aggregate cache (of 8x32 GB)
>> while 1.8 TB does not.
>> Looks like ~70% of the read requests would need to bring in a 64 KB
>> block in order to read 724 bytes.
>>
>> Should that take 100 ms? No. Something's still amiss.
>>
>> Smaller blocks might help (you'd need to bring in 4, 8, or maybe 16 KB
>> to read the small row). You would need to issue a major compaction for
>> that to take effect.
>> Maybe try 16 KB blocks. If that speeds up your random gets, we know
>> where to look next... at the disk IO.
>>
>> -- Lars
>>
>> ________________________________
>> From: Ramu M S <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Sent: Sunday, October 6, 2013 11:05 PM
>> Subject: Re: HBase Random Read latency > 100ms
>>
>> Lars,
>>
>> In one of your old posts, you mentioned that lowering the BLOCKSIZE is
>> good for random reads (of course with an increased size for block
>> indexes).
>>
>> The post is at http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
>>
>> Will that help in my tests? Should I give it a try? If I alter my table,
>> should I trigger a major compaction again for this to take effect?
>>
>> Thanks,
>> Ramu
>>
>> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S <[email protected]> wrote:
>>
>> > Sorry, the BLOCKSIZE was wrong in my earlier post; it is the default
>> > 64 KB.
>> >
>> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING
>> > => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS
>> > => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> > '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>> > IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
>> >
>> > Thanks,
>> > Ramu
>> >
>> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S <[email protected]> wrote:
>> >
>> >> Lars,
>> >>
>> >> - Yes, short-circuit reading is enabled on both HDFS and HBase.
>> >> - I issued a major compaction after the table was loaded.
>> >> - Region Servers have the max heap set to 128 GB. Block cache size is
>> >> 0.25 of the heap (so 32 GB for each Region Server). Do we need even
>> >> more?
>> >> - Decreasing the HFile size (the default is 1 GB)? Or should I leave
>> >> it at the default?
>> >> - Keys are Zipfian-distributed (by YCSB).
>> >>
>> >> Bharath,
>> >>
>> >> Bloom filters are enabled. Here are my table details:
>> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING
>> >> => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0',
>> >> VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> >> '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384',
>> >> IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
>> >> 'true'}]}
>> >>
>> >> When the data size is around 100 GB (100 million records), the
>> >> latency is very good. I am getting a throughput of around 300K OPS.
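>> >>
>> >> A rough back-of-envelope on the cache math (my own sketch, so treat
>> >> the numbers as approximate; the blocks were still the default 64 KB
>> >> at this point):
>> >>
>> >>   aggregate block cache = 8 RS x 32 GB = 256 GB
>> >>   100 GB of data  -> fits entirely in the aggregate cache
>> >>   1.8 TB of data  -> at most ~256 GB / 1843 GB = ~14% cached at once
>> >>   one 724-byte row -> one 64 KB block, i.e. 65536 / 724 = ~90x read
>> >>   amplification on every cache miss
>> >>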
>> >> In both cases (100 GB and 1.8 TB), Ganglia stats show that disk
>> >> reads are around 50-60 MB/s throughout the read cycle.
>> >>
>> >> Thanks,
>> >> Ramu
>> >>
>> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl <[email protected]>
>> >> wrote:
>> >>
>> >>> Have you enabled short circuit reading? See here:
>> >>> http://hbase.apache.org/book/perf.hdfs.html
>> >>>
>> >>> How's your data locality (shown on the RegionServer UI page)?
>> >>>
>> >>> How much memory are you giving your RegionServers?
>> >>> If your reads are truly random and the data set does not fit into
>> >>> the aggregate cache, you'll be dominated by the disk and network.
>> >>> Each read would need to bring in a 64 KB (default) HFile block. If
>> >>> short circuit reading is not enabled you'll get two or three context
>> >>> switches.
>> >>>
>> >>> So I would try:
>> >>> 1. Enable short circuit reading
>> >>> 2. Increase the block cache size per RegionServer
>> >>> 3. Decrease the HFile block size
>> >>> 4. Make sure your data is local (if it is not, issue a major
>> >>> compaction).
>> >>>
>> >>> -- Lars
>> >>>
>> >>> ________________________________
>> >>> From: Ramu M S <[email protected]>
>> >>> To: [email protected]
>> >>> Sent: Sunday, October 6, 2013 10:01 PM
>> >>> Subject: HBase Random Read latency > 100ms
>> >>>
>> >>> Hi All,
>> >>>
>> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
>> >>>
>> >>> Each Region Server has the following configuration:
>> >>> 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk.
>> >>> (Unfortunately configured with RAID 1; I can't change this as the
>> >>> machines are leased temporarily for a month.)
>> >>>
>> >>> I am running YCSB benchmark tests on HBase and am currently
>> >>> inserting around 1.8 billion records.
>> >>> (1 key + 7 fields of 100 bytes = 724 bytes per record)
>> >>>
>> >>> Currently I am getting a write throughput of around 100K OPS, but
>> >>> random reads are very slow; all Gets have a latency of 100 ms or
>> >>> more.
>> >>>
>> >>> I have changed the following default configuration:
>> >>> 1. HFile size: 16 GB
>> >>> 2. HDFS block size: 512 MB
>> >>>
>> >>> Total data size is around 1.8 TB (excluding the replicas).
>> >>> My table is split into 128 regions (no pre-splitting used; it
>> >>> started with 1 and grew to 128 over the insertion period).
>> >>>
>> >>> Taking some inputs from earlier discussions, I have made the
>> >>> following changes to disable Nagle (in both the client and server
>> >>> hbase-site.xml, hdfs-site.xml):
>> >>>
>> >>> <property>
>> >>>   <name>hbase.ipc.client.tcpnodelay</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>>
>> >>> <property>
>> >>>   <name>ipc.server.tcpnodelay</name>
>> >>>   <value>true</value>
>> >>> </property>
>> >>>
>> >>> Ganglia stats show large CPU IO wait (>30% during reads).
>> >>>
>> >>> I agree that the disk configuration is not ideal for a Hadoop
>> >>> cluster, but as mentioned earlier it can't be changed for now.
>> >>> I feel the latency is way beyond any results reported so far.
>> >>>
>> >>> Any pointers on what could be wrong?
>> >>>
>> >>> Thanks,
>> >>> Ramu
>> >>
>> >
>>
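
P.S. If it helps anyone reproduce the setup: on CDH 4, short-circuit
reading is typically enabled with hdfs-site.xml keys along these lines (a
sketch, not my exact config; the socket path is just an example, and the
property set varies with the HDFS version, so check the HBase book link
Lars posted):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>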
