Adding to what Lars said, you can also enable bloom filters on your column families; for random gets they let the RegionServer skip store files that don't contain the key, which should help read latency.
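Something like this from the HBase shell should do it (just a sketch; I'm assuming the YCSB defaults of table 'usertable' and column family 'family', and that online schema updates are off, so the table has to be taken offline briefly; adjust the names and bloom type to your setup):

  hbase> disable 'usertable'
  hbase> alter 'usertable', {NAME => 'family', BLOOMFILTER => 'ROW'}
  hbase> enable 'usertable'
  hbase> major_compact 'usertable'

Bloom filters are written per HFile, so files written before the change only pick them up once they are rewritten; that is why the major compaction at the end. 'ROW' blooms help plain gets, while 'ROWCOL' only pays off if your gets ask for specific columns rather than whole rows.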
On Mon, Oct 7, 2013 at 10:51 AM, lars hofhansl <[email protected]> wrote:

> Have you enabled short circuit reading? See here:
> http://hbase.apache.org/book/perf.hdfs.html
>
> How's your data locality (shown on the RegionServer UI page)?
>
> How much memory are you giving your RegionServers?
> If your reads are truly random and the data set does not fit into the
> aggregate cache, you'll be dominated by the disk and network.
> Each read would need to bring in a 64k (default) HFile block. If short
> circuit reading is not enabled you'll get two or three context switches.
>
> So I would try:
> 1. Enable short circuit reading
> 2. Increase the block cache size per RegionServer
> 3. Decrease the HFile block size
> 4. Make sure your data is local (if it is not, issue a major compaction).
>
> -- Lars
>
> ________________________________
> From: Ramu M S <[email protected]>
> To: [email protected]
> Sent: Sunday, October 6, 2013 10:01 PM
> Subject: HBase Random Read latency > 100ms
>
> Hi All,
>
> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
>
> Each Region Server has the following configuration:
> 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk
> (unfortunately configured with RAID 1; this can't be changed as the
> machines are leased temporarily for a month).
>
> I am running YCSB benchmark tests on HBase and am currently inserting
> around 1.8 billion records
> (1 key + 7 fields of 100 bytes = 724 bytes per record).
>
> Currently I am getting a write throughput of around 100K OPS, but random
> reads are very slow; all gets have latencies of 100 ms or more.
>
> I have changed the following default configuration:
> 1. HFile size: 16 GB
> 2. HDFS block size: 512 MB
>
> Total data size is around 1.8 TB (excluding replicas).
> My table is split into 128 regions (no pre-splitting used; it started
> with 1 and grew to 128 over the insertion time).
>
> Taking some inputs from earlier discussions, I have made the following
> changes to disable Nagle (in both the client and server hbase-site.xml
> and hdfs-site.xml):
>
> <property>
>   <name>hbase.ipc.client.tcpnodelay</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>ipc.server.tcpnodelay</name>
>   <value>true</value>
> </property>
>
> Ganglia stats show large CPU IO wait (>30% during reads).
>
> I agree that the disk configuration is not ideal for a Hadoop cluster,
> but as mentioned earlier it can't be changed for now.
> I feel the latency is way beyond any results reported so far.
>
> Any pointers on what could be wrong?
>
> Thanks,
> Ramu

--
Bharath Vissapragada
<http://www.cloudera.com>
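P.S. In case it helps, here is roughly what points 1 and 2 from Lars's list look like as configuration (only a sketch; the property names are the CDH4 / Hadoop 2.0 ones and the socket path is the usual CDH example, so please verify against your version's docs).

hdfs-site.xml on the DataNodes (and visible to the RegionServers):

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>

hbase-site.xml on the RegionServers (block cache as a fraction of heap; the 0.94 default is 0.25, and 0.4 here is just an example):

<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>

For point 3, the HFile block size is a per-column-family attribute, so it would go into the same kind of alter statement as the bloom filter above (e.g. BLOCKSIZE => '16384').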
