On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran <rahu...@yahoo.com> wrote:

> Hi,
>
> We are relatively new to HBase, and we are hitting a roadblock on our scan
> performance. I searched through the email archives and applied a bunch of
> the recommendations there, but they did not help much. So, I am hoping I
> am missing something which you could guide me towards. Thanks in advance.
>
> We are currently writing and reading data in an almost continuous mode
> (a stream of data is written into an HBase table, and we then run a
> time-based MR job on top of this table). We were recently backed up:
> about 1.5 TB of data had been loaded into the table when we began running
> time-based scan MRs over 10-minute intervals (startTime to endTime is 10
> minutes). Most of the 10-minute intervals had about 100 GB of data to
> process.
>
> Our workflow is primarily to eliminate duplicates from this table. We
> have maxVersions = 5 for the table. We use TableInputFormat to perform the
> time-based scan to ensure data locality. In the mapper, we check whether
> a previous version of the row exists in a time period earlier than the
> timestamp of the input row. If not, we emit that row.
>
> We looked at https://issues.apache.org/jira/browse/HBASE-4683 and hence
> turned off block cache for this table with the expectation that the block
> index and bloom filter will be cached in the block cache. We expect
> duplicates to be rare and hence hope for most of these checks to be
> fulfilled by the bloom filter. Unfortunately, we notice very slow
> performance on account of being disk bound. Looking at jstack, we notice
> that most of the time, we appear to be hitting disk for the block index. We
> performed a major compaction and retried and performance improved some, but
> not by much. We are processing data at about 2 MB per second.
>
> We are using CDH 4.2.1 (HBase 0.94.2 and HDFS 2.0.0) running with 8
> datanodes/regionservers (each with 32 cores, 4x1TB disks and 60 GB RAM).
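
To put the throughput quoted above in perspective, here is a rough sketch of how long one 10-minute interval takes to process at the reported rate. It assumes the 2 MB per second figure is the aggregate rate for the whole job, which the post does not state; the names are illustrative only.

```python
# Back-of-the-envelope estimate. ASSUMPTION: 2 MB/s is the aggregate rate
# for the whole job; the original post does not say whether it is per job
# or per mapper.
INTERVAL_DATA_GB = 100     # "about 100 GB" per 10-minute interval (from the post)
THROUGHPUT_MB_PER_S = 2    # "about 2 MB per second" (from the post)
INTERVAL_MINUTES = 10


def hours_to_process(data_gb, mb_per_s):
    """Wall-clock hours needed to read data_gb at mb_per_s (1 GB = 1024 MB)."""
    return data_gb * 1024 / mb_per_s / 3600


hours = hours_to_process(INTERVAL_DATA_GB, THROUGHPUT_MB_PER_S)
slowdown = hours * 60 / INTERVAL_MINUTES
print(f"{hours:.1f} h per interval, {slowdown:.0f}x slower than real time")
```

Under that assumption each interval takes roughly 14 hours, so the backlog can only grow until the scan rate improves.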

Anil: You don't have the right balance between disk, CPU and RAM. You have
plenty of CPU and RAM but very few disks. Usually, it's better to have a
disk-to-CPU-core ratio near 0.6-0.8. Yours is around 0.13 (4 disks for 32
cores). This seems to be the biggest cause of your problem.
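
The ratio above can be checked with a few lines of arithmetic. This is only a sketch using the hardware figures from the original post; the 0.6-0.8 range is the rule of thumb stated in this reply, not a hard limit.

```python
# Figures from the original post: 4 disks and 32 cores per node.
# The 0.6-0.8 target range is the rule of thumb from the reply above.
DISKS_PER_NODE = 4
CORES_PER_NODE = 32
TARGET_RATIO_LOW, TARGET_RATIO_HIGH = 0.6, 0.8

ratio = DISKS_PER_NODE / CORES_PER_NODE
disks_low = TARGET_RATIO_LOW * CORES_PER_NODE
disks_high = TARGET_RATIO_HIGH * CORES_PER_NODE

print(f"current disk/core ratio: {ratio:.3f}")
print(f"disks per node for the target range: {disks_low:.1f} to {disks_high:.1f}")
```

By that rule of thumb, a 32-core node would want roughly 19 to 26 disks rather than 4.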

> HBase is running with a 30 GB heap, memstore values capped at 3
> GB and flush thresholds of 0.15 and 0.2. Blockcache is at 0.5 of total
> heap size (15 GB). We are using SNAPPY for our tables.
>
>
> A couple of questions:
>         * Is the performance of the time-based scan bad after a major
> compaction?
>
Anil: In general, time-based scans (I am assuming you have built your rowkey
on the timestamp) are not a good fit for HBase because of region
hot-spotting. Have you tried setting scanner caching to a higher number?
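
Scanner caching controls how many rows the client fetches per round trip (in the Java client it is set with `Scan.setCaching(int)` or the `hbase.client.scanner.caching` property). The sketch below is a deliberately simplified model of why a higher value helps; the function name and the row count are hypothetical and not taken from this thread.

```python
import math


def scanner_rpcs(rows, caching):
    """Approximate number of next() round trips needed to scan `rows` rows.

    Simplified model: each RPC returns up to `caching` rows. Scanner
    open/close calls and region boundaries are ignored.
    """
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(rows / caching)


ROWS = 1_000_000  # hypothetical row count, for illustration only
for caching in (1, 100, 1000):
    print(f"caching={caching:>4}: about {scanner_rpcs(ROWS, caching)} RPCs")
```

Larger values reduce round trips but hold more rows in memory per call, and a mapper that spends a long time on each batch can run into scanner timeouts, so the value is worth tuning rather than maximizing.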

>
>         * What can we do to help alleviate being disk bound? The typical
> answer of adding more RAM does not seem to have helped, or we are missing
> some other config
>
Anil: Try adding more disks to your machines.

>
>
>
> Below are some of the metrics from a Regionserver webUI:
>
> requestsPerSecond=5895, numberOfOnlineRegions=60, numberOfStores=60,
> numberOfStorefiles=209, storefileIndexSizeMB=6, rootIndexSizeKB=7131,
> totalStaticIndexSizeKB=415995, totalStaticBloomSizeKB=2514675,
> memstoreSizeMB=0, mbInMemoryWithoutWAL=0, numberOfPutsWithoutWAL=0,
> readRequestsCount=30589690, writeRequestsCount=0, compactionQueueSize=0,
> flushQueueSize=0, usedHeapMB=2688, maxHeapMB=30672,
> blockCacheSizeMB=1604.86, blockCacheFreeMB=13731.24, blockCacheCount=11817,
> blockCacheHitCount=27592222, blockCacheMissCount=25373411,
> blockCacheEvictedCount=7112, blockCacheHitRatio=52%,
> blockCacheHitCachingRatio=72%, hdfsBlocksLocalityIndex=91,
> slowHLogAppendCount=0, fsReadLatencyHistogramMean=15409428.56,
> fsReadLatencyHistogramCount=1559927, fsReadLatencyHistogramMedian=230609.5,
> fsReadLatencyHistogram75th=280094.75, fsReadLatencyHistogram95th=9574280.4,
> fsReadLatencyHistogram99th=100981301.2,
> fsReadLatencyHistogram999th=511591146.03,
>  fsPreadLatencyHistogramMean=3895616.6,
> fsPreadLatencyHistogramCount=420000, fsPreadLatencyHistogramMedian=954552,
> fsPreadLatencyHistogram75th=8723662.5,
> fsPreadLatencyHistogram95th=11159637.65,
> fsPreadLatencyHistogram99th=37763281.57,
> fsPreadLatencyHistogram999th=273192813.91,
> fsWriteLatencyHistogramMean=6124343.91,
> fsWriteLatencyHistogramCount=1140000, fsWriteLatencyHistogramMedian=374379,
> fsWriteLatencyHistogram75th=431395.75,
> fsWriteLatencyHistogram95th=576853.8,
> fsWriteLatencyHistogram99th=1034159.75,
> fsWriteLatencyHistogram999th=5687910.29
>
>
>
> key size: 20 bytes
>
> Table description:
> {NAME => 'foo', FAMILIES => [{NAME => 'f', DATA_BLOCK_ENCODING => 'NONE',
> BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY',
> VERSIONS => '5', TTL => '2592000', MIN_VERSIONS => '0',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
> (ENABLED: true)
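
As a sanity check on the regionserver metrics quoted above, the following sketch recomputes the block cache hit ratio and compares the combined index and bloom filter size with what the block cache currently holds. The values are copied from the metrics dump; the interpretation is a reading of those numbers, not a confirmed diagnosis.

```python
# Values copied from the regionserver metrics quoted above.
block_cache_hits = 27_592_222
block_cache_misses = 25_373_411
static_index_kb = 415_995        # totalStaticIndexSizeKB
static_bloom_kb = 2_514_675      # totalStaticBloomSizeKB
block_cache_used_mb = 1604.86    # blockCacheSizeMB
block_cache_free_mb = 13731.24   # blockCacheFreeMB

hit_ratio = block_cache_hits / (block_cache_hits + block_cache_misses)
meta_mb = (static_index_kb + static_bloom_kb) / 1024  # index + bloom, in MB
cache_capacity_mb = block_cache_used_mb + block_cache_free_mb

print(f"hit ratio: {hit_ratio:.0%}")
print(f"index + bloom size: {meta_mb:.0f} MB")
print(f"cached now: {block_cache_used_mb:.0f} MB of {cache_capacity_mb:.0f} MB")
```

The index and bloom blocks total more than the cache currently holds, yet far less than its capacity. That is at least consistent with the jstack observation that reads are going to disk for the block index even though the cache has room to spare.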




-- 
Thanks & Regards,
Anil Gupta
