Here is the related code for disabling the bucket cache, from BucketCache#checkIOErrorIsTolerated:

    if (this.ioErrorStartTime > 0) {
      if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
        LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
            "ms, disabling cache, please check your IOEngine");
        disableCache();
      }
    }
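For reference, ioErrorsTolerationDuration is read from hbase.bucketcache.ioengine.errors.tolerated.duration (60000 ms by default, if I remember right), so a single transient error does not disable anything; errors have to persist for the whole window. A minimal standalone sketch of how such a window behaves (class and method names are mine, and the reset-on-success is my reading of the intent, not the literal HBase source):

    // Hypothetical illustration of an IO-error toleration window; only the
    // shape mirrors BucketCache, none of this is the actual HBase code.
    public class IoErrorWindow {
      private long ioErrorStartTime = -1;  // -1 means no error window is open
      private final long tolerationMs;     // hbase.bucketcache.ioengine.errors.tolerated.duration

      public IoErrorWindow(long tolerationMs) {
        this.tolerationMs = tolerationMs;
      }

      // Call on every IOEngine failure; true means "disable the cache now".
      public boolean onIoError(long nowMs) {
        if (ioErrorStartTime < 0) {
          ioErrorStartTime = nowMs;        // first failure opens the window
          return false;
        }
        return nowMs - ioErrorStartTime > tolerationMs;  // failures outlasted the window
      }

      // Assumption: a successful IO closes the window again.
      public void onIoSuccess() {
        ioErrorStartTime = -1;
      }
    }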
Can you search the region server log to see if the above occurred? Was this server the only one with a disabled cache?
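Also worth noting: the first stack trace starts with java.nio.channels.ClosedByInterruptException. That is documented java.nio behavior rather than a disk error as such: if a thread is interrupted while it is in (or enters) a blocking operation on an interruptible channel, the channel itself gets closed. FileIOEngine shares one FileChannel for the cache file, so once an interrupted hfile-prefetch thread closes it, every subsequent read, write, or sync on that file fails with ClosedChannelException, which matches your later traces. A standalone demo of just the JDK behavior (my own code, nothing from HBase):

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Shows that an interrupt delivered to one thread doing channel IO
    // closes the channel for every other thread sharing it.
    public class InterruptClosesChannel {
      public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("nio-demo", ".bin");
        Files.write(file, new byte[4096]);
        FileChannel ch = FileChannel.open(file, StandardOpenOption.READ);

        Thread reader = new Thread(() -> {
          try {
            // Setting the interrupt flag before the read just makes the demo
            // deterministic; an interrupt landing mid-read behaves the same.
            Thread.currentThread().interrupt();
            ch.read(ByteBuffer.allocate(16), 0);
          } catch (Exception e) {
            System.out.println("reader got:     " + e);  // ClosedByInterruptException
          }
        });
        reader.start();
        reader.join();

        try {
          ch.read(ByteBuffer.allocate(16), 0);           // same channel, different thread
        } catch (Exception e) {
          System.out.println("other user got: " + e);    // ClosedChannelException
        }
      }
    }

So the later write and sync failures look like fallout from that first interrupt rather than independent EBS problems, though checking the volumes with AWS as you are doing can't hurt.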
Cheers

On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <saad.mu...@oath.com.invalid> wrote:

> Hi,
>
> I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> configured to use two attached EBS disks of 50 GB each, and I provisioned
> the bucket cache to be a bit less than the total, at 98 GB per instance,
> to be on the safe side. My tables have column families set to prefetch on
> open.
>
> On some instances during cluster startup, the bucket cache starts
> throwing errors, and eventually the bucket cache gets completely disabled
> on this instance. The instance still stays up as a valid region server,
> and the only clue in the region server UI is that the bucket cache tab
> reports a count of 0 and a size of 0 bytes.
>
> I have already opened a ticket with AWS to see if there are problems with
> the EBS volumes, but wanted to tap the open source community's hive-mind
> to see what kind of problem would cause the bucket cache to get disabled.
> If the application depends on the bucket cache for performance, wouldn't
> it be better to just remove that region server from the pool if its
> bucket cache cannot be recovered/enabled?
>
> The errors look like the following. Would appreciate any insight, thanks:
>
> 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057]
> bucket.BucketCache: Failed reading block
> 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
>         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> and
>
> 2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7]
> bucket.BucketCache: Failed writing to bucket cache
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
>         at java.lang.Thread.run(Thread.java:748)
>
> and later
>
> 2018-02-25 01:13:47,783 INFO [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4]
> bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4
> exiting, cacheEnabled=false
> 2018-02-25 01:13:47,864 WARN [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
> 2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> bucket.BucketCache: Failed syncing IO engine
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
>         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
>         at java.lang.Thread.run(Thread.java:748)
>
> ----
> Saad