Hi,
I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
configured to use two attached EBS disks of 50 GB each. To be on the safe
side, I sized the bucket cache slightly below the combined capacity, at
98 GB per instance. My tables have column families set to prefetch on
open.
On some instances, the bucket cache starts throwing errors during cluster
startup and is eventually disabled completely on that instance. The
instance stays up as a valid region server, and the only clue in the region
server UI is that the bucket cache tab reports a count of 0 and a size of
0 bytes.
I have already opened a ticket with AWS to see if there are problems with
the EBS volumes, but wanted to tap the open source community's hive-mind to
see what kind of problem would cause the bucket cache to get disabled. If
the application depends on the bucket cache for performance, wouldn't it be
better to just remove that region server from the pool if its bucket cache
cannot be recovered/enabled?
The errors look like the following. I would appreciate any insight, thanks:
2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057] bucket.BucketCache: Failed reading block 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
and
2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7] bucket.BucketCache: Failed writing to bucket cache
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
at java.lang.Thread.run(Thread.java:748)
and later
2018-02-25 01:13:47,783 INFO [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4] bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4 exiting, cacheEnabled=false
2018-02-25 01:13:47,864 WARN [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.BucketCache: Failed syncing IO engine
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
at java.lang.Thread.run(Thread.java:748)
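
One JDK behavior that would be consistent with the order of these traces (a ClosedByInterruptException on a prefetch thread, followed by ClosedChannelException on the writer threads) is that interrupting a thread during FileChannel I/O closes the channel for every thread sharing it. The sketch below only demonstrates that JDK behavior in isolation; whether this is what actually happens inside the bucket cache is an assumption, not a confirmed diagnosis. The class name, method name, and temp file are hypothetical and have nothing to do with HBase code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: not part of HBase.
class InterruptClosesChannel {

    // Returns the outcome of an interrupted read, then of a later write
    // on the same channel.
    static String[] demo() throws IOException {
        Path tmp = Files.createTempFile("bucketcache-demo", ".bin");
        String readOutcome;
        String writeOutcome;
        try (FileChannel ch = FileChannel.open(
                tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(new byte[1024]), 0);

            // Simulate a reader thread whose interrupt flag is set
            // when it enters the channel read.
            Thread.currentThread().interrupt();
            try {
                ch.read(ByteBuffer.allocate(16), 0);
                readOutcome = "ok";
            } catch (ClosedByInterruptException e) {
                readOutcome = "ClosedByInterruptException";
            } finally {
                Thread.interrupted(); // clear the flag for the next step
            }

            // Any later use of the same channel now fails, even though
            // the caller itself was never interrupted at this point.
            try {
                ch.write(ByteBuffer.wrap(new byte[16]), 0);
                writeOutcome = "ok";
            } catch (ClosedChannelException e) {
                writeOutcome = "ClosedChannelException";
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
        return new String[] {readOutcome, writeOutcome};
    }

    public static void main(String[] args) throws IOException {
        String[] r = demo();
        System.out.println("read:  " + r[0]);
        System.out.println("write: " + r[1]);
    }
}
```

If the region server behaves the same way, a single interrupted read would be enough to make every subsequent read, write, and sync against that cache file fail until the file is reopened.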
----
Saad