I think it is for HBase itself, but I'll have to wait for more details, as they
haven't shared the source code with us. I imagine they want to do a bunch more
testing and other process work first.
----
Saad

On Wed, Feb 28, 2018 at 9:45 PM Ted Yu <[email protected]> wrote:

> Did the vendor say whether the patch is for hbase or some other component?
>
> Thanks
>
> On Wed, Feb 28, 2018 at 6:33 PM, Saad Mufti <[email protected]> wrote:
>
> > Thanks for the feedback. You guys are right: the bucket cache is getting
> > disabled due to too many I/O errors from the underlying files that make
> > up the bucket cache. We still do not know the exact underlying cause,
> > but we are working with our vendor to test a patch they provided that
> > seems to have resolved the issue for now. They say that if it works out
> > well, they will eventually try to promote the patch to the open source
> > versions.
> >
> > Cheers.
> >
> > ----
> > Saad
> >
> > On Sun, Feb 25, 2018 at 11:10 AM, Ted Yu <[email protected]> wrote:
> >
> > > Here is the related code for disabling the bucket cache:
> > >
> > > if (this.ioErrorStartTime > 0) {
> > >   if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
> > >     LOG.error("IO errors duration time has exceeded " +
> > >         ioErrorsTolerationDuration +
> > >         "ms, disabling cache, please check your IOEngine");
> > >     disableCache();
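> > >
> > > The toleration window above is configurable. Here is a minimal sketch
> > > of reading it, assuming the 1.3 code path; the key name and default
> > > below are my reading of BucketCache, so please verify against your
> > > version:
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > >
> > > public class BucketCacheToleration {
> > >   public static void main(String[] args) {
> > >     Configuration conf = HBaseConfiguration.create();
> > >     // How long BucketCache tolerates continuous IO errors before it
> > >     // calls disableCache(); my reading of 1.3 says the default is
> > >     // one minute.
> > >     int tolerationMs = conf.getInt(
> > >         "hbase.bucketcache.ioengine.errors.tolerated.duration", 60000);
> > >     System.out.println("IO error toleration = " + tolerationMs + " ms");
> > >   }
> > > }
> > >
> > > Raising it only buys time; if the IOEngine keeps failing, the cache is
> > > disabled all the same.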
> > >
> > > Can you search the region server log to see if the above occurred?
> > >
> > > Was this server the only one with a disabled cache?
> > >
> > > Cheers
> > >
> > > On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > > > configured to use two attached EBS disks of 50 GB each, and I
> > > > provisioned the bucket cache to be a bit less than the total, at
> > > > 98 GB per instance, to be on the safe side. My tables have column
> > > > families set to prefetch on open.
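> > > >
> > > > For reference, the relevant pieces of our setup look roughly like
> > > > the sketch below. This is illustrative only: the property names are
> > > > from the HBase book, the values are from memory, and the exact
> > > > multi-file IOEngine syntax depends on the HBase version.
> > > >
> > > > import org.apache.hadoop.hbase.HColumnDescriptor;
> > > >
> > > > public class PrefetchSetup {
> > > >   public static void main(String[] args) {
> > > >     // hbase-site.xml (per region server):
> > > >     //   hbase.bucketcache.ioengine = file:/mnt1/hbase/bucketcache
> > > >     //     (plus a second cache file on the other EBS volume)
> > > >     //   hbase.bucketcache.size     = 100352   // in MB, i.e. ~98 GB
> > > >     // Each column family is set to prefetch on open, which is what
> > > >     // starts the hfile-prefetch-* threads seen in the traces below:
> > > >     HColumnDescriptor cf = new HColumnDescriptor("d"); // "d" is a stand-in name
> > > >     cf.setPrefetchBlocksOnOpen(true);
> > > >     System.out.println(cf);
> > > >   }
> > > > }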
> > > >
> > > > On some instances during cluster startup, the bucket cache starts
> > > > throwing errors, and eventually it gets completely disabled on that
> > > > instance. The instance stays up as a valid region server, and the
> > > > only clue in the region server UI is that the bucket cache tab
> > > > reports a count of 0 and a size of 0 bytes.
> > > >
> > > > I have already opened a ticket with AWS to see if there are problems
> > > > with the EBS volumes, but I wanted to tap the open source
> > > > community's hive-mind to see what kind of problem would cause the
> > > > bucket cache to get disabled. If the application depends on the
> > > > bucket cache for performance, wouldn't it be better to just remove
> > > > that region server from the pool if its bucket cache cannot be
> > > > recovered/re-enabled?
> > > >
> > > > The errors look like the following. I would appreciate any insight,
> > > > thanks:
> > > >
> > > > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057]
> > > > bucket.BucketCache: Failed reading block
> > > > 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > > > java.nio.channels.ClosedByInterruptException
> > > >   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> > > >   at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
> > > >   at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
> > > >   at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
> > > >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
> > > >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
> > > >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
> > > >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > >   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > > >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > > >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > >   at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > and
> > > >
> > > > 2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7]
> > > > bucket.BucketCache: Failed writing to bucket cache
> > > > java.nio.channels.ClosedChannelException
> > > >   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > > >   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > > >   at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > and later:
> > > >
> > > > 2018-02-25 01:13:47,783 INFO [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4]
> > > > bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4
> > > > exiting, cacheEnabled=false
> > > > 2018-02-25 01:13:47,864 WARN [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > > bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
> > > > 2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > > bucket.BucketCache: Failed syncing IO engine
> > > > java.nio.channels.ClosedChannelException
> > > >   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > > >   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
> > > >   at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > > >   at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > ----
> > > > Saad
