Did the vendor say whether the patch is for HBase or some other component?

Thanks

On Wed, Feb 28, 2018 at 6:33 PM, Saad Mufti <saad.mu...@gmail.com> wrote:

> Thanks for the feedback. You guys are right: the bucket cache is getting
> disabled due to too many I/O errors from the underlying files that make up
> the bucket cache. We still do not know the exact underlying cause, but we
> are working with our vendor to test a patch they provided that seems to
> have resolved the issue for now. They say that if it works out well, they
> will eventually try to promote the patch to the open source versions.
>
> Cheers.
>
> ----
> Saad
>
>
> On Sun, Feb 25, 2018 at 11:10 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Here is related code for disabling bucket cache:
> >
> >     if (this.ioErrorStartTime > 0) {
> >       if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
> >         LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
> >           "ms, disabling cache, please check your IOEngine");
> >         disableCache();
> >
> > Can you search the region server log to see if the above occurred?
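> > For example, something like this (a sketch; adjust the log path for your
> > deployment):
> >
> >     grep "IO errors duration time has exceeded" /var/log/hbase/hbase-*-regionserver-*.log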
> >
> > Was this server the only one with a disabled cache?
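> >
> > By the way, the IO-error toleration window in the code above is
> > configurable. A minimal hbase-site.xml sketch (the key exists in
> > BucketCache; 60000 ms shown here is also the default):
> >
> >     <property>
> >       <name>hbase.bucketcache.ioengine.errors.tolerated.duration</name>
> >       <!-- how long (ms) the bucket cache tolerates IO errors before disabling itself -->
> >       <value>60000</value>
> >     </property>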
> >
> > Cheers
> >
> > On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <saad.mu...@oath.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > > configured to use two attached EBS disks of 50 GB each, and I provisioned
> > > the bucket cache at a bit less than the total, 98 GB per instance, to be
> > > on the safe side. My tables have column families set to prefetch on open.
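> > >
> > > For reference, the relevant settings look roughly like this in
> > > hbase-site.xml (a sketch only: the second path and the exact engine
> > > string are illustrative, and multi-file "files:" support depends on the
> > > HBase build; prefetch is the per-column-family PREFETCH_BLOCKS_ON_OPEN
> > > attribute):
> > >
> > >     <property>
> > >       <name>hbase.bucketcache.ioengine</name>
> > >       <!-- file-backed bucket cache; the /mnt2 path is assumed for illustration -->
> > >       <value>files:/mnt1/hbase/bucketcache,/mnt2/hbase/bucketcache</value>
> > >     </property>
> > >     <property>
> > >       <name>hbase.bucketcache.size</name>
> > >       <!-- cache capacity in MB (98 GB), leaving headroom on the two 50 GB volumes -->
> > >       <value>100352</value>
> > >     </property>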
> > >
> > > On some instances during cluster startup, the bucket cache starts
> > > throwing errors, and eventually it gets completely disabled on that
> > > instance. The instance still stays up as a valid region server, and the
> > > only clue in the region server UI is that the bucket cache tab reports a
> > > count of 0 and a size of 0 bytes.
> > >
> > > I have already opened a ticket with AWS to see if there are problems
> > > with the EBS volumes, but I wanted to tap the open source community's
> > > hive-mind to see what kind of problem would cause the bucket cache to
> > > get disabled. If the application depends on the bucket cache for
> > > performance, wouldn't it be better to just remove that region server
> > > from the pool if its bucket cache cannot be recovered/enabled?
> > >
> > > The errors look like the following. I would appreciate any insight, thanks:
> > >
> > > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057]
> > > bucket.BucketCache: Failed reading block
> > > 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > > java.nio.channels.ClosedByInterruptException
> > >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> > >         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
> > >         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
> > >         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > and
> > >
> > > 2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7]
> > > bucket.BucketCache: Failed writing to bucket cache
> > > java.nio.channels.ClosedChannelException
> > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > >         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > and later:
> > >
> > > 2018-02-25 01:13:47,783 INFO  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4]
> > > bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4
> > > exiting, cacheEnabled=false
> > > 2018-02-25 01:13:47,864 WARN  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
> > > 2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > bucket.BucketCache: Failed syncing IO engine
> > > java.nio.channels.ClosedChannelException
> > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > >         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > ----
> > > Saad
> > >
> >
>