Ha, so I had already set "hbase.wal.provider" to "filesystem", but didn't figure out that I needed to set "hbase.wal.meta_provider" to "filesystem" as well. Sean, I'm guessing this was the reason the master got stuck assigning the meta region. I had this in the logs of regionserver-3, in case it's helpful:
18/07/02 22:02:21 INFO regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
18/07/02 22:02:21 INFO regionserver.RSRpcServices: Receiving OPEN for the region:hbase:meta,,1.1588230740, which we are already trying to OPEN - ignoring this new request for this region.

Now everything is up and running. Thank you for the help!

On Fri, Jul 6, 2018 at 8:57 AM, Stack <[email protected]> wrote:

> Hey Andrey:
>
> Testing 2.0.0, I ran against 2.7.x and 2.8.3. I just went back to my test
> cluster and upgraded to 2.8.4 and indeed, I see the master stuck initializing,
> waiting on the assign of hbase:meta.
>
> 2018-07-06 08:39:21,787 INFO [PEWorker-10] procedure.RecoverMetaProcedure:
> pid=5, state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure
> failedMetaServer=null, splitWal=true; Retaining meta assignment to
> server=ve0538.X.Y.Z.com,16020,1530891551115
> 2018-07-06 08:39:21,789 INFO [PEWorker-10] procedure2.ProcedureExecutor:
> Initialized subprocedures=[{pid=6, ppid=5,
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,
> region=1588230740, target=ve0538.X.Y.Za.com,16020,1530891551115}]
> 2018-07-06 08:39:21,847 INFO [PEWorker-4] procedure.MasterProcedureScheduler:
> pid=6, ppid=5, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure
> table=hbase:meta, region=1588230740,
> target=ve0538.X.Y.Z.com,16020,1530891551115 checking lock on 1588230740
>
> When I go to the RegionServer that was assigned hbase:meta and look at its
> logs, I see this:
>
> 479474 2018-07-06 08:28:18,304 ERROR [RS-EventLoopGroup-1-7] asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper: Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> 479475 java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(org.apache.hadoop.fs.FileEncryptionInfo)
> 479476   at java.lang.Class.getDeclaredMethod(Class.java:2130)
> 479477   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.createTransparentCryptoHelper(FanOutOneBlockAsyncDFSOutputSaslHelper.java:232)
> 479478   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.<clinit>(FanOutOneBlockAsyncDFSOutputSaslHelper.java:262)
> 479479   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:661)
> 479480   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:118)
> 479481   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:720)
> 479482   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:715)
> 479483   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> 479484   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
> 479485   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
> 479486   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
> 479487   at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> 479488   at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> 479489   at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
> 479490   at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> 479491   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> 479492   at java.lang.Thread.run(Thread.java:748)
>
> Do you see the above?
>
> Setting the WAL writer back to FSHLog got me going again. I added the below
> to the config:
>
> <property>
>   <name>hbase.wal.provider</name>
>   <value>filesystem</value>
> </property>
> <property>
>   <name>hbase.wal.meta_provider</name>
>   <value>filesystem</value>
> </property>
>
> St.Ack
>
> On Thu, Jul 5, 2018 at 12:39 PM Andrey Elenskiy
> <[email protected]> wrote:
>
> > > Are there any ERROR messages in the regionservers or the master logs?
> >
> > Hey Sean, nothing interesting in the master logs, it's just stuck initializing
> > and throws 500 when trying to access it via the web ui:
> > https://pastebin.com/mHsyhdNs
> > Logs of one of the region servers (sorry, had to restart, but I'm fairly
> > certain there were no ERRORs): https://pastebin.com/wHHVdQgH
> >
> > FYI, hbase 2.0.1 was working without issues with hadoop 2.7.5. It's 2.8.4
> > that's giving trouble, and we can't go back as the hdfs file format changed.
> >
> > > OK it is HDFS-12574, it has also been ported to 2.8.4. Let's revive
> > > HBASE-20244.
> >
> > Ha, thanks! I'll give it a try when 2.0.2 comes out.
> >
> > On Mon, Jul 2, 2018 at 6:10 PM, 张铎(Duo Zhang) <[email protected]> wrote:
> >
> > > OK it is HDFS-12574, it has also been ported to 2.8.4. Let's revive
> > > HBASE-20244.
> > >
> > > 2018-07-03 9:07 GMT+08:00 张铎(Duo Zhang) <[email protected]>:
> > >
> > > > I think it is fine to just use the original hadoop jars in HBase-2.0.1 to
> > > > communicate with HDFS-2.8.4 or above?
> > > >
> > > > The async wal has hacked into the internals of DFSClient, so it is
> > > > easily broken when HDFS is upgraded.
> > > >
> > > > I can take a look at the 2.8.4 problem, but for 3.x there is no production
> > > > ready release yet, so there is no plan to fix it yet.
> > > >
> > > > 2018-07-03 8:59 GMT+08:00 Sean Busbey <[email protected]>:
> > > >
> > > >> That's just a warning. Checking on HDFS-11644, it's only present in
> > > >> Hadoop 2.9+, so seeing a lack of it with HDFS in 2.8.4 is expected.
> > > >> (Presuming you are deploying on top of HDFS and not e.g. LocalFileSystem.)
> > > >>
> > > >> Are there any ERROR messages in the regionservers or the master logs?
> > > >> Could you post them somewhere and provide a link here?
> > > >>
> > > >> On Mon, Jul 2, 2018 at 5:11 PM, Andrey Elenskiy
> > > >> <[email protected]> wrote:
> > > >> > It's now stuck at Master Initializing and regionservers are complaining
> > > >> > with:
> > > >> >
> > > >> > 18/07/02 21:12:20 WARN util.CommonFSUtils: Your Hadoop installation does
> > > >> > not include the StreamCapabilities class from HDFS-11644, so we will skip
> > > >> > checking if any FSDataOutputStreams actually support hflush/hsync.
> > > >> > If you are running on top of HDFS this probably just means you have an
> > > >> > older version and this can be ignored. If you are running on top of an
> > > >> > alternate FileSystem implementation you should manually verify that hflush
> > > >> > and hsync are implemented; otherwise you risk data loss and hard to
> > > >> > diagnose errors when our assumptions are violated.
> > > >> >
> > > >> > I'm guessing hbase 2.0.1 on top of 2.8.4 hasn't been ironed out completely
> > > >> > yet (at least not with stock hadoop jars), unless I'm missing something.
> > > >> >
> > > >> > On Mon, Jul 2, 2018 at 3:02 PM, Mich Talebzadeh
> > > >> > <[email protected]> wrote:
> > > >> >
> > > >> >> You are lucky that HBASE 2.0.1 worked with Hadoop 2.8.
> > > >> >>
> > > >> >> I tried HBASE 2.0.1 with Hadoop 3.1 and there were endless problems
> > > >> >> with the Region server crashing because of a WAL file system issue.
> > > >> >>
> > > >> >> thread - Hbase hbase-2.0.1, region server does not start on Hadoop 3.1
> > > >> >>
> > > >> >> Decided to roll back to Hbase 1.2.6, which works with Hadoop 3.1.
> > > >> >>
> > > >> >> HTH
> > > >> >>
> > > >> >> Dr Mich Talebzadeh
> > > >> >>
> > > >> >> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > > >> >>
> > > >> >> http://talebzadehmich.wordpress.com
> > > >> >>
> > > >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > > >> >> loss, damage or destruction of data or any other property which may arise
> > > >> >> from relying on this email's technical content is explicitly disclaimed.
> > > >> >> The author will in no case be liable for any monetary damages arising from
> > > >> >> such loss, damage or destruction.
> > > >> >>
> > > >> >> On Mon, 2 Jul 2018 at 22:43, Andrey Elenskiy
> > > >> >> <[email protected]> wrote:
> > > >> >>
> > > >> >> > <property>
> > > >> >> >   <name>hbase.wal.provider</name>
> > > >> >> >   <value>filesystem</value>
> > > >> >> > </property>
> > > >> >> >
> > > >> >> > Seems to fix it, but it would be nice to actually try the fanout wal with
> > > >> >> > hadoop 2.8.4.
> > > >> >> >
> > > >> >> > On Mon, Jul 2, 2018 at 1:03 PM, Andrey Elenskiy
> > > >> >> > <[email protected]> wrote:
> > > >> >> >
> > > >> >> > > Hello, we are running HBase 2.0.1 with official Hadoop 2.8.4 jars and
> > > >> >> > > the hadoop 2.8.4 client
> > > >> >> > > (http://central.maven.org/maven2/org/apache/hadoop/hadoop-client/2.8.4/).
> > > >> >> > > Got the following exception on a regionserver, which brings it down:
> > > >> >> > >
> > > >> >> > > 18/07/02 18:51:06 WARN concurrent.DefaultPromise: An exception was thrown by org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete()
> > > >> >> > > java.lang.Error: Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information.
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.<clinit>(FanOutOneBlockAsyncDFSOutputSaslHelper.java:268)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:661)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:118)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:720)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:715)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:638)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:676)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:552)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> > > >> >> > >   at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> > > >> >> > >   at java.lang.Thread.run(Thread.java:748)
> > > >> >> > > Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(org.apache.hadoop.fs.FileEncryptionInfo)
> > > >> >> > >   at java.lang.Class.getDeclaredMethod(Class.java:2130)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.createTransparentCryptoHelper(FanOutOneBlockAsyncDFSOutputSaslHelper.java:232)
> > > >> >> > >   at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.<clinit>(FanOutOneBlockAsyncDFSOutputSaslHelper.java:262)
> > > >> >> > >   ... 18 more
> > > >> >> > >
> > > >> >> > > FYI, we don't have encryption enabled. Let me know if you need more info
> > > >> >> > > about our setup.
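
A side note for anyone landing on this thread later: the root cause in both stack traces above is a reflective lookup of a private DFSClient method that is no longer present in Hadoop 2.8.4 (per Duo's comment about the HDFS-12574 backport). The sketch below only illustrates that failure mode; it is not the actual FanOutOneBlockAsyncDFSOutputSaslHelper code, the class name is made up, and it assumes hadoop-hdfs jars are on the classpath.

import java.lang.reflect.Method;

import org.apache.hadoop.fs.FileEncryptionInfo;
import org.apache.hadoop.hdfs.DFSClient;

// Illustration only: mimics the kind of reflective lookup the asyncfs WAL
// provider performs at class-initialization time.
public final class DfsClientInternalsProbe {
  public static void main(String[] args) {
    try {
      // This is the lookup that fails in the traces above.
      Method m = DFSClient.class.getDeclaredMethod(
          "decryptEncryptedDataEncryptionKey", FileEncryptionInfo.class);
      m.setAccessible(true);
      System.out.println("Private DFSClient method found: " + m);
    } catch (NoSuchMethodException e) {
      // On Hadoop 2.8.4 (HDFS-12574 backport, per the thread) the method
      // signature changed, which is what surfaces as the java.lang.Error
      // "Couldn't properly initialize access to HDFS internals."
      System.out.println("Lookup failed, asyncfs cannot initialize: " + e);
    }
  }
}

On Hadoop 2.7.x the lookup succeeds, which is why the asyncfs provider worked there, and why pointing both hbase.wal.provider and hbase.wal.meta_provider at "filesystem" avoids this code path entirely.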

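Similarly, the CommonFSUtils WARN that Sean called harmless refers to the StreamCapabilities probe from HDFS-11644, which only exists in Hadoop 2.9+. Below is a rough, self-contained sketch of that kind of check; it is not the HBase code, the probe path is made up, and it only compiles against Hadoop 2.9+ client jars, which is exactly why HBase has to skip the check on 2.8.4.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

// Illustration only: asks an output stream whether it really supports
// hflush/hsync via StreamCapabilities (HDFS-11644, Hadoop 2.9+).
public final class HflushCapabilityProbe {
  public static void main(String[] args) throws Exception {
    Path probe = new Path("/tmp/hflush-capability-probe");
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(probe)) {
      if (out instanceof StreamCapabilities) {
        StreamCapabilities caps = (StreamCapabilities) out;
        System.out.println("hflush: " + caps.hasCapability("hflush"));
        System.out.println("hsync:  " + caps.hasCapability("hsync"));
      } else {
        // Defensive fallback; on Hadoop 2.9+ FSDataOutputStream itself
        // implements StreamCapabilities, so this branch is rarely taken.
        System.out.println("StreamCapabilities not available; verify manually.");
      }
    } finally {
      fs.delete(probe, false);
    }
  }
}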