Ralph, one thing you could try is to disable the Phoenix side altogether (disable the configs for coprocessors). Then restart hbase. That should help to bring the regions online (assuming that the Phoenix coprocessor invocations are causing the servers to go down).
________________________________
From: Perko, Ralph J <[email protected]>
Sent: Wednesday, April 08, 2015 9:04 AM
To: [email protected]
Subject: Re: hbase / phoenix errors

Hi – thanks everyone for the help. I could use some guidance as my system is currently not usable. I see how the bug impacted the system and I’m glad it showed up now. But how do I move forward?

Options I see:
- Apply the patches from issue #1634 to phoenix 4.3
- Downgrade phoenix to 4.2.2
- Something else you may suggest?

Regarding hbase – Is there any recovery from the state it’s in (see previous messages)?

Thanks,
Ralph

From: Perko, Ralph J <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, April 7, 2015 at 1:50 PM
To: "[email protected]" <[email protected]>
Subject: RE: hbase / phoenix errors

That is unfortunate. Do you know if there is any way to recover the data? I’ve tried the following:

I ran “hbase hbck” and learned all the regions are inconsistent and have holes to repair. I attempted to run “hbase hbck -repairHoles” and got stuck in a loop with a message that a region is still in transition. Based on something I read on the hbase mailing list I tried clearing out the zk node and repeating the repair, but none of this has worked.

From: Samarth Jain [mailto:[email protected]]
Sent: Tuesday, April 07, 2015 1:32 PM
To: [email protected]
Subject: Re: hbase / phoenix errors

I think that page needs to be updated. Sorry about that, Ralph. We ran into problems with HBase 0.98.4 and local indexes where a similar (but not the same) error was thrown:

Coprocessor.CoprocessorHost: the coprocessor …LocalIndexSplitter threw an exception
NoSuchMethodError hbase.regionserver.RegionServerService.getCatalogTracker

See https://issues.apache.org/jira/browse/PHOENIX-1634.
Rajeshbabu - would be interesting to get your opinion on this too.

On Tue, Apr 7, 2015 at 1:19 PM, Perko, Ralph J <[email protected]> wrote:

Based on the Phoenix compatibility chart at the download page I did not expect there to be issues with Phoenix 4.3 and HBase 0.98.4.
http://phoenix.apache.org/download.html

From: Devaraj Das [mailto:[email protected]]
Sent: Tuesday, April 07, 2015 12:58 PM
To: [email protected]
Subject: Re: hbase / phoenix errors

What is the major driver to not use the HDP bundled Phoenix? It seems to me that the Phoenix version you have is not compatible with the underlying HBase version, leading to all these issues. In particular, the method getCatalogTracker in HDP-2.2 works only with 1 argument, but in Phoenix versions from the open source, it works with 0 arguments. This has been taken care of in the yet-to-be-released HDP-2.3 (the HBase/Phoenix code both supports/uses the 0 argument getCatalogTracker).
________________________________
From: Perko, Ralph J <[email protected]>
Sent: Tuesday, April 07, 2015 10:28 AM
To: [email protected]
Subject: RE: hbase / phoenix errors

Thank you for the response.

I am using Phoenix 4.3 as a separate installation.

Unfortunately I have no way to copy the actual log files so I will need to transcribe as much as I can. There are a lot of things going on – I’ll try to provide the highlights.

Right now:
- Using ambari – everything on the cluster is green – there are no apparent issues (but there are many)
- On the hbase master web site it shows a table split hung up (all red – “regions in transition”) since yesterday evening.
All my phoenix tables are set up as follows:
- Salted
- 100GB hregion max file size
- Constant split size policy

If I attempt to connect to Phoenix using sqlline I get the exception:
NotServingRegionException:Region SYSTEM.CATALOG is not online

If I run hbase shell I can list the tables but cannot scan any of them.

RS Log Messages:

Aside from the messages I provided earlier, some errors and exceptions have come up as well on the RS. In order, I believe:

ERROR StatsScanner failed to update stats table
ERROR largeCompaction Compaction Failed
ERROR largeCompaction Failed after attempt 350 – ConnectionRefused – this server is in the failed servers list
Coprocessor.CoprocessorHost: the coprocessor …LocalIndexSplitter threw an exception NoSuchMethodError hbase.regionserver.RegionServerService.getCatalogTracker
HRegion: compaction interrupted InterruptedIOException
RuntimeException: HRegionServer aborted
Restart
ERROR RS_LOG_REPLAY wal.HLogSplitter OutOfMemory
Restart
Many of these: RemoteException (LeaseExpiredException) Holder: DFSClient…recovered.edits…: File does not exist
Many java.net.ConnectionException: Connection refused
Java.net.ConnectionException
SocketTimeoutException … row ‘’ on table ‘hbase.meta’

This is where we are today.

I will provide whatever info you need.

Thanks!
Ralph

From: Nick Dimiduk [mailto:[email protected]]
Sent: Tuesday, April 07, 2015 9:05 AM
To: [email protected]
Subject: Re: hbase / phoenix errors

Also, beside each region server log file (.log) there's also the output file (.out). Check the output files as well, as some serious crash scenarios bypass the logs and go directly to the out files.

-n

On Tuesday, April 7, 2015, Devaraj Das <[email protected]> wrote:

Hi Ralph, were you using the Phoenix bundled with HDP-2.2 or was that a separate installation?
Could you please copy/paste some log lines around the time of a regionserver's crash (look for exceptions etc. around that time in the regionserver logs).

Thanks
Devaraj

On Apr 6, 2015, at 3:00 PM, Perko, Ralph J <[email protected]> wrote:

Hi, we recently upgraded to Phoenix 4.3 and Hortonworks 2.2 (HBase 0.98.4) and we are running into some issues. I am wondering if I am missing something easy and hoping you can help.

I have 34 region servers and many keep crashing but without much in the way of error messages.

Here are the things that stand out:

ClientAsync.Process – waiting for some tasks to finish
smallCompaction RPCRetryingCaller: Call exception …. ‘msg row ‘SOME_PHOENIX_TABLE_NAME_IDX:<some long key>’ on table: SYSTEM.STATS attempt 225/350
Similar ones for largeCompaction as well.

The other issue is the Pig loader hangs with these messages in the mapper logs:

[phoenix-1-thread-0] RPCRetryingCaller: Call exception msg row ‘’ on table ‘SYSTEM.CATALOG’

Eventually the mappers time out – no errors.

Region servers come up and down. There are lots of connection refused errors as well.

Restarting hbase does not help. The region servers will come up then go down again.

Zookeeper is up. I’ve restarted just in case but it did not help.

I cannot connect to Phoenix from the command line.

Any help is appreciated.

Thanks!
Ralph
