Update on this:

Deleting the contents of the /hbase-unsecure/region-in-transition node did fix
my problem with HBase finding my table regions.
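
For anyone hitting the same thing, the cleanup amounts to something like the
following with the ZooKeeper CLI (run with HBase stopped; the znode path is the
default unsecured one on my HDP install, rmr drops the node and everything
under it, and the master should recreate the base znode on restart):

  # list the stale entries first, then remove the node and its children
  zookeeper-client ls /hbase-unsecure/region-in-transition
  zookeeper-client rmr /hbase-unsecure/region-in-transition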

I'm still having a problem, though, possibly related. I’m seeing OutOfMemory
errors in the region server logs (modified slightly):

2015-12-23 06:52:37,466 INFO  [RS_LOG_REPLAY_OPS-p7:60020-0] handler.HLogSplitterHandler: worker p7.foo.net,60020,1450871487168 done with task /hbase-unsecure/splitWAL/WALs%2Fp15.foo.net%2C60020%2C1450535337455-splitting%2Fp15.foo.net%252C60020%252C1450535337455.1450535339318 in 68348ms
2015-12-23 06:52:37,466 ERROR [RS_LOG_REPLAY_OPS-p7:60020-0] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:713)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1360)
        at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.close(HLogSplitter.java:1121)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1086)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:360)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:220)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:143)
        at org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:82)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

The region servers are configured with an 8G heap. I initially thought this
might be a ulimit problem, so I bumped the open file limit to about 10K and
the process limit up to 2048, but that did not seem to matter. What other
parameters might be causing an OOM error?
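
For what it's worth, one thing I haven't fully ruled out is whether the raised
limits are actually in effect for the running RegionServer process, since
'unable to create new native thread' usually points at a thread/process limit
rather than the Java heap. Something like this should show the live limits
(the pgrep pattern is a guess for my install):

  # limits as seen by the running RegionServer, not just my login shell
  cat /proc/$(pgrep -f HRegionServer)/limits | egrep -i 'processes|open files'

  # effective limits for the hbase user in a fresh session
  su - hbase -c 'ulimit -u -n'

If the limits were raised in /etc/security/limits.conf or limits.d, the region
servers have to be restarted from a fresh session before the new values apply.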

Thanks
Brian

> On Dec 22, 2015, at 12:46 PM, Brian Jeltema <bdjelt...@gmail.com> wrote:
> 
>> 
>> You should really find out where your hmaster UI lives (there is a master UI
>> for every node provided by the apache project) because it gives you
>> information on the state of your system,
> 
> I’m familiar with the HMaster UI. I’m looking at it now. It does not contain
> the information you describe. There is a list of region servers and a menu
> bar that contains: Home    Table Details    Local Logs    Debug Dump
> Metrics Dump    HBase Configuration
> 
> If I click on the Table Details item, I get a list of the tables. If I click
> on a table, there is a Tasks section that says
> No tasks currently running on this node.
> 
> The region server logs do not contain any records relating to RITs, or really
> to regions at all.
> The master UI does not contain any information about RITs.
> Version: HDP 2.2 -> HBase 0.98.4
> 
> The ZooKeeper node /hbase-unsecure/region-in-transition contains a long list
> of items that are not removed when I restart the service. I think this is a
> side-effect of problems I had when I did the HDP 2.1 -> HDP 2.2 upgrade,
> which did not go well.
> 
> I would like to remove or clear the /hbase-unsecure/region-in-transition node
> as an experiment. I’m just looking for guidance on whether that is a safe 
> thing to do.
> 
> Brian
> 
>> but if you want to skip all that,
>> here are the instructions for OfflineMetaRepair. Without knowing what is
>> happening with your system (logs, master UI info) you can try this, but at
>> your own risk.
>> 
>> OfflineMetaRepair.
>> Description Below:
>> This code is used to rebuild meta off line from file system data. If there
>> are any problem detected, it will fail suggesting actions for the user to do
>> to "fix" problems. If it succeeds, it will backup the previous hbase:meta
>> and -ROOT- dirs and write new tables in place.
>> 
>> Stop HBase
>> zookeeper-client rmr /hbase
>> HADOOP_USER_NAME=hbase hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
>> Start HBase
>> 
>> ^ This has worked for me in some situations where I understood HDFS and
>> Zookeeper disagreed on region locations, but keep in mind I have tried this
>> on hbase 1.0.0 and your mileage may vary.
>> 
>> We don't have your hbase version (you can even find this in the hbase shell)
>> We don't have log messages
>> We don't have the master's view of your RITs
>> 
>> 
>> On Tue, Dec 22, 2015 at 11:52 AM, Brian Jeltema <bdjelt...@gmail.com> wrote:
>> 
>>> I’m running Ambari 2.0.2 and HDP 2.2. I don’t see any of this displayed at
>>> master:60010.
>>> 
>>> I really think this problem is the result of cruft in ZooKeeper. Does
>>> anybody know
>>> if it’s safe to delete the node?
>>> 
>>> 
>>>> On Dec 22, 2015, at 11:40 AM, Geovanie Marquez <geovanie.marq...@gmail.com>
>>>> wrote:
>>>> 
>>>> check hmaster:60010 under TASKS (between Software Attributes and Tables);
>>>> you will see if you have regions in transition. This will tell you which
>>>> regions are transitioning, and you can go to those region server logs and
>>>> check them. I've run into a couple of these, and every time they've told
>>>> me about the problem.
>>>> 
>>>> Also, under Software Attributes you can check the HBase version.
>>>> 
>>>> On Tue, Dec 22, 2015 at 11:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> 
>>>>> From RegionListTmpl.jamon:
>>>>> 
>>>>> <%if (onlineRegions != null && onlineRegions.size() > 0) %>
>>>>> ...
>>>>> <%else>
>>>>>  <p>Not serving regions</p>
>>>>> </%if>
>>>>> 
>>>>> The message means that there was no region online on the underlying
>>>>> server.
>>>>> 
>>>>> FYI
>>>>> 
>>>>> On Tue, Dec 22, 2015 at 7:18 AM, Brian Jeltema <bdjelt...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Following up, if I look at the HBase Master UI in the Ambari console I
>>>>>> see links to all of the region servers. If I click on those links, the
>>>>>> Region Server page comes up, and in the Regions section it displays
>>>>>> ‘Not serving regions’. I’m not sure if that means something is disabled,
>>>>>> or it just doesn’t have any regions to serve.
>>>>>> 
>>>>>>> On Dec 22, 2015, at 6:19 AM, Brian Jeltema <bdjelt...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Can you pick a few regions stuck in transition and check related
>>>>>>>> region server logs to see why they couldn't be assigned?
>>>>>>> 
>>>>>>> I don’t see anything in the region server logs relating to any regions.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Which release were you using previously?
>>>>>>> 
>>>>>>> HDP 2.1 -> HDP 2.2
>>>>>>> 
>>>>>>> So is it safe to stop HBase and delete the ZK node?
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> On Mon, Dec 21, 2015 at 3:54 PM, Brian Jeltema <bdjelt...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I am doing a cluster upgrade to the HDP 2.2 stack. For some reason,
>>>>>>>>> after the upgrade HBase cannot find any regions for existing tables.
>>>>>>>>> I believe the HDFS file system is OK. But looking at the ZooKeeper
>>>>>>>>> nodes, I noticed that many (maybe all) of the regions were listed in
>>>>>>>>> the ZooKeeper /hbase-unsecure/region-in-transition node. I suspect
>>>>>>>>> this could be causing a problem. Is it safe to stop HBase and delete
>>>>>>>>> that node?
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Brian
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> 
> 
> 
