Shortly before the 'All datanodes are bad' error, I saw:

2016-03-09 11:57:20,126 INFO  [LeaseRenewer:hbase@aws-hbase-staging] retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB over staging-aws-hbase-2.aws.px/10.231.16.197:8020. Trying to fail over immediately.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)

Looks like your HDFS namenodes were transitioning around that time (the client hit a standby namenode and tried to fail over).
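
If you want to confirm, you could check the state of each namenode. Assuming your HA service IDs are something like nn1 and nn2 (adjust to whatever is in your hdfs-site.xml):

  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2

That only shows the current state, so the namenode logs around 11:57 are what would confirm a failover actually happened then.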

FYI


On Wed, Mar 9, 2016 at 7:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> From the region server log, did you notice this:
>
> 2016-03-09 11:59:49,159 WARN  [regionserver/staging-aws-hbase-data-1.aws.px/10.231.16.30:16020] wal.ProtobufLogWriter: Failed to write trailer, non-fatal, continuing...
> java.io.IOException: All datanodes DatanodeInfoWithStorage[10.231.16.30:50010,DS-a49fd123-fefc-46e2-83ca-6ae462081702,DISK] are bad. Aborting...
>       at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1084)
>       at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>       at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
>
>
> Was HDFS stable around that time?
>
> I am more interested in the failure of the fresh 1.2.0 installation.
>
> Please pastebin server logs for that incident.
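>
> If it is easy to grab, the datanode view from around that time would also help. Assuming you can run HDFS admin commands, something like
>
>   hdfs dfsadmin -report
>
> lists the live and dead datanodes and their usage (it only shows the current state, but it is a quick sanity check; exact output varies by Hadoop version).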
>
>
> On Wed, Mar 9, 2016 at 6:19 AM, Michal Medvecky <medve...@pexe.so> wrote:
>
>> You can check both logs at
>>
>> http://michal.medvecky.net/log-master.txt
>> http://michal.medvecky.net/log-rs.txt
>>
>> The first restart after the upgrade happened at 10:29.
>>
>> I did not find anything useful.
>>
>> Michal
>>
>> On Wed, Mar 9, 2016 at 2:58 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > Can you take a look at the data-1 server log around this time frame to see what happened?
>> >
>> > Thanks
>> >
>> > > On Mar 9, 2016, at 3:44 AM, Michal Medvecky <medve...@pexe.so> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I upgraded my single HBase master and single HBase regionserver from 1.1.3 to 1.2.0 by simply stopping both, upgrading the packages (I download the binary packages from hbase.org), and starting them again.
>> > >
>> > > It did not come up; the master is stuck assigning hbase:meta:
>> > >
>> > > 2016-03-09 11:28:11,491 INFO  [staging-aws-hbase-3:16000.activeMasterManager] master.AssignmentManager: Processing 1588230740 in state: M_ZK_REGION_OFFLINE
>> > > 2016-03-09 11:28:11,491 INFO  [staging-aws-hbase-3:16000.activeMasterManager] master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1457522891475, server=null} to {1588230740 state=OFFLINE, ts=1457522891491, server=staging-aws-hbase-data-1.aws.px,16020,1457522034036}
>> > > 2016-03-09 11:28:11,492 INFO  [staging-aws-hbase-3:16000.activeMasterManager] zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper as staging-aws-hbase-data-1.aws.px,16020,1457522034036
>> > > 2016-03-09 11:42:10,611 ERROR [Thread-65] master.HMaster: Master failed to complete initialization after 900000ms. Please consider submitting a bug report including a thread dump of this process.
>> > >
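>> > > The master did log setting the meta location in ZooKeeper. If it helps, I can also dump that znode; assuming the default zookeeper.znode.parent of /hbase, something like
>> > >
>> > >   hbase zkcli get /hbase/meta-region-server
>> > >
>> > > should show where it points (the exact command form may differ by version).
>> > >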
>> > > Did I miss something?
>> > >
>> > > I tried downgrading back to 1.1.3, but later realized this is not supported (and does not work, of course).
>> > >
>> > > Thank you
>> > >
>> > > Michal
>> >
>>
>
>
