[ https://issues.apache.org/jira/browse/KUDU-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley updated KUDU-2664:
--------------------------------
    Description: 
While trying to reproduce a different issue, I ran the following command
{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 454b53ed77bd458a81a7710c892f214b; done
{noformat}
and encountered the following tablet server crash
{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
    @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
    @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
    @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
    @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
    @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
    @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
    @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
    @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
    @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
    @        0x10b843e07  std::__1::function<>::operator()()
    @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
    @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
    @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
    @        0x10b849a10  boost::_bi::list1<>::operator()<>()
    @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
    @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @        0x10b7bb1fa  boost::function0<>::operator()()
    @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
    @     0x7fff5dc09305  _pthread_body
    @     0x7fff5dc0c26f  _pthread_start
    @     0x7fff5dc08415  thread_start
{noformat}
The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.

  was:
While trying to reproduce a different issue, I ran the following command
{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5
{noformat}
and encountered the following tablet server crash
{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
    @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
    @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
    @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
    @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
    @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
    @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
    @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
    @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
    @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
    @        0x10b843e07  std::__1::function<>::operator()()
    @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
    @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
    @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
    @        0x10b849a10  boost::_bi::list1<>::operator()<>()
    @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
    @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @        0x10b7bb1fa  boost::function0<>::operator()()
    @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
    @     0x7fff5dc09305  _pthread_body
    @     0x7fff5dc0c26f  _pthread_start
    @     0x7fff5dc08415  thread_start
{noformat}
The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.
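For readers trying to follow the fatal check in the log above: the replica had already recorded "kudu-tools" (which appears to be the UUID the unsafe_change_config tooling presents itself under) as the leader for term 6, and then received an UpdateConsensus() call from a different leader UUID in the same term, which CheckLeaderRequestUnlocked() treats as a violated invariant and CHECK-fails on. The snippet below is a minimal, self-contained sketch of that kind of invariant check, written under the assumption that this is roughly the sequence of events; the types, function names, and structure are illustrative and are not the actual Kudu source.
{noformat}
// Hypothetical, simplified model of the same-term leader check that appears to
// fire in RaftConsensus::CheckLeaderRequestUnlocked(). Illustrative only.
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>

struct ReplicaState {
  int64_t current_term;
  std::string leader_uuid;  // Leader recorded for current_term; empty if unknown.
};

// Models handling an incoming UpdateConsensus() from `caller_uuid` at `caller_term`.
void CheckLeaderRequest(ReplicaState* state, const std::string& caller_uuid,
                        int64_t caller_term) {
  if (caller_term > state->current_term) {
    // A higher term legitimately installs a new leader.
    state->current_term = caller_term;
    state->leader_uuid = caller_uuid;
    return;
  }
  if (state->leader_uuid.empty()) {
    // No leader known yet for this term; accept the caller as leader.
    state->leader_uuid = caller_uuid;
    return;
  }
  if (caller_uuid != state->leader_uuid) {
    // The process-fatal path from the report: a different leader UUID in the
    // *same* term violates the "at most one leader per term" invariant, so the
    // check aborts the process (modeled here with abort()).
    std::cerr << "Unexpected new leader in same term! Existing leader UUID: "
              << state->leader_uuid << ", new leader UUID: " << caller_uuid << "\n";
    std::abort();
  }
}

int main() {
  // The sequence suggested by the crash log: the unsafe-change tool first acts
  // as leader "kudu-tools" in term 6, then another peer asserts leadership in
  // the same term, and the replica hits the fatal branch above.
  ReplicaState state{6, "kudu-tools"};
  CheckLeaderRequest(&state, "454b53ed77bd458a81a7710c892f214b", 6);  // aborts
  return 0;
}
{noformat}
In normal operation the one-leader-per-term invariant is guaranteed by elections, which is presumably why a violation is treated as fatal; the unsafe tool bypasses elections, so the invariant can be broken from outside.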
> Tablet server crashed when running kudu remote_replica unsafe_change
> --------------------------------------------------------------------
>
>                 Key: KUDU-2664
>                 URL: https://issues.apache.org/jira/browse/KUDU-2664
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> While trying to reproduce a different issue, I ran the following command
> {noformat}
> for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 454b53ed77bd458a81a7710c892f214b; done
> {noformat}
> and encountered the following tablet server crash
> {noformat}
> F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
> *** Check failure stack trace: ***
>     @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
>     @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
>     @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
>     @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
>     @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
>     @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
>     @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
>     @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
>     @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
>     @        0x10b843e07  std::__1::function<>::operator()()
>     @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
>     @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
>     @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
>     @        0x10b849a10  boost::_bi::list1<>::operator()<>()
>     @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
>     @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @        0x10b7bb1fa  boost::function0<>::operator()()
>     @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
>     @     0x7fff5dc09305  _pthread_body
>     @     0x7fff5dc0c26f  _pthread_start
>     @     0x7fff5dc08415  thread_start
> {noformat}
> The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
> I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
> It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.
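To illustrate the report's closing point (misusing a tool that is documented as unsafe may corrupt replica state, but it should only fail the offending request, not take down the tablet server), one possible non-fatal treatment of the same condition is sketched below. This is purely a hypothetical sketch, not Kudu's actual code or the eventual fix for this issue, and the names are made up for illustration.
{noformat}
// Hypothetical alternative to the fatal check: reject the conflicting request
// with an error so only the RPC fails, not the whole tserver process.
#include <cstdint>
#include <optional>
#include <string>

struct ReplicaState {
  int64_t current_term;
  std::string leader_uuid;  // Leader recorded for current_term; empty if unknown.
};

// Returns std::nullopt on success, or an error message that an RPC handler
// could surface to the caller as a rejected UpdateConsensus() request.
std::optional<std::string> ValidateLeaderRequest(ReplicaState* state,
                                                 const std::string& caller_uuid,
                                                 int64_t caller_term) {
  if (caller_term > state->current_term) {
    state->current_term = caller_term;
    state->leader_uuid = caller_uuid;
    return std::nullopt;
  }
  if (state->leader_uuid.empty() || caller_uuid == state->leader_uuid) {
    return std::nullopt;
  }
  // Same term, different leader: report an error rather than aborting.
  return "rejected: unexpected new leader " + caller_uuid + " in term " +
         std::to_string(state->current_term) + " (existing leader " +
         state->leader_uuid + ")";
}
{noformat}
In this sketch the RPC handler would translate the returned error into a failed UpdateConsensus() response and leave the replica running.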
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)