[ https://issues.apache.org/jira/browse/KUDU-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Berkeley updated KUDU-2664:
--------------------------------
    Description: 
While trying to reproduce a different issue, I ran the following command
{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 454b53ed77bd458a81a7710c892f214b; done
{noformat}
and encountered the following tablet server crash
{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
    @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
    @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
    @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
    @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
    @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
    @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
    @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
    @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
    @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
    @        0x10b843e07  std::__1::function<>::operator()()
    @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
    @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
    @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
    @        0x10b849a10  boost::_bi::list1<>::operator()<>()
    @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
    @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @        0x10b7bb1fa  boost::function0<>::operator()()
    @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
    @     0x7fff5dc09305  _pthread_body
    @     0x7fff5dc0c26f  _pthread_start
    @     0x7fff5dc08415  thread_start
{noformat}
The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.

  was:
While trying to reproduce a different issue, I ran the following command
{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5
{noformat}
and encountered the following tablet server crash
{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
    @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
    @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
    @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
    @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
    @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
    @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
    @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
    @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
    @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
    @        0x10b843e07  std::__1::function<>::operator()()
    @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
    @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
    @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
    @        0x10b849a10  boost::_bi::list1<>::operator()<>()
    @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
    @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
    @        0x10b7bb1fa  boost::function0<>::operator()()
    @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
    @     0x7fff5dc09305  _pthread_body
    @     0x7fff5dc0c26f  _pthread_start
    @     0x7fff5dc08415  thread_start
{noformat}
The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.
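For readers trying to follow the fatal check in the log above: the replica had already recorded "kudu-tools" (which appears to be the UUID the unsafe_change_config tooling presents itself under) as the leader for term 6, and then received an UpdateConsensus() call from a different leader UUID in the same term, which CheckLeaderRequestUnlocked() treats as a violated invariant and CHECK-fails on. The snippet below is a minimal, self-contained sketch of that kind of invariant check, written under the assumption that this is roughly the sequence of events; the types, function names, and structure are illustrative and are not the actual Kudu source.
{noformat}
// Hypothetical, simplified model of the same-term leader check that appears to
// fire in RaftConsensus::CheckLeaderRequestUnlocked(). Illustrative only.
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>

struct ReplicaState {
  int64_t current_term;
  std::string leader_uuid;  // Leader recorded for current_term; empty if unknown.
};

// Models handling an incoming UpdateConsensus() from `caller_uuid` at `caller_term`.
void CheckLeaderRequest(ReplicaState* state, const std::string& caller_uuid,
                        int64_t caller_term) {
  if (caller_term > state->current_term) {
    // A higher term legitimately installs a new leader.
    state->current_term = caller_term;
    state->leader_uuid = caller_uuid;
    return;
  }
  if (state->leader_uuid.empty()) {
    // No leader known yet for this term; accept the caller as leader.
    state->leader_uuid = caller_uuid;
    return;
  }
  if (caller_uuid != state->leader_uuid) {
    // The process-fatal path from the report: a different leader UUID in the
    // *same* term violates the "at most one leader per term" invariant, so the
    // check aborts the process (modeled here with abort()).
    std::cerr << "Unexpected new leader in same term! Existing leader UUID: "
              << state->leader_uuid << ", new leader UUID: " << caller_uuid << "\n";
    std::abort();
  }
}

int main() {
  // The sequence suggested by the crash log: the unsafe-change tool first acts
  // as leader "kudu-tools" in term 6, then another peer asserts leadership in
  // the same term, and the replica hits the fatal branch above.
  ReplicaState state{6, "kudu-tools"};
  CheckLeaderRequest(&state, "454b53ed77bd458a81a7710c892f214b", 6);  // aborts
  return 0;
}
{noformat}
In normal operation the one-leader-per-term invariant is guaranteed by elections, which is presumably why a violation is treated as fatal; the unsafe tool bypasses elections, so the invariant can be broken from outside.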
> Tablet server crashed when running kudu remote_replica unsafe_change
> --------------------------------------------------------------------
>
>                 Key: KUDU-2664
>                 URL: https://issues.apache.org/jira/browse/KUDU-2664
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> While trying to reproduce a different issue, I ran the following command
> {noformat}
> for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 454b53ed77bd458a81a7710c892f214b; done
> {noformat}
> and encountered the following tablet server crash
> {noformat}
> F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
> *** Check failure stack trace: ***
>     @        0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
>     @        0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
>     @        0x108b74c05  kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
>     @        0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
>     @        0x108b6b459  kudu::consensus::RaftConsensus::Update()
>     @        0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
>     @        0x10b53b87d  kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
>     @        0x10b53b819  _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextEEEEvDpOT_
>     @        0x10b53b6a9  std::__1::__function::__func<>::operator()()
>     @        0x10b843e07  std::__1::function<>::operator()()
>     @        0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
>     @        0x10b846cb6  kudu::rpc::ServicePool::RunThread()
>     @        0x10b849aa9  boost::_mfi::mf0<>::operator()()
>     @        0x10b849a10  boost::_bi::list1<>::operator()<>()
>     @        0x10b8499ba  boost::_bi::bind_t<>::operator()()
>     @        0x10b84979d  boost::detail::function::void_function_obj_invoker0<>::invoke()
>     @        0x10b7bb1fa  boost::function0<>::operator()()
>     @        0x10c2cc2f5  kudu::Thread::SuperviseThread()
>     @     0x7fff5dc09305  _pthread_body
>     @     0x7fff5dc0c26f  _pthread_start
>     @     0x7fff5dc08415  thread_start
> {noformat}
> The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at address 127.0.0.1:7250, and I was trying to kick out one of the three replicas while fishing for a repro of the other issue.
> I couldn't get the crash to happen again, I wasn't able to capture a minidump or core dump, and I accidentally deleted the logs, so I'm afraid the above is all there is to go on.
> It's expected that funny stuff could happen when using unsafe_change_config, since it's unsafe. But it shouldn't be possible to crash the tablet server with it.
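To illustrate the report's closing point (misusing a tool that is documented as unsafe may corrupt replica state, but it should only fail the offending request, not take down the tablet server), one possible non-fatal treatment of the same condition is sketched below. This is purely a hypothetical sketch, not Kudu's actual code or the eventual fix for this issue, and the names are made up for illustration.
{noformat}
// Hypothetical alternative to the fatal check: reject the conflicting request
// with an error so only the RPC fails, not the whole tserver process.
#include <cstdint>
#include <optional>
#include <string>

struct ReplicaState {
  int64_t current_term;
  std::string leader_uuid;  // Leader recorded for current_term; empty if unknown.
};

// Returns std::nullopt on success, or an error message that an RPC handler
// could surface to the caller as a rejected UpdateConsensus() request.
std::optional<std::string> ValidateLeaderRequest(ReplicaState* state,
                                                 const std::string& caller_uuid,
                                                 int64_t caller_term) {
  if (caller_term > state->current_term) {
    state->current_term = caller_term;
    state->leader_uuid = caller_uuid;
    return std::nullopt;
  }
  if (state->leader_uuid.empty() || caller_uuid == state->leader_uuid) {
    return std::nullopt;
  }
  // Same term, different leader: report an error rather than aborting.
  return "rejected: unexpected new leader " + caller_uuid + " in term " +
         std::to_string(state->current_term) + " (existing leader " +
         state->leader_uuid + ")";
}
{noformat}
In this sketch the RPC handler would translate the returned error into a failed UpdateConsensus() response and leave the replica running.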
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)