[ https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-6788 stopped by Dan Hecht.
-----------------------------------------
> Query fragments can spend lots of time starting up then fail right after "starting" all backends
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6788
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Dan Hecht
>            Priority: Major
>              Labels: krpc, rpc
>         Attachments: connect_thread_busy_queries_failing.txt, impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time; once startup completes, the query is cancelled because one of the intermediate RPCs failed.
> It is not clear what the right fix is, since fragments are started asynchronously; possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() query_id=334cc7dd9758c36c:ec38aeb400000000 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() query_id=334cc7dd9758c36c:ec38aeb400000000, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control resources for query_id=334cc7dd9758c36c:ec38aeb400000000
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() query_id=e44d553b04d47cfb:28f06bb800000000 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() query_id=e44d553b04d47cfb:28f06bb800000000, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control resources for query_id=e44d553b04d47cfb:28f06bb800000000
> {code}
> On the coordinator, threads appear to spend a lot of time waiting on exec_complete_barrier_:
> {code}
> #0  0x00007fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000001222944 in impala::Promise<bool>::Get() ()
> #2  0x0000000001220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x0000000001221c87 in impala::Coordinator::Exec() ()
> #4  0x0000000000c3a925 in impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest const&) ()
> #5  0x0000000000c41f7e in impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x0000000000bff597 in impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, std::shared_ptr<impala::ImpalaServer::SessionState>, bool*, std::shared_ptr<impala::ClientRequestState>*) ()
> #7  0x0000000000c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, std::shared_ptr<impala::ImpalaServer::SessionState>, std::shared_ptr<impala::ClientRequestState>*) ()
> #8  0x0000000000c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, beeswax::Query const&) ()
> /StartBackendExec
> #11 0x0000000000d60c9a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> {code}
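
For discussion only, here is a minimal sketch of the two ideas raised in the description: stop waiting as soon as any start-up RPC reports an error, and put an upper bound on the wait with a timeout. This is not Impala's code and says nothing about how exec_complete_barrier_ is actually implemented; it is written in Python for brevity, and the names FailFastBarrier, notify and wait are hypothetical.

```python
import threading
from typing import Optional


class FailFastBarrier:
    """Hypothetical barrier: releases waiters once every party has reported
    success, or as soon as any single party reports an error."""

    def __init__(self, parties: int) -> None:
        self._cond = threading.Condition()
        self._pending = parties              # parties that have not reported yet
        self._error: Optional[str] = None    # first error reported, if any

    def notify(self, error: Optional[str] = None) -> None:
        """Called once per party when its start-up attempt finishes."""
        with self._cond:
            if error is not None and self._error is None:
                self._error = error
            self._pending -= 1
            # Wake waiters on the last report or on the first failure.
            if self._pending == 0 or self._error is not None:
                self._cond.notify_all()

    def wait(self, timeout: Optional[float] = None) -> str:
        """Block until all parties succeed, one fails, or the timeout expires."""
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._pending == 0 or self._error is not None,
                timeout,
            )
            if self._error is not None:
                return "failed: " + self._error
            return "ok" if done else "timed out"


# Usage: three parties, the second one fails, the third never reports.
barrier = FailFastBarrier(parties=3)
barrier.notify()
barrier.notify(error="Exec RPC failed")
print(barrier.wait(timeout=5.0))  # failed: Exec RPC failed
```

In this sketch the waiter returns on the first reported failure instead of waiting for the remaining parties, and the timeout only bounds the case where nobody reports at all. Whether either behaviour is appropriate for the coordinator is exactly the open question in this issue.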