[ 
https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6788 stopped by Dan Hecht.
-----------------------------------------
> Query fragments can spend lots of time starting up then fail right after 
> "starting" all backends
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6788
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Dan Hecht
>            Priority: Major
>              Labels: krpc, rpc
>         Attachments: connect_thread_busy_queries_failing.txt, 
> impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time 
> (roughly 4.5 and 6 minutes in the two examples below). Within milliseconds 
> of startup completing, the query is cancelled because one of the 
> intermediate RPCs failed. 
> It is not clear what the right fix is, since fragments are started 
> asynchronously. Possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() 
> query_id=334cc7dd9758c36c:ec38aeb400000000 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 
> backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() 
> query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() 
> query_id=334cc7dd9758c36c:ec38aeb400000000, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control 
> resources for query_id=334cc7dd9758c36c:ec38aeb400000000
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() 
> query_id=e44d553b04d47cfb:28f06bb800000000 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 
> backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() 
> query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() 
> query_id=e44d553b04d47cfb:28f06bb800000000, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control 
> resources for query_id=e44d553b04d47cfb:28f06bb800000000
> {code}
> Stack traces taken on the coordinator show that threads appear to be 
> spending lots of time waiting on exec_complete_barrier_:
> {code}
> #0  0x00007fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x0000000001222944 in impala::Promise<bool>::Get() ()
> #2  0x0000000001220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x0000000001221c87 in impala::Coordinator::Exec() ()
> #4  0x0000000000c3a925 in 
> impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest 
> const&) ()
> #5  0x0000000000c41f7e in 
> impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x0000000000bff597 in 
> impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, 
> std::shared_ptr<impala::ImpalaServer::SessionState>, bool*, 
> std::shared_ptr<impala::ClientRequestState>*) ()
> #7  0x0000000000c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, 
> std::shared_ptr<impala::ImpalaServer::SessionState>, 
> std::shared_ptr<impala::ClientRequestState>*) ()
> #8  0x0000000000c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, 
> beeswax::Query const&) ()
> /StartBackendExec
> #11 0x0000000000d60c9a in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::string const&, std::string const&, boost::function<void ()>, 
> impala::ThreadDebugInfo const*, impala::Promise<long>*), 
> boost::_bi::list5<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> {code}


