[ https://issues.apache.org/jira/browse/IMPALA-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9113.
----------------------------------
    Resolution: Not A Problem

Yup, you are right, the lock is released in {{ClientRequestState::FetchRowsInternal}}. There is even a comment in the code for this: [https://github.com/apache/impala/blob/master/be/src/service/client-request-state.cc#L1000]

> Queries can hang if an impalad is killed after a query has FINISHED
> -------------------------------------------------------------------
>
>                 Key: IMPALA-9113
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9113
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Clients
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> There is a race condition in the query coordination code that could cause
> queries to hang indefinitely in an un-cancellable state if an impalad crashes
> after the query has transitioned to the FINISHED state, but before all
> backends have completed.
> The issue occurs if:
> * A query produces all results
> * A client issues a fetch request to read all of those results
> * The client fetch request fetches all available rows (i.e. eos is hit)
> * {{Coordinator::GetNext}} then calls
> {{SetNonErrorTerminalState(ExecState::RETURNED_RESULTS)}} which eventually
> calls {{WaitForBackends()}}
> * {{WaitForBackends()}} will block until all backends have completed
> * One of the impalads running the query crashes, and thus never reports
> success for the query fragment it was running
> * The {{WaitForBackends()}} call will then block indefinitely
> * Any attempt to cancel the query fails because the original fetch request
> that drove the {{WaitForBackends()}} call has acquired the
> {{ClientRequestState}} lock, which thus prevents any cancellation from
> occurring.
> Implementing IMPALA-6984 should theoretically fix this because as soon as eos
> is hit, the coordinator will call {{CancelBackends()}} rather than
> {{WaitForBackends()}}. Another solution would be to add a timeout to the
> {{WaitForBackends()}} call so that it returns once the timeout is hit; this
> would force the fetch request to return 0 rows with {{hasMoreRows=true}}, and
> unblock any cancellation threads.
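
For illustration only, the timeout idea from the description could be sketched roughly as below. This is not Impala's code: {{BackendTracker}}, {{MarkBackendDone}}, {{FetchResult}} and {{FetchAfterEos}} are hypothetical names, and the sketch assumes that backend completion is tracked with a simple counter guarded by a mutex and a condition variable. It only shows how a bounded wait lets the fetch return 0 rows with "more rows" set instead of blocking forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the coordinator's backend bookkeeping.
// It is NOT Impala's Coordinator class, only a minimal model of it.
class BackendTracker {
 public:
  explicit BackendTracker(int num_backends) : remaining_(num_backends) {}

  // Called when a backend reports that its fragment instances finished.
  void MarkBackendDone() {
    std::lock_guard<std::mutex> l(lock_);
    if (remaining_ > 0 && --remaining_ == 0) done_cv_.notify_all();
  }

  // Waits at most 'timeout' for all backends to report completion.
  // Returns true if every backend finished, false if the timeout expired.
  bool WaitForBackends(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return done_cv_.wait_for(l, timeout, [this] { return remaining_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable done_cv_;
  int remaining_;
};

// Hypothetical result of a fetch issued after eos was reached.
struct FetchResult {
  int num_rows;
  bool has_more_rows;
};

// Sketch of the fetch path after eos: if the backends have not all
// reported within 'timeout', return 0 rows and ask the client to retry,
// so the calling thread does not block indefinitely.
FetchResult FetchAfterEos(BackendTracker& tracker,
                          std::chrono::milliseconds timeout) {
  bool all_done = tracker.WaitForBackends(timeout);
  return FetchResult{0, !all_done};
}
```

With a crashed backend that never calls {{MarkBackendDone()}}, each fetch in this sketch returns after the timeout with {{has_more_rows}} set, so a cancellation thread gets a chance to run between fetches.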