[ https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Khurram Faraaz closed DRILL-4595.
---------------------------------
    Resolution: Fixed

> FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4595
>                 URL: https://issues.apache.org/jira/browse/DRILL-4595
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: Future
>
>
> When a fragment fails, it is assumed that it will be able to close itself and
> send its FAILED state to the foreman, which will then cancel any running
> fragments. FragmentExecutor.cancel() interrupts the thread, making sure those
> fragments don't stay blocked.
> However, if a fragment is already blocked when its fail() method is called,
> the foreman may never be notified and the query will hang forever. One such
> scenario is the following:
> - generally it's a CTAS running on a large cluster (lots of writers running
>   in parallel)
> - logs show that the user channel was closed and UserServer caused the root
>   fragment to move to a FAILED state
> - jstack shows that the root fragment is blocked in its receiver waiting for
>   data
> - jstack also shows that ALL other fragments are no longer running, and the
>   logs show that all of them succeeded
> - the foreman waits *forever* for the root fragment to finish
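The hang described above can be reproduced in miniature with plain JDK threads. The following is only a sketch under stated assumptions, not Drill code: FailInterruptSketch, Fragment, bind(), runScenario() and reportedToForeman are hypothetical names, a LinkedBlockingQueue stands in for the root fragment's receiver, and a boolean flag stands in for the status message to the foreman. It shows that a fail() which only records the failure leaves a blocked thread stuck, while a fail() that also interrupts the thread lets it reach its cleanup path.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are Drill's actual classes or methods.
class FailInterruptSketch {

  /** Stand-in for a fragment whose thread blocks in a receiver waiting for data. */
  static final class Fragment implements Runnable {
    private final BlockingQueue<String> receiver = new LinkedBlockingQueue<>();
    private final boolean interruptOnFail;
    private volatile Thread runner;
    private volatile boolean failed;
    volatile boolean reportedToForeman; // stand-in for sending the FAILED state

    Fragment(boolean interruptOnFail) {
      this.interruptOnFail = interruptOnFail;
    }

    /** Records which thread executes this fragment so fail() can interrupt it. */
    void bind(Thread thread) {
      this.runner = thread;
    }

    @Override
    public void run() {
      try {
        receiver.take(); // blocks: all senders have finished, no data will arrive
      } catch (InterruptedException e) {
        // Woken up by fail(); fall through to cleanup.
      } finally {
        reportedToForeman = true; // only reachable once the thread unblocks
      }
    }

    /** Marks the fragment as failed and, optionally, interrupts its thread. */
    void fail() {
      failed = true;
      Thread t = runner;
      if (interruptOnFail && t != null) {
        t.interrupt();
      }
    }
  }

  /** Returns true if the blocked fragment managed to report after fail(). */
  static boolean runScenario(boolean interruptOnFail) {
    Fragment fragment = new Fragment(interruptOnFail);
    Thread t = new Thread(fragment, "root-fragment");
    t.setDaemon(true); // a stuck thread must not keep the JVM alive
    fragment.bind(t);
    t.start();
    try {
      // Wait (bounded) until the thread is parked inside take().
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
      while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
      fragment.fail();
      t.join(500); // give the thread a moment to unblock and clean up
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("scenario interrupted", e);
    }
    return fragment.reportedToForeman;
  }

  public static void main(String[] args) {
    System.out.println("fail() with interrupt reported:    " + runScenario(true));
    System.out.println("fail() without interrupt reported: " + runScenario(false));
  }
}
```

In this sketch the interrupt is what moves the blocked thread out of take(). Without it the failed flag is set, but nothing ever reaches the finally block, which mirrors the foreman waiting forever for the root fragment.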