[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715186#comment-15715186 ] Stephan Ewen commented on FLINK-4632: - Is this something that needs retries to enhance stability? To

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715047#comment-15715047 ] 刘喆 commented on FLINK-4632: --- Thank you Ufuk Celebi. It depends on the strategy of restarting. Here I use auto
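
The restart strategy referred to here is configurable in Flink. The snippet is truncated, so the actual strategy and values used in this report are unknown. As a hedged sketch only, a cluster-wide fixed-delay restart strategy in `flink-conf.yaml` could look like the following; the attempt count and delay are illustrative assumptions, not values from this thread:

```yaml
# flink-conf.yaml (illustrative values, not taken from this issue)
# Restart a failed job a fixed number of times, waiting between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

With a fixed-delay strategy, a job that keeps failing after the configured number of attempts is marked as failed instead of being restarted again.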

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread Robert Metzger (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714931#comment-15714931 ] Robert Metzger commented on FLINK-4632: --- If it's okay for everybody, I'm setting the priority of this

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread Ufuk Celebi (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714794#comment-15714794 ] Ufuk Celebi commented on FLINK-4632: Thanks for pinging me. [~liuzhe] The error means that the TCP

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread Maximilian Michels (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714712#comment-15714712 ] Maximilian Michels commented on FLINK-4632: --- Thanks for the logs! I adjusted the formatting.

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-12-02 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714684#comment-15714684 ] 刘喆 commented on FLINK-4632: --- It happened again. This time I got the logs. On JobManager: 2016-12-02 16:27:17,275

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-10-12 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569490#comment-15569490 ] Stephan Ewen commented on FLINK-4632: - Do you have the logs of the JobManager from when the

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-28 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528833#comment-15528833 ] 刘喆 commented on FLINK-4632: --- I think it is related to checkpointing. When I use checkpointing with 'exactly_once'

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-28 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528781#comment-15528781 ] 刘喆 commented on FLINK-4632: --- I can't reproduce it now. I only saved the TaskManager's log at the beginning. If it

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-26 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522404#comment-15522404 ] Stephan Ewen commented on FLINK-4632: - I have tried to reproduce this, but it works well on my

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-26 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522303#comment-15522303 ] 刘喆 commented on FLINK-4632: --- I tried the latest GitHub version; the problem is still there. When the job is

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-23 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516956#comment-15516956 ] Stephan Ewen commented on FLINK-4632: - Okay, that should have a fix in the next few days. Would you be

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-23 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516530#comment-15516530 ] 刘喆 commented on FLINK-4632: --- It is (2). I set up a new idle Hadoop YARN cluster and used the source from GitHub

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-21 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509469#comment-15509469 ] Stephan Ewen commented on FLINK-4632: - Do you know which of the following two situations is the case?

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-20 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508441#comment-15508441 ] 刘喆 commented on FLINK-4632: --- The container is killed for two reasons: (1) YARN preemption, (2) YARN NodeManager

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-19 Thread JIRA
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505243#comment-15505243 ] 刘喆 commented on FLINK-4632: --- Yes. The job's status is canceling, and then it hangs. The web page is OK; the client process is

[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung

2016-09-19 Thread Stephan Ewen (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502924#comment-15502924 ] Stephan Ewen commented on FLINK-4632: - It may happen that a TaskManager is lost / killed. I would