Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread vino yang
Hi Averell, I have not used AWS products, but if EMR behaves like plain YARN and you can reach YARN's web UI, then the YARN ApplicationMaster log is the JM log and the container logs are the TM logs. Thanks, vino. Averell wrote on Mon, Aug 27, 2018 at 4:09 PM: > Hi Vino, > > Could you please

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell
Hi Vino, Could you please tell me where to find the JM and TM logs? I'm running on AWS EMR using YARN. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread vino yang
Hi Averell, This problem is caused by a heartbeat timeout between the JM and the TM. You can narrow it down by: 1) checking the network status of the node at that time, for example whether its connections to other systems had the same problem; 2) checking the TM log for a more specific cause; 3) View
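For context on the heartbeat timeout mentioned above: the JM/TM heartbeat is governed by two settings in flink-conf.yaml. The fragment below is a minimal sketch, not advice given in this thread. The raised timeout is an illustrative assumption for a cluster whose nodes are temporarily overloaded, and it only hides the symptom if the root cause is a network or GC problem.

```yaml
# flink-conf.yaml
# Interval between heartbeat requests, in ms (10000 is the Flink default).
heartbeat.interval: 10000
# Time without a heartbeat before the peer is declared dead, in ms.
# Default is 50000; 120000 is an illustrative value (assumption).
heartbeat.timeout: 120000
```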

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread Averell
Thank you Vino. I put the message in a tag, and I don't know why it was not shown in the email thread. I have pasted the error message below in this email. Anyway, it seems that it was an issue with enabling checkpointing. Now I am able to turn it on properly, and my job is getting restored

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-27 Thread vino yang
Hi Averell, What is the error message? Did you forget to post it? As far as I know, if you enable checkpoints, the job will resume automatically when it fails. Thanks, vino. Averell wrote on Mon, Aug 27, 2018 at 1:21 PM: > Thank you Vino. > > I sometimes got the error message like the one below. It
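To make the automatic resume described above concrete: when checkpointing is enabled and no restart strategy is configured, Flink falls back to a fixed-delay restart strategy, and the strategy can also be set explicitly in flink-conf.yaml. The fragment below is a minimal sketch assuming the Flink 1.6 configuration keys. The attempt count and delay are illustrative values, not settings taken from this thread.

```yaml
# flink-conf.yaml
# Illustrative values (assumption): retry a failed job up to 3 times,
# waiting 10 seconds between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

Each restart resumes from the latest completed checkpoint. Once the configured attempts are used up, the job fails permanently.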

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-26 Thread Averell
Thank you Vino. I sometimes get an error message like the one below. It looks like my executors got overloaded. I have another question: is there an existing solution that lets the job be restored automatically? Thanks and best regards, Averell

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread vino yang
Hi Averell, Checkpoints are triggered automatically and periodically, according to the checkpoint interval set by the user. I believe that part is clear. A job can fail for many reasons. The technical definition is that the Job does not normally enter the final

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Averell
Hi Vino, Regarding the statement "Checkpoints are taken automatically and are used for automatic restarting job in case of a failure", I do not quite understand the definition of a failure, or how to simulate one while testing my application. Possible scenarios that I can think of: (1)

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Andrey Zagrebin
This thread is also useful in this context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/difference-between-checkpoints-amp-savepoints-td14787.html

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread Andrey Zagrebin
Hi Henry, In addition to Vino's answer, there are several things to keep in mind about "checkpoints vs savepoints". Checkpoints are designed mostly for fault tolerance of a running Flink job and for automatic recovery, which is why Flink manages their storage itself by default. Though it is correct

Re: Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread vino yang
Hi Henry, A good answer from Stack Overflow: Apache Flink's Checkpoints and Savepoints are similar in that way they both are mechanisms for preserving internal state of Flink's applications. Checkpoints are taken automatically and are used for automatic restarting job in case of a failure.

Can I only use checkpoints instead of savepoints in production?

2018-08-24 Thread 徐涛
Hi All, I checked the documentation of Flink release 1.6 and found that I can also use checkpoints to resume the program. As I encountered some problems when using savepoints, I have the following questions: 1. Can I use only checkpoints and not savepoints, since they can also be used to resume