Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-17 Thread Piotr Nowojski
Hi Thomas. The bug https://issues.apache.org/jira/browse/FLINK-21028 is still present in 1.12.1. You would need to upgrade to at least 1.13.0, 1.12.2 or 1.11.4. However, as I mentioned before, 1.11.4 hasn't been released yet. On the other hand, both 1.12.2 and 1.13.0 have already been superseded by

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-15 Thread Thomas Wang
Thanks everyone. I'm using Flink on EMR. I just updated to EMR 6.3, which uses Flink 1.12.1. I will report back on whether this resolves the issue. Thomas On Wed, Jun 9, 2021 at 11:15 PM Yun Gao wrote: > Thanks very much, Kezhu, for the catch; it also looks to me like the same issue as > FLINK-21028. > >

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-10 Thread Yun Gao
Thanks very much, Kezhu, for the catch; it also looks to me like the same issue as FLINK-21028. -- From: Piotr Nowojski Send Time: 2021 Jun. 9 (Wed.) 22:12 To: Kezhu Wang Cc: Thomas Wang; Yun Gao; user Subject: Re: Re: Re: Re: Failed to

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-09 Thread Piotr Nowojski
Yes, good catch Kezhu, IllegalStateException sounds very much like FLINK-21028. Thomas, could you try upgrading to Flink 1.13.1 or 1.12.4 (1.11.4 hasn't been released yet)? Piotrek On Tue, Jun 8, 2021 at 17:18, Kezhu Wang wrote: > Could it be the same as FLINK-21028[1] (titled “Streaming

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Kezhu Wang
Could it be the same as FLINK-21028[1] (titled “Streaming application didn’t stop properly”, fixed in 1.11.4, 1.12.2, 1.13.0)? [1]: https://issues.apache.org/jira/browse/FLINK-21028 Best, Kezhu Wang On June 8, 2021 at 22:54:10, Yun Gao (yungao...@aliyun.com) wrote: Hi Thomas, I tried but do
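To make the version question concrete, here is a minimal sketch that checks whether a given Flink version string contains the FLINK-21028 fix, based only on the fix versions listed in this message (1.11.4, 1.12.2, 1.13.0). The function and constant names are hypothetical, and the sketch assumes plain `major.minor.patch` version strings with no suffixes.

```python
# First patch release carrying the fix on each older minor line,
# taken from the fix versions quoted in the thread.
FIXED_PATCH_BY_MINOR = {(1, 11): 4, (1, 12): 2}
# From this minor line onwards, every release is assumed to carry the fix.
FIRST_FIXED_MINOR = (1, 13)


def has_flink_21028_fix(version):
    """Return True if `version` (e.g. "1.12.1") includes the FLINK-21028 fix."""
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor) >= FIRST_FIXED_MINOR:
        return True
    min_patch = FIXED_PATCH_BY_MINOR.get((major, minor))
    # Minor lines that never received a backport are treated as unfixed.
    return min_patch is not None and patch >= min_patch
```

Under these assumptions, `has_flink_21028_fix("1.12.1")` is false, which matches the observation elsewhere in the thread that 1.12.1 still has the bug.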

Re: Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-08 Thread Yun Gao
Hi Thomas, I tried but have not been able to reproduce the exception yet. I have filed an issue for the exception first [1]. [1] https://issues.apache.org/jira/browse/FLINK-22928 --Original Mail -- Sender: Thomas Wang Send Date: Tue Jun 8 07:45:52 2021 Recipients: Yun Gao

Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-07 Thread Thomas Wang
This is actually a very simple job that reads from Kafka and writes to S3 using the StreamingFileSink w/ Parquet format. I'm only using Flink's APIs and nothing custom. Thomas On Sun, Jun 6, 2021 at 6:43 PM Yun Gao wrote: > Hi Thomas, > > Thanks very much for reporting the exceptions, and it seems to

Re: Re: Re: Failed to cancel a job using the STOP rest API

2021-06-06 Thread Yun Gao
Hi Thomas, Thanks very much for reporting the exceptions; this does not seem to be working as expected to me... Could you also show us the DAG of the job? And do any operators in the source task use multiple threads to emit records? Best, Yun --Original Mail --

Re: Re: Failed to cancel a job using the STOP rest API

2021-06-05 Thread Thomas Wang
One thing I noticed is that if I set drain = true, the job could be stopped correctly. Maybe that's because I'm using a Parquet file sink which is a bulk-encoded format and only writes to disk during checkpoints? Thomas On Sat, Jun 5, 2021 at 10:06 AM Thomas Wang wrote: > Hi Yun, > > Thanks

Re: Re: Failed to cancel a job using the STOP rest API

2021-06-05 Thread Thomas Wang
Hi Yun, Thanks for the tips. Yes, I do see some exceptions as copied below. I'm not quite sure what they mean though. Any hints? Thanks. Thomas ``` 2021-06-05 10:02:51 java.util.concurrent.ExecutionException: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could

Re: Re: Failed to cancel a job using the STOP rest API

2021-06-05 Thread Yun Gao
Hi Thomas, To query the savepoint status, a GET request can be issued to /jobs/:jobid/savepoints/:savepointtriggerid [1] to get the status and location of the savepoint. But if the job is running in some kind of per-job mode and the JobMaster is gone after the stop-with-savepoint, the
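As a rough illustration of the status query described above, the sketch below polls /jobs/:jobid/savepoints/:savepointtriggerid until the operation completes. The base URL, function names, polling interval, and retry limit are all assumptions for illustration. The response field names ("status" / "id", "operation" / "location", "failure-cause") reflect my understanding of the Flink REST API and should be checked against the documentation for the Flink version in use.

```python
import json
import time
import urllib.request


def parse_savepoint_status(payload):
    """Return (state, location, failure_cause) from a status response dict."""
    state = payload.get("status", {}).get("id")
    operation = payload.get("operation") or {}
    return state, operation.get("location"), operation.get("failure-cause")


def wait_for_savepoint(base_url, job_id, trigger_id, interval_s=2.0, max_polls=30):
    """Poll the savepoint status endpoint and return the savepoint location.

    `trigger_id` is the "request-id" returned by the stop request.
    """
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/savepoints/{trigger_id}"
    for _ in range(max_polls):
        with urllib.request.urlopen(url) as resp:
            state, location, failure = parse_savepoint_status(json.load(resp))
        if state == "COMPLETED":
            # A completed operation can still carry a failure cause.
            if failure is not None:
                raise RuntimeError(f"savepoint failed: {failure}")
            return location
        time.sleep(interval_s)
    raise TimeoutError(f"savepoint {trigger_id} not completed after {max_polls} polls")
```

Note that this only works while the REST endpoint is still reachable; as the message says, in a per-job deployment the JobMaster may already be gone by the time the status is queried.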

Re: Failed to cancel a job using the STOP rest API

2021-06-04 Thread Thomas Wang
Hi Yun, Thanks for your reply. We are not using any legacy source. For this specific job, there is only one source that is using FlinkKafkaConsumer which I assume has the correct cancel() method implemented. Also could you suggest how I could use the "request-id" to get the savepoint location?

Re: Failed to cancel a job using the STOP rest API

2021-06-04 Thread Yun Gao
Hi Thomas, I think you are right that the CLI also uses the same REST API under the hood, and since the REST API response is OK and the savepoint is triggered successfully, I reckon the problem might not be in the REST API handling, and we should first focus on the stop-with-savepoint

Failed to cancel a job using the STOP rest API

2021-06-03 Thread Thomas Wang
Hi, Flink community, I'm trying to use the STOP REST API to cancel a job. So far, I'm seeing some inconsistent results. Sometimes, jobs could be cancelled successfully while other times, they couldn't. Either way, the POST request is accepted with a status code 202 and a "request-id". From the
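For readers unfamiliar with the call being described, here is a minimal sketch of building the stop request and reading the "request-id" from the 202 response. The helper names, the base URL, and the example values are hypothetical. The `drain` and `targetDirectory` body fields are the ones I understand the Flink REST API to accept for POST /jobs/:jobid/stop; verify them against the REST documentation for your version.

```python
import json
import urllib.request


def build_stop_request(base_url, job_id, target_directory=None, drain=False):
    """Build (but do not send) a POST request for /jobs/:jobid/stop."""
    body = {"drain": drain}
    if target_directory is not None:
        body["targetDirectory"] = target_directory
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/jobs/{job_id}/stop",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_request_id(response_body):
    """Pull the "request-id" out of the JSON body of the 202 response."""
    return json.loads(response_body)["request-id"]
```

A request built this way could be sent with `urllib.request.urlopen(req)`. A 202 only means the stop-with-savepoint was accepted, not that the job has stopped, so the returned "request-id" still has to be used to check the outcome.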