Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vijay Bhaskar
Hi Vishal, Savepoint with cancellation internally uses the /cancel REST API, which is not a stable API. It always exits with 404. The best way is to: a) first issue the savepoint REST API, b) then issue the /yarn-cancel REST API (as described in http://mail-archives.apache.org/mod_mbox/flink-user/201804
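The two-step flow Vijay describes (trigger a savepoint first, then hit /yarn-cancel) can be sketched against the Flink 1.7 REST API. This is only an illustration of how the two requests are composed; the JobManager address, job id, and target directory below are hypothetical placeholders, not values from the thread.

```python
import json

def savepoint_trigger(base_url, job_id, target_dir):
    """Step (a): compose the POST that triggers a savepoint.

    Flink 1.7 exposes this as POST /jobs/:jobid/savepoints; the response
    carries a trigger ("request") id that can be polled afterwards.
    """
    url = f"{base_url}/jobs/{job_id}/savepoints"
    body = json.dumps({"target-directory": target_dir, "cancel-job": False})
    return url, body

def yarn_cancel_url(base_url, job_id):
    """Step (b): compose the URL for the legacy /yarn-cancel endpoint."""
    return f"{base_url}/jobs/{job_id}/yarn-cancel"

# Placeholder values -- substitute your own JobManager address and job id.
url, body = savepoint_trigger("http://jobmanager:8081", "abc123", "hdfs:///tmp/sp")
```

The point of keeping the cancel as a separate second request, rather than setting `cancel-job` to true, is exactly the workaround discussed in this thread.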

Could not resolve ResourceManager address on Flink 1.7.1

2019-03-12 Thread Le Xu
Hello: I am trying to set up a standalone flink cluster (1.7.1) and I'm getting a very similar error to the one the user reported in this thread. However, I believe th

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
Hello Vijay, Thank you for the reply. This though is a k8s deployment (rather than yarn), but maybe they follow the same lifecycle. I issue a *save point with cancel* as documented here https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/rest_api.html#jobs-jobid

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vijay Bhaskar
Hi Vishal, yarn-cancel isn't meant only for a yarn cluster; it works for all clusters. It's the recommended command. Use the following command to issue a savepoint: curl --header "Content-Type: application/json" --request POST --data '{"target-directory":"hdfs://*:8020/tmp/xyz1","cancel-job":

Re: Set partition number of Flink DataSet

2019-03-12 Thread qi luo
Hi Ken, Thanks for your reply. I may not have made myself clear: our problem is not about reading but rather writing. We need to write to N files based on key partitioning. We have to use setParallelism() to set the output partition/file number, but when the partition number is too large (~100K),

Re: Random forest - Flink ML

2019-03-12 Thread Avi Levi
Thanks Flavio, I will definitely check it out. But from a quick glance, it seems that it is missing an implementation of "random forest", which is something that we are looking for. If anyone can recommend/suggest/share, that will be greatly appreciated. Best Regards Avi On Mon, Mar 11, 2019 at 10:

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
Aah. Let me try this out and will get back to you. Though I would assume that save point with cancel is a single atomic step, rather than a save point *followed* by a cancellation (else why would that be an option). Thanks again. On Tue, Mar 12, 2019 at 4:50 AM Vijay Bhaskar wrote: > Hi Vish

Re: Random forest - Flink ML

2019-03-12 Thread Benoît Paris
There has been some developments at Apache SAMOA for a forest of decision trees. This is not regular Random Forest, but a form of trees that can be incrementally learned fast. If I recall correctly they also have adaptive algorithms as well. Here are some r

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vijay Bhaskar
Yes, it's supposed to work. But unfortunately it was not working. The Flink community needs to respond to this behavior. Regards Bhaskar On Tue, Mar 12, 2019 at 3:45 PM Vishal Santoshi wrote: > Aah. > Let me try this out and will get back to you. > Though I would assume that save point with cancel i

Re: What does "Continuous incremental cleanup" mean in Flink 1.8 release notes

2019-03-12 Thread Konstantin Knauf
Hi Tony, yes, when taking a savepoint, the same strategy as during a non-incremental checkpoint is used. Best, Konstantin On Mon, Mar 11, 2019 at 2:29 AM Tony Wei wrote: > Hi Konstantin, > > That is really helpful. Thanks. > > Another follow-up question: The document said "Cleanup in full

Re: Backoff strategies for async IO functions?

2019-03-12 Thread Konstantin Knauf
Hi Shuyi, I am not sure. You could handle retries in the user code within org.apache.flink.streaming.api.functions.async.AsyncFunction#asyncInvoke without using a DLQ, as described in my original answer to William. On the other hand, I agree that it could be easier for the user, and it is indeed a com
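The retry-in-user-code idea Konstantin describes can be sketched generically. The helper below is illustrative Python showing exponential backoff around a flaky call, not Flink's actual AsyncFunction API; in a real job the equivalent loop would live inside asyncInvoke around the async client call.

```python
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff.

    Mirrors what an AsyncFunction#asyncInvoke implementation might do
    around its client call before giving up and failing the record.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the failure
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

# Example: a flaky call that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

result = retry_with_backoff(flaky, sleep=lambda _: None)  # no real sleeping in the demo
```

Note that in an actual async operator the backoff should be scheduled, not a blocking sleep, so the operator thread is not stalled.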

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-12 Thread Piyush Narang
Thanks for getting back Kurt. Yeah this might be an option to try out. I was hoping there would be a way to express this directly in the SQL though ☹. -- Piyush From: Kurt Young Date: Tuesday, March 12, 2019 at 2:25 AM To: Piyush Narang Cc: "user@flink.apache.org" Subject: Re: Expressing Fli

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
I must add that there has to be more love for k8s flink deployments. IMHO that is the way to go. Maintaining a captive/session cluster, if you have k8s on premise, is pretty much a no-go for various reasons. On Tue, Mar 12, 2019 at 9:44 AM Vishal Santoshi wrote: > This really not cool but here

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vijay Bhaskar
Yes Vishal, that's correct. Regards Bhaskar On Tue, Mar 12, 2019 at 7:14 PM Vishal Santoshi wrote: > This really not cool but here you go. This seems to work. Agreed that this > cannot be this painful. The cancel does not exit with an exit code of 0 and > thus the job has to be manually deleted. Vij

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
Thanks Vijay, This is the larger issue. The cancellation routine is itself broken. On cancellation flink does remove the checkpoint counter *2019-03-12 14:12:13,143 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCheckpointIDCounter - Removing /checkpoint-counter/000

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vijay Bhaskar
Oh, Yeah this is larger issue indeed :) Regards Bhaskar On Tue, Mar 12, 2019 at 7:51 PM Vishal Santoshi wrote: > Thanks Vijay, > > This is the larger issue. The cancellation routine is itself broken. > > On cancellation flink does remove the checkpoint counter > > *2019-03-12 14:12:13,143 > IN

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Gary Yao
Hi Vishal, This issue was fixed recently [1], and the patch will be released with 1.8. If the Flink job gets cancelled, the JVM should exit with code 0. There is a release candidate [2], which you can test. Best, Gary [1] https://issues.apache.org/jira/browse/FLINK-10743 [2] http://apache-flink-

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
:) That makes so much more sense. Is k8s native flink a part of this release ? On Tue, Mar 12, 2019 at 10:27 AM Gary Yao wrote: > Hi Vishal, > > This issue was fixed recently [1], and the patch will be released with > 1.8. If > the Flink job gets cancelled, the JVM should exit with code 0. Ther

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
And when is the 1.8.0 release expected ? On Tue, Mar 12, 2019 at 10:32 AM Vishal Santoshi wrote: > :) That makes so much more sense. Is k8s native flink a part of this > release ? > > On Tue, Mar 12, 2019 at 10:27 AM Gary Yao wrote: > >> Hi Vishal, >> >> This issue was fixed recently [1], and

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Gary Yao
Hi Vishal, I'm afraid not but there are open pull requests for that. You can track the progress here: https://issues.apache.org/jira/browse/FLINK-9953 Best, Gary On Tue, Mar 12, 2019 at 3:32 PM Vishal Santoshi wrote: > :) That makes so much more sense. Is k8s native flink a part of this

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
This is really not cool, but here you go. This seems to work. Agreed that it cannot be this painful. The cancel does not exit with an exit code of 0 and thus the job has to be manually deleted. Vijay, does this align with what you have had to do? - Take a save point. This returns a request id c
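The manual sequence Vishal lists (trigger a savepoint, poll by the returned request id, then cancel) maps onto the Flink 1.7 REST endpoints. A sketch of the URL composition and polling loop, with hypothetical ids and a stubbed HTTP fetch (a real client would issue GETs against the JobManager):

```python
def savepoint_status_url(base_url, job_id, trigger_id):
    """Poll GET /jobs/:jobid/savepoints/:triggerid until the savepoint completes."""
    return f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"

def poll_until_completed(fetch_status, status_url, max_polls=30):
    """fetch_status(url) returns a dict like {"status": {"id": "IN_PROGRESS"}}."""
    for _ in range(max_polls):
        resp = fetch_status(status_url)
        if resp["status"]["id"] == "COMPLETED":
            return resp
    raise TimeoutError("savepoint did not complete")

# Placeholder ids; the fake fetch simulates one in-progress poll then completion.
url = savepoint_status_url("http://jobmanager:8081", "abc123", "req-1")
fake = iter([{"status": {"id": "IN_PROGRESS"}}, {"status": {"id": "COMPLETED"}}])
result = poll_until_completed(lambda _u: next(fake), url)
```

Only once the status reaches COMPLETED (and the savepoint path is recorded) would the separate cancel call be issued, which is exactly the painful multi-step dance complained about here.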

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Gary Yao
Nobody can tell with 100% certainty. We want to give the RC some exposure first, and there is also a release process that is prescribed by the ASF [1]. You can look at past releases to get a feeling for how long the release process lasts [2]. [1] http://www.apache.org/legal/release-policy.html#rel

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
Do you have a mvn repository (at mvn central) set up for the 1.8 release candidate? We could test it for you. Without 1.8 and this exit code we are essentially held up. On Tue, Mar 12, 2019 at 10:56 AM Gary Yao wrote: > Nobody can tell with 100% certainty. We want to give the RC some exposure > f

Re: local disk cleanup after crash

2019-03-12 Thread Gary Yao
Hi, If no other TaskManager (TM) is running, you can delete everything. If multiple TMs share the same host, as far as I know, you will have to parse TM logs to know what directories you can delete [1]. As for local recovery, tasks that were running on a crashed TM are lost. From the documentation

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Gary Yao
The RC artifacts are only deployed to the Maven Central Repository when the RC is promoted to a release. As written in the 1.8.0 RC1 voting email [1], you can find the maven artifacts, and the Flink binaries here: - https://repository.apache.org/content/repositories/orgapacheflink-1210/ -
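To resolve the RC artifacts from the Apache staging repository Gary mentions, a `pom.xml` entry along these lines would work; the URL is the staging repository from the message above, and the `id` is an arbitrary label.

```xml
<repositories>
  <repository>
    <id>apache-flink-rc</id>
    <url>https://repository.apache.org/content/repositories/orgapacheflink-1210/</url>
  </repository>
</repositories>
```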

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-12 Thread Vishal Santoshi
Awesome, thanks! On Tue, Mar 12, 2019 at 11:53 AM Gary Yao wrote: > The RC artifacts are only deployed to the Maven Central Repository when > the RC > is promoted to a release. As written in the 1.8.0 RC1 voting email [1], you > can find the maven artifacts, and the Flink binaries here: > >

Re: Set partition number of Flink DataSet

2019-03-12 Thread Ken Krugler
Hi Qi, If I understand what you’re trying to do, then this sounds like a variation of a bucketing sink. That typically uses a field value to create a directory path or a file name (though the filename case is only viable when the field is also what’s used to partition the data) But I don’t be

Re: How to join stream and dimension data in Flink?

2019-03-12 Thread Hequn Cheng
Hi Henry, Yes, you are correct. Basically, there are two ways you can use to join a Temporal Table. One is provided in Flink and the other is provided in Blink which has been pushed as a branch[1] in Flink repo. - Join a Temporal Table in Flink[2][3][4] As the document said: it is a join with a t

Conflicting Cassandra versions while using flink-connector-cassandra

2019-03-12 Thread Gustavo Momenté
I'm having trouble using CassandraSink while also using datastax's cassandra-driver. While digging through the flink-connector-cassandra.jar I realized that it bundles *cassandra-driver-core 3.0.0* while internally we use version *3.1.4* to read data from Cassandra. I couldn't find the reason why t

Re: Conflicting Cassandra versions while using flink-connector-cassandra

2019-03-12 Thread Congxian Qiu
Hi Gustavo Momenté, If you want both driver versions to coexist, maybe you could try the maven shade plugin[1] [1] https://maven.apache.org/plugins/maven-shade-plugin/ Best, Congxian On Mar 13, 2019, 02:22 +0800, Gustavo Momenté , wrote: > I'm having trouble using CassandraSink while also using dat
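The shade-plugin suggestion usually means relocating the conflicting packages. A sketch of such a relocation in the user job's `pom.xml`, which would rename the driver classes bundled for the connector so the application's own 3.1.4 driver is untouched (the `pattern`/`shadedPattern` values are illustrative, not taken from the thread):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.datastax.driver</pattern>
            <shadedPattern>shaded.com.datastax.driver</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```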

Re: Conflicting Cassandra versions while using flink-connector-cassandra

2019-03-12 Thread Gustavo Momenté
Can I shade the `flink-connector-cassandra` version? And if so, do you know why it isn't shaded by default? On Tue, Mar 12, 2019 at 23:00, Congxian Qiu wrote: > Hi Gustavo Momenté > If you want the both driver versions coexist, maybe you could try maven > shade plugin[1] > > [1] https://maven

Re: Task slot sharing: force reallocation

2019-03-12 Thread Le Xu
Thanks Till. I switched to Flink 1.7.1 and it seems to solve part of my problem (the downstream operator no longer seems to sit on the same machine). But the new problem is: does Flink implicitly set all sub tasks of data sources generated by RichParallelFunction to be inside the same sl

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-12 Thread Kurt Young
Hi Piyush, I think your second sql is correct, but the problem you have encountered is that the outside aggregation (GROUP BY userId & COLLECT(client_custom_aggregated)) will emit results immediately when receiving results from the inner aggregation. Hence Flink needs the sink to 1. either have the ability to

Re: How to join stream and dimension data in Flink?

2019-03-12 Thread 徐涛
Hi Hequn, Thanks a lot for your answer! That is very helpful to me. I still have some questions about stream and dimension data join and temporal table join: 1. I found the temporal table join is still a one-stream-driven join; I do not know why the dimension data join