Re: Source code question - about the logic of calculating network buffer

2019-06-12 Thread 徐涛
Hi Yun, Thanks a lot for the detailed and clear explanation, that is very helpful. Best Henry > 在 2019年6月13日,上午10:32,Yun Gao 写道: > > Hi tao, > > As a whole, `networkBufBytes` is not part of the heap. In fact, it is > allocated from the direct memory. The rough relationship (ign

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-12 Thread Hequn Cheng
Hi Felipe, >From your code, I think you want to get the "count distinct" result instead of the "distinct count". They contain a different meaning. To improve the performance, you can replace your DistinctProcessWindowFunction to a DistinctProcessReduceFunction. A ReduceFunction can aggregate the

Re: TableException

2019-06-12 Thread JingsongLee
Hi Pramit: AppendStreamTableSink defines an external TableSink to emit a streaming table with only insert changes. If the table is also modified by update or delete changes, a TableException will be thrown.[1] Your sql seems have update or delete changes. You can try to use RetractStreamTableSink

Re: What happens when: high-availability.storageDir: is not available?

2019-06-12 Thread Yun Tang
High availability storage directory would store completed checkpoint and submitted job graph and completed checkpoint. If this directory is unavailable when initialization, job would be submitted well. If this directory is unavailable when creating checkpoints, that checkpoint would finally fail

Re: Savepoint status check fails with error Operation not found under key

2019-06-12 Thread Yun Tang
Hi Anaray Did you use /jobs/:jobid/savepoints/744e6b6488212b80deab51486620e348 to query the savepoint status and will Flink always return the same result to you when you query it later again? What's more, have you ever checked the web UI to see whether that savepoint ever triggered. If possible

Re: Source code question - about the logic of calculating network buffer

2019-06-12 Thread Yun Gao
Hi tao, As a whole, `networkBufBytes` is not part of the heap. In fact, it is allocated from the direct memory. The rough relationship (ignores min/max and assumes managed memory is allocated on heap) between the variables are: Total memory of TM (configured by taskmanager.he

Source code question - about the logic of calculating network buffer

2019-06-12 Thread 徐涛
Hi Experts, I am debugging the WordCount Flink streaming program in local mode. Flink version is 1.7.2 I saw the following calculation logic about network buffer in class TaskManagerServices. jvmHeapNoNet is equal to -xmx amount in Java. why the networkBufB

TableException

2019-06-12 Thread Pramit Vamsi
Hi, I am attempting the following: String sql = "INSERT INTO table3 " + "SELECT col1, col2, window_start_time , window_end_time , MAX(col3), MAX(col4), MAX(col5) FROM " + "(SELECT col1,col2, " + "TUMBLE_START(ts, INTERVAL '1' MINUTE) as window_start_time, "

How to restart/recover on reboot?

2019-06-12 Thread John Smith
The installation instructions do not indicate how to create systemd services. 1- When task nodes fail, will the job leader detect this and ssh and restart the task node? From my testing it doesn't seem like it. 2- How do we recover a lost node? Do we simply go back to the master node and run start

ApacheCon North America 2019 Schedule Now Live!

2019-06-12 Thread Rich Bowen
Dear Apache Enthusiast, (You’re receiving this message because you’re subscribed to one or more Apache Software Foundation project user mailing lists.) We’re thrilled to announce the schedule for our upcoming conference, ApacheCon North America 2019, in Las Vegas, Nevada. See it now at https

Re: EXT :How to config user for passwordless ssh?

2019-06-12 Thread John Smith
Ok so if I understand it correctly... I can pass -l myusername to e nv.ssh.opts? On Tue, 11 Jun 2019 at 21:01, Martin, Nick wrote: > Env.ssh.opts is the literal argument string to ssh as you would enter it > on the command line. Take a look at TMSlaves() in config.sh to see exactly > how it’s be

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-12 Thread Felipe Gutierrez
Hi Rong, I implemented my solution using a ProcessingWindow with timeWindow and a ReduceFunction with timeWindowAll [1]. So for the first window I use parallelism and the second window I use to merge everything on the Reducer. I guess it is the best approach to do DistinctCount. Would you suggest s

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-12 Thread Chesnay Schepler
I would just remove them. As you said, there are very limited as to what features they support, and haven't been under active development for several releases. Existing users (if there even are any) could continue to use older version against newer releases. It's is slightly more involved than

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-12 Thread vino yang
+1 from my side Best, Vino Terry Wang 于2019年6月12日周三 下午5:45写道: > +1 for deprecation. It’s very reasonable. > > 在 2019年6月12日,下午5:32,Till Rohrmann 写道: > > +1 for deprecation. > > Cheers, > Till > > On Wed, Jun 12, 2019 at 4:31 AM Hequn Cheng wrote: > >> +1 on the proposal! >> Maintaining only on

Re: Flink 1.2.1 JobManager Election Deadlock

2019-06-12 Thread Mar_zieh
Dear guys Hello I have cluster of three physical nodes with docker installed on each of them. I configured Mesos,Hadoop,Marathon and Zookeeper;they were run without any problems. Then, I ran a Flink application on Marathon. After that, I could open Flink UI on firefox. I tested a program on Flink

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-12 Thread Terry Wang
+1 for deprecation. It’s very reasonable. > 在 2019年6月12日,下午5:32,Till Rohrmann 写道: > > +1 for deprecation. > > Cheers, > Till > > On Wed, Jun 12, 2019 at 4:31 AM Hequn Cheng > wrote: > +1 on the proposal! > Maintaining only one Python API is helpful for users and c

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-12 Thread Till Rohrmann
+1 for deprecation. Cheers, Till On Wed, Jun 12, 2019 at 4:31 AM Hequn Cheng wrote: > +1 on the proposal! > Maintaining only one Python API is helpful for users and contributors. > > Best, Hequn > > On Wed, Jun 12, 2019 at 9:41 AM Jark Wu wrote: > >> +1 and looking forward to the new Python AP

Re: Apache Flink - Disabling system metrics and collecting only specific metrics

2019-06-12 Thread M Singh
Thanks Zhijiang for your response.   I see that system resources reporting can be enable (default not enabled) but not system metrics.  I just wanted to confirm that I am not missing anything. Thanks again. On Tuesday, June 11, 2019, 10:32:51 PM EDT, zhijiang wrote: Hi Mans, AFAIK, we

Re: No yarn option in self-built flink version

2019-06-12 Thread Ufuk Celebi
@Arnaud: Turns out those examples are on purpose. As Chesnay pointed out in the ticket, there are also cases where you don't necessarily want to bundle the Hadoop dependency, but still want to set a version like that. On Wed, Jun 12, 2019 at 9:32 AM Ufuk Celebi wrote: > I created https://issues

Re: No yarn option in self-built flink version

2019-06-12 Thread Ufuk Celebi
I created https://issues.apache.org/jira/browse/FLINK-12813 for this. @Arnaud: Would you be interested in opening a PR with a fix? – Ufuk On Tue, Jun 11, 2019 at 11:10 AM LINZ, Arnaud wrote: > Hello, > > > > Thanks a lot, it works. However, may I suggest that you update the > documentation pag

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-12 Thread Felipe Gutierrez
Hi Rong, thanks for your answer. If I understood well, the option will be to use ProcessFunction [1] since it has the method onTimer(). But not the ProcessWindowFunction [2], because it does not have the method onTimer(). I will need this method to call Collector out.collect(...) from the onTImer(