Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-03 Thread Jingsong Li
Hi Jark, +1 for the Blink planner as the default in the SQL CLI. I believe this new planner can be put into practice in production. We've worked hard for nearly a year, but the old planner didn't move on. And I'd like to cc user@flink.apache.org. If anyone finds that the Blink planner has any significant defects…

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-03 Thread Jeff Zhang
+1, I have already made Blink the default planner of the Flink interpreter in Zeppelin. Jingsong Li wrote on Fri, Jan 3, 2020 at 4:37 PM: > Hi Jark, > > +1 for the default Blink planner in SQL-CLI. > I believe this new planner can be put into practice in production. > We've worked hard for nearly a year, but the ol…

Re: Checkpoints issue and job failing

2020-01-03 Thread Congxian Qiu
Hi, Have you checked whether this problem also exists on Flink 1.9? Best, Congxian vino yang wrote on Fri, Jan 3, 2020 at 3:54 PM: > Hi Navneeth, > > Did you check whether the path contained in the exception really cannot be > found? > > Best, > Vino > > Navneeth Krishnan wrote on Fri, Jan 3, 2020 at 8:23 AM: > >> Hi All, >>…

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-03 Thread Benoît Paris
> If anyone finds that the Blink planner has any significant defects or a larger regression than the old planner, please let us know. Overall, the Blink-exclusive features are a must (TopN, deduplicate, LAST_VALUE, plan reuse, etc.)! But all use cases of the Legacy planner in production are not cov…

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-03 Thread Jark Wu
Hi Benoît, Thanks for the reminder. I will look into the issue and hopefully we can target it for 1.9.2 and 1.10. Cheers, Jark On Fri, 3 Jan 2020 at 18:21, Benoît Paris < benoit.pa...@centraliens-lille.org> wrote: > > If anyone finds that the Blink planner has any significant defects and has > a…

Re: Late outputs for Session Window

2020-01-03 Thread KristoffSC
After following the suggestion from SO I added a few changes, so now I'm using Event Time. Watermarks are progressing; I've checked them in Flink's metrics. The Window operator is triggered, but I still don't see any late outputs. StreamExecutionEnvironment env = StreamExecutionEnvironment.get…
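Whether a session window ever produces "late" output comes down to how the watermark classifies each element. A minimal sketch of the bounded-out-of-orderness rule, independent of Flink's actual API (the 5-second bound and all names here are illustrative assumptions):

```java
/** Sketch of bounded-out-of-orderness watermarking and lateness classification. */
public class LatenessSketch {
    static final long BOUND_MS = 5_000; // assumed max out-of-orderness

    /** Watermark trails the highest timestamp seen by the bound. */
    static long watermark(long maxTimestampSeen) {
        return maxTimestampSeen - BOUND_MS;
    }

    /** An element counts as late once its timestamp is at or before the current watermark. */
    static boolean isLate(long elementTs, long currentWatermark) {
        return elementTs <= currentWatermark;
    }

    public static void main(String[] args) {
        long maxSeen = 20_000;                  // latest event time observed so far
        long wm = watermark(maxSeen);           // 15_000
        System.out.println(isLate(12_000, wm)); // late: behind the watermark
        System.out.println(isLate(16_000, wm)); // on time: ahead of the watermark
    }
}
```

If no element ever arrives behind the watermark (e.g. the bound is generous, or the source is ordered), the late-data path simply never fires, which matches the symptom described above.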

Re: Yarn Kerberos issue

2020-01-03 Thread Chesnay Schepler
From what I understand from the documentation, if you want to use delegation tokens you always first have to issue a ticket using kinit; so you did everything correctly? On 02/01/2020 13:00, Juan Gentile wrote: Hello, I'm trying to submit a job (batch wordcount) to a YARN cluster. I'm trying…

Re: Migrate custom partitioner from Flink 1.7 to Flink 1.9

2020-01-03 Thread Chesnay Schepler
You should be able to implement this on the DataStream API level using DataStream#broadcast and #union like this: input = ... singleChannel = input.filter(x -> !x.isBroadCastPartitioning); broadcastChannel = input.filter(x -> x.isBroadCastPartitioning); result = broadcastChannel.broadcast().u…
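The routing that the broadcast()/union() split achieves can be sketched in plain Java, outside Flink's APIs (channel counts and names here are illustrative assumptions): a broadcast record goes to every downstream channel, while a keyed record goes to exactly one channel by hash.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the routing behind the broadcast()/union() pattern. */
public class RoutingSketch {
    /** Returns the downstream channel indices a record is sent to. */
    static List<Integer> route(boolean isBroadcast, int key, int numChannels) {
        List<Integer> targets = new ArrayList<>();
        if (isBroadcast) {
            // Broadcast: a copy to every channel.
            for (int i = 0; i < numChannels; i++) targets.add(i);
        } else {
            // Keyed: a single channel chosen by hash.
            targets.add(Math.floorMod(key, numChannels));
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(route(true, 42, 4));  // all four channels
        System.out.println(route(false, 42, 4)); // one channel
    }
}
```

Splitting the stream with two filters and re-merging with union reproduces exactly this per-record choice without needing a custom partitioner class.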

Re: kafka: how to stop consumption temporarily

2020-01-03 Thread Chesnay Schepler
Are you asking how to detect from within the job whether the dump is complete, or how to combine these 2 jobs? If you had a way to notice whether the dump is complete, then I would suggest creating a custom source that wraps 2 Kafka sources, and switching between them at will based on your condi…
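The switching logic of such a wrapper can be sketched with plain iterators, independent of Flink's actual SourceFunction API (all names here are illustrative assumptions): drain the dump source to completion first, then hand over to the live source.

```java
import java.util.Iterator;
import java.util.List;

/** Sketch: emit everything from a "dump" source, then switch to the "live" source. */
public class SwitchingSource implements Iterator<String> {
    private final Iterator<String> dump;
    private final Iterator<String> live;

    SwitchingSource(Iterator<String> dump, Iterator<String> live) {
        this.dump = dump;
        this.live = live;
    }

    @Override public boolean hasNext() {
        return dump.hasNext() || live.hasNext();
    }

    @Override public String next() {
        // The live feed is only consumed once the dump is exhausted.
        return dump.hasNext() ? dump.next() : live.next();
    }

    public static void main(String[] args) {
        SwitchingSource s = new SwitchingSource(
                List.of("d1", "d2").iterator(), List.of("l1").iterator());
        while (s.hasNext()) System.out.print(s.next() + " "); // d1 d2 l1
    }
}
```

In a real source the "dump is exhausted" signal would have to come from the condition mentioned above (e.g. an end-of-dump marker record) rather than from hasNext().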

Re: Change Akka Ask Timeout for Job Submission Only

2020-01-03 Thread Chesnay Schepler
There are 3 communication layers involved here: 1) client <=> server (REST API) This goes through REST and does not use timeouts AFAIK. We wait until a response comes or the connection terminates. 2) server (REST API) <=> processes (JM, Dispatcher) This goes through akka, with "web.timeout"…
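The akka-level timeouts mentioned for layers 2 and 3 are set in flink-conf.yaml; a sketch assuming Flink 1.9-era option names (the values are illustrative, not recommendations):

```yaml
# Layer 2: REST server <=> Dispatcher/JobManager asks
web.timeout: 600000       # milliseconds

# Akka ask timeout used for RPC between Flink processes
akka.ask.timeout: 60 s
```

Raising only the akka-side values therefore does not help if the wait actually happens on the REST connection in layer 1.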

Re: Flink task node shut it self off.

2020-01-03 Thread Chesnay Schepler
The logs show 2 interesting pieces of information: ... 2019-12-19 18:33:23,278 INFO org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-4, groupId=ccdb-prod-import] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 0: org.apach…

Re: Checkpoints issue and job failing

2020-01-03 Thread Navneeth Krishnan
Thanks Congxian & Vino. Yes, the file does exist and I don't see any problem accessing it. Regarding Flink 1.9, we haven't migrated yet but are planning to. Since we have to test, it might take some time. Thanks On Fri, Jan 3, 2020 at 2:14 AM Congxian Qiu wrote: > Hi > > Do you have ever…

Re: Flink task node shut it self off.

2020-01-03 Thread John Smith
Well, there was this huge IO wait spike, over 140%. IO wait rose slowly for a couple of hours, then at some point it spiked at 140%, and after IO wait dropped back to "normal" the CPU 1min/5min/15min load averages spiked to about 3 times the number of cores for a bit. We were at "peak" operation, i.e. we were r…

Re: Duplicate tasks for the same query

2020-01-03 Thread RKandoji
Thanks! On Thu, Jan 2, 2020 at 9:45 PM Jingsong Li wrote: > Yes, > > 1.9.2 or Coming soon 1.10 > > Best, > Jingsong Lee > > On Fri, Jan 3, 2020 at 12:43 AM RKandoji wrote: > >> Ok thanks, does it mean version 1.9.2 is what I need to use? >> >> On Wed, Jan 1, 2020 at 10:25 PM Jingsong Li >> wro…

Re: Duplicate tasks for the same query

2020-01-03 Thread RKandoji
Hi, Thanks a ton for the help with the earlier questions. I updated the code to version 1.9 and started using the Blink planner (deduplication). This is working as expected! I have a new question, but thought of asking it in the same email chain, as it has more context about my use case etc. Workflow: Current…

Table API: Joining on Tables of Complex Types

2020-01-03 Thread Hailu, Andreas
Hi folks, I'm trying to join two Tables which are composed of complex types, Avro's GenericRecord to be exact. I have to use a custom UDF to extract fields out of the record, and I'm having some trouble figuring out how to join on them, as I need to call this UDF to read what I need. Example below: ba…
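One common workaround for this is to flatten the fields the join needs into scalar columns first, and then join on those. A minimal sketch of that extract-then-join idea, using a plain Map as a stand-in for a GenericRecord-like type (all names and data here are illustrative assumptions, not the poster's actual schema):

```java
import java.util.Map;

/** Sketch: extract scalar join keys from a nested record, then compare the scalars. */
public class ExtractThenJoin {
    /** Pull one field out of a generic record-like structure as a string key. */
    static String extractKey(Map<String, Object> record, String field) {
        Object v = record.get(field);
        return v == null ? null : v.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> left  = Map.<String, Object>of("id", 7, "name", "a");
        Map<String, Object> right = Map.<String, Object>of("id", 7, "amount", 100);
        // The join condition is evaluated on the extracted scalar keys,
        // not on the complex records themselves.
        boolean matches = extractKey(left, "id").equals(extractKey(right, "id"));
        System.out.println(matches);
    }
}
```

In Table API terms this corresponds to projecting the UDF's output into its own column before the join, so the join condition only ever sees simple types.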

Flink logging issue with logback

2020-01-03 Thread Bajaj, Abhinav
Hi, I am investigating a logging issue with Flink. Setup: * Using Flink 1.7.1 with logback, as suggested in the Flink documentation here. * Submitting the Flink job fro…

Re: Duplicate tasks for the same query

2020-01-03 Thread Kurt Young
Hi RKandoji, It looks like you have a data skew issue with your input data. Some, or maybe only one, "userId" appears more frequently than others. For the join operator to work correctly, Flink applies a "shuffle by join key" before the operator, so the same "userId" will go to the same sub-task to perform j…
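The "shuffle by join key" behavior, and why a hot key causes skew, can be sketched in plain Java (the parallelism and the sample userIds are illustrative assumptions): each key hashes to a fixed sub-task, so whichever sub-task owns the hot key receives most of the records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: hash-partitioning by join key routes a hot key to one overloaded sub-task. */
public class ShuffleSkewSketch {
    /** Same key always maps to the same sub-task index. */
    static int subTaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        List<String> userIds = List.of("u1", "u1", "u1", "u1", "u2", "u3");
        Map<Integer, Integer> load = new HashMap<>();
        for (String id : userIds) {
            load.merge(subTaskFor(id, 4), 1, Integer::sum); // records per sub-task
        }
        // The sub-task that owns "u1" carries four of the six records.
        System.out.println(load);
    }
}
```

Increasing parallelism does not help here, because the routing is deterministic per key; only reducing the hot key's share of records changes the distribution.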

Re: Flink group with time-windowed join

2020-01-03 Thread Kurt Young
Looks like a bug to me; could you file an issue for this? Best, Kurt On Thu, Jan 2, 2020 at 9:06 PM jeremyji <18868129...@163.com> wrote: > Two streams as table1, table2. We know that grouping with a regular join won't > work, > so we have to use a time-windowed join. So here is what my Flink SQL looks like:…

Re: Controlling the Materialization of JOIN updates

2020-01-03 Thread Kurt Young
Hi Benoît, Before discussing all the options you listed, I'd like to understand more about your requirements. The part I don't fully understand is that both your fact (Event) and dimension (DimensionAtJoinTimeX) tables come from the same table, Event or EventRawInput in your case. So it will resul…

Re: Migrate custom partitioner from Flink 1.7 to Flink 1.9

2020-01-03 Thread Salva Alcántara
Thanks Chesnay! Just to be clear, this is how my current code looks: ``` unionChannel = broadcastChannel.broadcast().union(singleChannel) result = new DataStream<>( unionChannel.getExecutionEnvironment(), new PartitionTransformation<>(unionChannel.getTransformation(), new MyDynamicParti…