Re: sqlQuery split string

2019-07-24 Thread Haibo Sun
Hi Andres Angel, At present, there seems to be no such built-in function; you need to register a user-defined function to do that. You can look at the following document to see how to do it. https://ci.apache.org/projects/flink/flink-docs-master/dev/table/udfs.html#table-functions Best,
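The core of such a user-defined table function is just a string split that emits one row per element. A minimal standalone sketch of that logic (the class name `SplitPreview` and returning a `List` are illustrative; in Flink you would put the same body in a class extending `TableFunction<String>`, calling `collect(s)` per element, and register it with the table environment as the linked docs describe):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the splitting logic a Flink table function would wrap.
// In Flink, eval() would call collect(s) for each element instead of
// accumulating into a list.
public class SplitPreview {
    public static List<String> eval(String column, String separator) {
        List<String> rows = new ArrayList<>();
        for (String s : column.split(Pattern.quote(separator))) {
            rows.add(s); // one emitted row per split element
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(eval("a,b,c", ","));
    }
}
```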

Re: Re: Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Haibo Sun
The following JIRA is about the problem you encountered. I think you will be very interested in its comments. There does seem to be a problem with shading Akka, and Flink is considering isolating the classloader that contains Akka and Scala to allow the applications and Flink to use different

Re: GroupBy result delay

2019-07-24 Thread Hequn Cheng
Hi Fanbin, > 2. I have parallelism = 32 and only one task has the record. Can you please elaborate more on why this would affect the watermark advancement? Each parallel subtask of a source function usually generates its watermarks independently, say wk1, wk2... wkn. The downstream window
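The key point is that a downstream operator's watermark is the minimum of the watermarks of all its upstream subtasks, so one subtask that never sees a record (and never advances its watermark) holds the whole window back. A tiny plain-Java illustration of that min rule (not Flink API; `Long.MIN_VALUE` stands in for a subtask that has emitted no watermark):

```java
import java.util.Arrays;

public class WatermarkMin {
    // A downstream operator tracks the minimum watermark across its inputs.
    static long downstreamWatermark(long[] subtaskWatermarks) {
        return Arrays.stream(subtaskWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // 31 idle subtasks, 1 active subtask: the window cannot fire.
        long[] wks = new long[32];
        Arrays.fill(wks, Long.MIN_VALUE);
        wks[0] = 1_000_000L; // only one subtask has records
        System.out.println(downstreamWatermark(wks)); // stuck at the minimum
    }
}
```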

Re: LEFT JOIN issue SQL API

2019-07-24 Thread Ruidong Li
Hi, it's because the Outer Joins will generate retractions; consider the behavior of a Left Outer Join: 1. left record arrives, no matched right record, so +(left, null) will be generated. 2. right record arrives, the previous result should be retracted, so -(left, null) and +(left, right) will be
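That retraction sequence can be viewed as a changelog: the early padded result is retracted once the matching right record arrives. A minimal simulation for one key (plain Java with illustrative names, not Flink's internal API):

```java
import java.util.ArrayList;
import java.util.List;

public class LeftJoinRetraction {
    // Simulates the changelog a retracting LEFT OUTER JOIN emits for one key
    // when the left record arrives before the matching right record.
    static List<String> process(String left, String right) {
        List<String> changelog = new ArrayList<>();
        // 1. left arrives, no matching right record yet:
        changelog.add("+(" + left + ", null)");
        // 2. right arrives: the earlier padded result must be retracted,
        //    then the real join result is emitted:
        changelog.add("-(" + left + ", null)");
        changelog.add("+(" + left + ", " + right + ")");
        return changelog;
    }

    public static void main(String[] args) {
        System.out.println(process("l1", "r1"));
    }
}
```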

LEFT JOIN issue SQL API

2019-07-24 Thread Andres Angel
Hello guys I have registered some table environments and now I'm trying to perform a query on these using LEFT JOIN like the example below: Table fullenrichment = tenv.sqlQuery( "SELECT pp.a,pp.b,pp.c,pp.d,pp.a " + " FROM t1 pp LEFT JOIN t2 ent" +

Re: GroupBy result delay

2019-07-24 Thread Fanbin Bu
Hequn, Thanks for the help. It is indeed a watermark problem. From Flink UI, I can see the low watermark value for each operator. And the groupBy operator has lagged value of watermark. I checked the link from SO and confirmed that: 1. I do see record coming in for this operator 2. I have

[ANNOUNCE] The Program of Flink Forward EU 2019 is live

2019-07-24 Thread Fabian Hueske
Hi everyone, I'm happy to announce the program of the Flink Forward EU 2019 conference. The conference takes place in the Berlin Congress Center (bcc) from October 7th to 9th. On the first day, we'll have four training sessions [1]: * Apache Flink Developer Training * Apache Flink Operations

Re: Question about how to handle future timestamps with Flink SQL watermarks

2019-07-24 Thread zhisheng
hi, Zhongni: Usually such ahead-of-time data is caused by a machine whose clock is wrong (not synchronized), so the timestamps carried by the collected data can be ahead of the current time (too large). You have the following options: 1. Filter out such data right after Flink consumes it from Kafka (after extracting the timestamp, compare it with the current time and drop the record if it is ahead, or ahead by more than some threshold; I ran into this kind of data problem in my own project before and dropped data that was more than 5 minutes ahead), so it never reaches watermark generation and the watermark can't be pushed too far ahead and cause the problem you describe. 2. Check at the point where the watermark is generated: if the timestamp of the collected data is far ahead of the current time (e.g., more than 5

Re: Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
Also wanted to check if anyone has ventured into this exercise of shading Akka in Flink .. Is this something that qualifies as one of the roadmap items for Flink? regards. On Wed, Jul 24, 2019 at 3:44 PM Debasish Ghosh wrote: > Hi Haibo - Thanks for the clarification .. > > regards. > > On

Re: Graceful Task Manager Termination and Replacement

2019-07-24 Thread Aaron Levin
I was on vacation but wanted to thank Biao for summarizing the current state! Thanks! On Mon, Jul 15, 2019 at 2:00 AM Biao Liu wrote: > Hi Aaron, > > From my understanding, you want shutting down a Task Manager without > restart the job which has tasks running on this Task Manager? > > Based on

sqlQuery split string

2019-07-24 Thread Andres Angel
Hello everyone, Following the currently available functions https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html, how could I split a column string by a character? example column content: col = a,b,c query: Select col from tenv expected return: cola, colb, colc

Re: timeout exception when consuming from kafka

2019-07-24 Thread Yitzchak Lieberman
Hi. Do we have an idea for this exception? Thanks, Yitzchak. On Tue, Jul 23, 2019 at 12:59 PM Fabian Hueske wrote: > Hi Yitzchak, > > Thanks for reaching out. > I'm not an expert on the Kafka consumer, but I think the number of > partitions and the number of source tasks might be interesting

Re: How to handle JDBC connections in a topology

2019-07-24 Thread Chesnay Schepler
Note that in order for the static class approach to work you have to ensure that the class is loaded by the parent classloader, either by placing the class in /lib or configuring `classloader.parent-first-patterns-additional` to pick up this particular class. On 24/07/2019 10:24, Haibo Sun
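For the configuration route, the entry would go into flink-conf.yaml; the package prefix below is a placeholder for wherever the static holder class actually lives, not a real setting value:

```yaml
# flink-conf.yaml: load the static holder class via the parent classloader,
# so every job on the TaskManager sees the same copy of the class.
# "com.example.db." is a hypothetical package prefix for illustration.
classloader.parent-first-patterns-additional: com.example.db.
```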

Re: Accessing variables build within a map/flatmap function within a DS

2019-07-24 Thread Andres Angel
Sure guys, thanks for the support. I need to create and register a table based on the content of a DS<>; the point is that within the content I need to parse it somehow and get the parts which are the values and the headers. I already tried to create a DS and register the new DS as a table with headers

Re: Accessing variables build within a map/flatmap function within a DS

2019-07-24 Thread Chesnay Schepler
Note that this will only work when running the application in the IDE; specifically, it will not work when running on an actual cluster, since your function isn't executed on the same machine as your (presumably) main() function. We can give you better advice if you tell us what exactly

Re: Accessing variables build within a map/flatmap function within a DS

2019-07-24 Thread Caizhi Weng
Hi Andres, Just define a variable outside and modify it in the anonymous class. Andres Angel wrote on Wed, Jul 24, 2019 at 8:44 PM: > Hello everyone, > > I was wondering if there is a way how to read the content of a varible > build within a map/flatmap function out of the DS method. > > example: > >
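In Java the captured local must be effectively final, so "modify it in the anonymous class" in practice means mutating a container (a collection, array, or AtomicReference) rather than reassigning the variable. A minimal sketch with a plain interface standing in for Flink's FlatMapFunction; note this only observes the state correctly when everything runs in one JVM (i.e., local execution), per the caveat elsewhere in this thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CapturedVariableDemo {
    // Stand-in for Flink's FlatMapFunction, to keep the sketch dependency-free.
    interface Fn { void flatMap(String value); }

    static List<String> collectHeaders(String... lines) {
        // The captured local must be effectively final: mutate the list,
        // never reassign the variable itself.
        final List<String> headers = new ArrayList<>();

        Fn fn = new Fn() {
            @Override
            public void flatMap(String value) {
                headers.add(value.split(",")[0]); // visible outside: same JVM only!
            }
        };

        for (String line : lines) {
            fn.flatMap(line);
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(collectHeaders("id,1,a", "name,2,b"));
    }
}
```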

Accessing variables build within a map/flatmap function within a DS

2019-07-24 Thread Andres Angel
Hello everyone, I was wondering if there is a way to read the content of a variable built within a map/flatmap function outside of the DS method. example: DataStream dsString = env.fromElements("1,a,1.1|2,b,2.2,-2", "3,c|4,d,4.4"); DataStream dsTuple = dsString.flatMap(new FlatMapFunction()

Re: Date Handling in Table Source and transformation to DataStream of case class

2019-07-24 Thread Federico D'Ambrosio
Hi Caizhi, thank you for your response, the full exception is the following: Exception in thread "main" org.apache.flink.table.api.TableException: Arity [7] of result [ArrayBuffer(String, String, String, String, String, String, Timestamp)] does not match the number[1] of requested type

Re: Flink 1.8 run parameters differ

2019-07-24 Thread 王佩
The problem is solved, thank you very much! Resolution steps: 1. I did find the Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli. exception under log/ 2. Set export HADOOP_CONF_DIR=`hadoop classpath` 3. Re-ran bin/flink run --help, and the `Options for yarn-cluster mode` section appeared. Thanks a lot! ❤❤❤ Zili Chen wrote on Wed, Jul 24, 2019 at 9:51 AM: > Hi, you can check under log/

Re: Date Handling in Table Source and transformation to DataStream of case class

2019-07-24 Thread Caizhi Weng
Hi Federico, I can't reproduce the error in my local environment. Would you mind sharing your code and the full exception stack trace with us? This will help us diagnose the problem. Thanks. Federico D'Ambrosio wrote on Wed, Jul 24, 2019 at 5:45 PM: > Hi Caizhi, > > thank you for your response. > > 1) I see, I'll

Re: Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
Hi Haibo - Thanks for the clarification .. regards. On Wed, Jul 24, 2019 at 2:58 PM Haibo Sun wrote: > Hi Debasish Ghosh, > > I agree that Flink should shade its Akka. > > Maybe you misunderstood me. I mean, in the absence of official shading > Akka in Flink, the relatively conservative way

Re: Date Handling in Table Source and transformation to DataStream of case class

2019-07-24 Thread Federico D'Ambrosio
Hi Caizhi, thank you for your response. 1) I see, I'll use a compatible string format 2) I'm defining the case class like this: case class cEvent(state: String, id: String, device: String, instance: String, subInstance: String, groupLabel: String, time: Timestamp) object

Re: Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Haibo Sun
Hi Debasish Ghosh, I agree that Flink should shade its Akka. Maybe you misunderstood me. I mean, in the absence of official shading of Akka in Flink, the relatively conservative way is to shade the Akka of your application (I'm concerned that Flink won't work well after shading its Akka). Best, Haibo

Re: Memory constrains running Flink on Kubernetes

2019-07-24 Thread Yang Wang
Hi, The heap in a flink TaskManager k8s pod include the following parts: - jvm heap, limited by -Xmx - jvm non-heap, limited by -XX:MaxMetaspaceSize - jvm direct memory, limited by -XX:MaxDirectMemorySize - native memory, used by rocksdb, just as Yun Tang said, could be limited
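As an illustration of how those four parts add up (all numbers below are placeholders for this sketch, not recommendations), the k8s pod memory limit has to cover their sum, not just -Xmx:

```yaml
# Illustrative budget for one TaskManager pod:
#   jvm heap:          -Xmx2g
#   jvm metaspace:     -XX:MaxMetaspaceSize=256m
#   jvm direct memory: -XX:MaxDirectMemorySize=1g
#   native (RocksDB):  budgeted separately, e.g. ~1g
# => the pod limit should sit comfortably above the sum, or the kernel
#    OOM-kills the TaskManager even though the JVM itself looks healthy.
resources:
  limits:
    memory: "4608Mi"
```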

Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
For our application users are expected to work with Akka APIs - hence if I shade Akka in my application users will need to work with shaded imports which feels unnatural. With Flink, Akka is an implementation detail and Flink users are not expected to use Akka APIs. Hence shading will not have any

Re: Date Handling in Table Source and transformation to DataStream of case class

2019-07-24 Thread Caizhi Weng
Hi Federico, 1) As far as I know, you can't set a format for timestamp parsing currently (see `SqlTimestampParser`, it just feeds your string to `SqlTimestamp.valueOf`, so your timestamp format must be compatible with SqlTimestamp). 2) How do you define your case class? You have to define its
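Per the message above, the parser ultimately delegates to the `valueOf` of `java.sql.Timestamp`, which accepts only the JDBC escape format `yyyy-[m]m-[d]d hh:mm:ss[.f...]`. A quick way to check whether a given string will survive parsing is to feed it to `Timestamp.valueOf` directly (this sketch assumes `SqlTimestamp` is backed by `java.sql.Timestamp`):

```java
import java.sql.Timestamp;

public class TimestampFormatCheck {
    // Returns true if the string is parseable in the JDBC escape format
    // yyyy-[m]m-[d]d hh:mm:ss[.f...], which is what Timestamp.valueOf accepts.
    static boolean parses(String s) {
        try {
            Timestamp.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2019-07-24 17:45:00.0")); // JDBC escape form
        System.out.println(parses("2019-07-24T17:45:00"));   // ISO 'T' rejected
    }
}
```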

Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Jeff Zhang
I think it is better to shade all the dependencies of Flink so that all the projects that use Flink won't hit this kind of issue. Haibo Sun wrote on Wed, Jul 24, 2019 at 4:07 PM: > Hi, Debasish Ghosh > > I don't know why not shade Akka, maybe it can be shaded. Chesnay may be > able to answer that. > I

Re: Re: How to handle JDBC connections in a topology

2019-07-24 Thread Haibo Sun
Hi Stephen, I don't think it's possible to use the same connection pool for the entire topology, because the nodes on the topology may run in different JVMs and on different machines. If you want all operators running in the same JVM to use the same connection pool, I think you can
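A common pattern for sharing one pool per JVM is a lazily initialized static holder, sketched below. `Pool` is a dummy stand-in for a real connection pool (e.g. a HikariCP data source), and all names are illustrative; in a Flink job the operators would call `get()` from their open() method:

```java
public class ConnectionPoolHolder {
    // Dummy stand-in for a real connection pool (e.g. HikariDataSource).
    static class Pool {}

    private static Pool pool; // one instance per classloader/JVM

    // Every operator running in the same JVM gets the same instance.
    static synchronized Pool get() {
        if (pool == null) {
            pool = new Pool(); // real code: configure JDBC URL, pool size, etc.
        }
        return pool;
    }

    public static void main(String[] args) {
        System.out.println(get() == get()); // same instance within one JVM
    }
}
```

Note the instance is per classloader, which is why the parent-first classloading configuration discussed in this thread matters: with job-level classloaders, each job would otherwise get its own copy of the holder.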

Date Handling in Table Source and transformation to DataStream of case class

2019-07-24 Thread Federico D'Ambrosio
Hello everyone, I've always used the DataStream API and now I'm trying out the Table API to create a datastream from a CSV and I'm finding a couple of issues: 1) I'm reading a csv with 7 total fields, the 7th of which is a date serialized as a Spark TimestampType, written on the csv like this:

Re: Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Haibo Sun
Hi, Debasish Ghosh I don't know why Akka is not shaded; maybe it can be shaded. Chesnay may be able to answer that. I recommend shading the Akka dependency of your application, because it isn't known what would go wrong with shading Flink's Akka. CC @Chesnay Schepler Best, Haibo At 2019-07-24

Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
The problem that I am facing is with Akka serialization .. Why not shade the whole of Akka? java.lang.AbstractMethodError: > akka.remote.RemoteActorRefProvider.serializationInformation()Lakka/serialization/Serialization$Information; > at >

Question about how to handle future timestamps with Flink SQL watermarks

2019-07-24 Thread 郑 仲尼
Dear Flink community, Hello! I'm using Flink SQL (Flink 1.8.0) for some aggregations, consuming Kafka data with EventTime. Occasionally, a record arrives whose rowtime field carries a future timestamp (possibly caused by the timezone of the upstream data), so the watermark jumps directly to that future point, and all data arriving between this bad record and that future time gets dropped. The root cause is indeed a business-side problem, but we would still like some way to handle this abnormal situation. Currently, the way we handle it on our side is:

Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Zili Chen
I can see that we relocate akka's netty and akka's uncommon-math, but I am also curious why Flink doesn't shade all of the akka dependencies... Best, tison. Debasish Ghosh wrote on Wed, Jul 24, 2019 at 3:15 PM: > Hello Haibo - > > Yes, my application depends on Akka 2.5. > Just curious, why do you think it's

Re: How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
Oh and I'd also need some way to clean up the per-node transient state if the topology stops running on a specific node. On Wed, 24 Jul 2019 at 08:18, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hi, > > So we have a number of nodes in our topology that need to do things like >

How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
Hi, So we have a number of nodes in our topology that need to do things like checking a database, e.g. * We need a filter step to drop events on the floor from systems we are no longer interested in * We need a step that outputs on a side-channel if the event is for an object where the parent is

Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
Hello Haibo - Yes, my application depends on Akka 2.5. Just curious, why do you think it's recommended to shade the Akka version of my application instead of Flink's? regards. On Wed, Jul 24, 2019 at 12:42 PM Haibo Sun wrote: > Hi Debasish Ghosh, > > Does your application have to depend on Akka

Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Haibo Sun
Hi Debasish Ghosh, Does your application have to depend on Akka 2.5? If not, it's a good idea to always keep the Akka version that the application depends on in line with Flink's. If you want to try shading the Akka dependency, I think that it is more recommended to shade the Akka dependency of your

Re: Memory constrains running Flink on Kubernetes

2019-07-24 Thread Yun Tang
Hi William, Have you ever set the memory limit of your taskmanager pod when launching it in k8s? If not, I'm afraid your node might come across node out-of-memory [1]. You could increase the limit after analyzing your memory usage. When talking about the memory usage of RocksDB, a rough calculation

Flink and Akka version incompatibility ..

2019-07-24 Thread Debasish Ghosh
Hello - An application that uses Akka 2.5 and Flink 1.8.0 gives runtime errors because of version mismatch between Akka that we use and the one that Flink uses (which is Akka 2.4). Anyone tried shading Akka dependency with Flink ? Or is there any other alternative way to handle this issue ? I