Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Leonard Xu
Hi, Chesnay > @Leonard I noticed you handled a similar case on the Chinese ML in July, do you have > any insights? The case on the Chinese ML is that the user added jakarta.ws.rs-api-3.0.0-M1.jar to Flink/lib, which led the dependency

Re: Flink job fails to run

2020-10-15 Thread Jeff Zhang
Are you on hadoop2? I recall this situation only occurs with hadoop3. gangzi <1139872...@qq.com> wrote on Fri, Oct 16, 2020 at 11:22 AM: > The TM's CLASSPATH indeed lacks hadoop-mapreduce-client-core.jar. Could this be a problem with the Hadoop cluster, or is a shaded Hadoop jar mandatory? The official docs no longer recommend shaded Hadoop jars. > > On Oct 16, 2020, at 10:50 AM, Jeff Zhang wrote: > > Check the TM's log; the CLASSPATH is in there > > gangzi

Re: Custom Flink UDF cannot be used after registration

2020-10-15 Thread 史 正超
Hi, judging from the log, it says no match was found for the imei_encrypt UDF. Possibly the field passed in the SQL does not match imei_encrypt's parameters. Could you show your concrete code and the UDF declaration? From: 奔跑的小飞袁 Sent: 2020-10-16 3:30 To: user-zh@flink.apache.org Subject: Custom Flink UDF cannot be used after registration hello When I register a UDF with flinkSQL, the following error occurred; is there a problem with my definition or is this a Flink bug

Custom Flink UDF cannot be used after registration

2020-10-15 Thread 奔跑的小飞袁
hello When I register a UDF with flinkSQL, the following error occurred; is there a problem with my definition or is this a Flink bug? org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: SQL validation failed. From line 11, column 6 to line 11, column 23: No match found for function signature imei_encrypt() at

Re: Flink job fails to run

2020-10-15 Thread gangzi
The TM's CLASSPATH indeed lacks hadoop-mapreduce-client-core.jar. Could this be a problem with the Hadoop cluster, or is a shaded Hadoop jar mandatory? The official docs no longer recommend shaded Hadoop jars. > On Oct 16, 2020, at 10:50 AM, Jeff Zhang wrote: > > Check the TM's log; the CLASSPATH is in there > > gangzi <1139872...@qq.com> wrote on Fri, Oct 16, 2020 at 10:11 AM: > >> Following the official Flink docs, on every node of the Hadoop cluster I ran: export HADOOP_CLASSPATH

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Yang Wang
I am afraid the InetAddress cache cannot take effect, because Kubernetes only creates A and SRV records for Services; it does not generate A records for pods as you might expect. Refer here[1][2] for more information. So the DNS reverse lookup will always fail. IIRC, the default timeout is 5s. This

Re: Flink job fails to run

2020-10-15 Thread Jeff Zhang
Check the TM's log; the CLASSPATH is in there. gangzi <1139872...@qq.com> wrote on Fri, Oct 16, 2020 at 10:11 AM: > Following the official Flink docs, on every node of the Hadoop cluster I ran: export HADOOP_CLASSPATH=`hadoop classpath`, but got: java.lang.NoClassDefFoundError: > org/apache/hadoop/mapred/JobConf > >

Re: Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread whh_960101
I extracted the part of the plan around the data filtering: == Abstract Syntax Tree == +- LogicalFilter(condition=[error_exist($1)]) == Optimized Logical Plan == +- PythonCalc(select=[message, kubernetes, clusterName, error_exist(message) AS f0])

Re: Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread whh_960101
== Abstract Syntax Tree == LogicalProject(_c0=[log_get($1)], _c1=[_UTF-16LE''], _c2=[_UTF-16LE''], _c3=[_UTF-16LE'ERROR'], _c4=[_UTF-16LE'Asia/Shanghai'], _c5=[_UTF-16LE'@timestamp'], kubernetes$container$name=[$3.container.name], clusterName=[$2]) +-

Re: Flink job fails to run

2020-10-15 Thread gangzi
Following the official Flink docs, on every node of the Hadoop cluster I ran: export HADOOP_CLASSPATH=`hadoop classpath`, but got: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf

Re: Flink job fails to run

2020-10-15 Thread Shubin Ruan
Try running the following command on each node of the cluster: export HADOOP_CLASSPATH=`hadoop classpath` and then submit the job. On 2020-10-15 22:05:43, "gangzi" <1139872...@qq.com> wrote: > A question: after submitting a flink-1.11.1 YARN per-job application, the following exception was thrown: > java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf > at java.lang.Class.getDeclaredMethods0(Native

Table SQL and updating rows

2020-10-15 Thread Dan Hill
*Context* I'm working on a user event logging system using Flink. I'm tracking some info that belongs to the user's current session (e.g. location, device, experiment info). I don't have a v1 requirement to support mutability but I want to plan for it. I think the most likely reason for

Re: Hourly event-time window aggregation (original subject mis-encoded)

2020-10-15 Thread xiongyun...@163.com
(Reply partially mis-encoded; it mentions aggregateFunction.) | xiongyun...@163.com | In reply to the original message of 2020-10-15 15:47: Hi, All

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-15 Thread Kostas Kloudas
@Arvid Heise I also do not remember exactly what were all the problems. The fact that we added some more bulk formats to the streaming file sink definitely reduced the non-supported features. In addition, the latest discussion I found on the topic was [1] and the conclusion of that discussion

Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Chesnay Schepler
Can you provide us with the classpath (found at the top of the log file) for your 1.9.2/1.11.2 setups? (to see whether maybe something has changed in regards to classpath ordering) It would also be good to know which files were copied from opt/ to lib/ or plugins/. (to see whether some

Re: Broadcasting control messages to a sink

2020-10-15 Thread Jaffe, Julian
Hey AJ, I’m not familiar with the stock parquet sink, but if it requires a schema on creation you won’t be able to change the output schema without restarting the job. I’m using a custom sink that can update the schema it uses. The problem I’m facing is how to communicate those updates in an

RE: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Hailu, Andreas
Hi Chesnay, no, we haven't changed our Hadoop version. The only changes were updating to the 1.11.2 runtime dependencies listed earlier, as well as compiling with flink-clients in some of our modules, since we were relying on the transitive dependency. Our 1.9.2 jobs are still able to run

Re: what are the datasets used in the flink sql document?

2020-10-15 Thread David Anderson
For a dockerized playground that includes a dataset, many working examples, and training slides, see [1]. [1] https://github.com/ververica/sql-training David On Thu, Oct 15, 2020 at 10:18 AM Piotr Nowojski wrote: > Hi, > > The examples in the documentation are not fully functional. They

Flink job fails to run

2020-10-15 Thread gangzi
A question: after submitting a flink-1.11.1 YARN per-job application, the following exception was thrown: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf at java.lang.Class.getDeclaredMethods0(Native Method) at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at java.lang.Class.getDeclaredMethod(Class.java:2128) at

akka.framesize configuration does not affect runtime execution

2020-10-15 Thread Yuval Itzchakov
Hi, Due to a very large savepoint metadata file (3GB+), I've set the required akka.framesize to 5GB. I set this via the `akka.framesize` property in flink-conf.yaml. When trying to recover from the savepoint, the JM emits the following error message:
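
For reference, a frame size of that magnitude would be written in flink-conf.yaml roughly as follows; Flink expects the value in bytes with a trailing "b", and the exact figure below is only illustrative:

  # flink-conf.yaml (sketch): 5 GiB expressed in bytes
  akka.framesize: 5368709120b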

Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread Dian Fu
Could you print the plan and take a look? For printing the plan, see: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table-api-users-guide/intro_to_table_api.html#explain-tables > On Oct 15, 2020, at 7:02 PM, whh_960101 wrote: > > hi, > I just modified your example [1], building a source table via from_elements > and then applying my udf >

Re: Aggregation scenario for a Kafka topic with incomplete fields

2020-10-15 Thread 史 正超
@Kyle Zhang Thanks for the reply; that is roughly how we are doing it now. Sent from the Windows 10 Mail app From: Kyle Zhang Sent: Thursday, October 15, 2020 6:56:08 PM To: user-zh@flink.apache.org Subject: Re: Aggregation scenario for a Kafka topic with incomplete fields group

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Chesnay Schepler
The InetAddress caches the result of getCanonicalHostName(), so it is not a problem to call it twice. On 10/15/2020 1:57 PM, Till Rohrmann wrote: Hi Weike, thanks for getting back to us with your findings. Looking at the `TaskManagerLocation`, we are actually calling

Re: Correct way to handle RedisSink exception

2020-10-15 Thread Manas Kale
Hi all, Thank you for the help, I understand now. On Thu, Oct 15, 2020 at 5:28 PM 阮树斌 浙江大学 wrote: > hello, Manas Kale. > > From the log, it can be found that the exception was thrown on the > 'open()' method of the RedisSink class. You can inherit the RedisSink > class, then override the

Re: Correct way to handle RedisSink exception

2020-10-15 Thread 阮树斌 浙江大学
hello, Manas Kale. From the log, it can be found that the exception was thrown in the 'open()' method of the RedisSink class. You can inherit the RedisSink class, then override the 'open()' method, and handle the exception as you wish. Or no longer use the Apache Bahir[1] Flink redis connector
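
A minimal sketch of the subclassing approach suggested above, assuming the Bahir RedisSink API (a constructor taking FlinkJedisConfigBase and RedisMapper, with open() inherited from RichSinkFunction); the log-and-continue policy shown is only one possible choice:

  import org.apache.flink.configuration.Configuration;
  import org.apache.flink.streaming.connectors.redis.RedisSink;
  import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisConfigBase;
  import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  // Sketch: a RedisSink variant whose open() tolerates an unreachable server.
  public class TolerantRedisSink<IN> extends RedisSink<IN> {

      private static final Logger LOG = LoggerFactory.getLogger(TolerantRedisSink.class);

      public TolerantRedisSink(FlinkJedisConfigBase config, RedisMapper<IN> mapper) {
          super(config, mapper);
      }

      @Override
      public void open(Configuration parameters) throws Exception {
          try {
              super.open(parameters);
          } catch (Exception e) {
              // Deliberate policy: log and keep the task alive instead of failing.
              // invoke() would still need guarding while the connection is down.
              LOG.warn("Redis unavailable during open(); continuing", e);
          }
      }
  }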

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Till Rohrmann
Hi Weike, thanks for getting back to us with your findings. Looking at the `TaskManagerLocation`, we are actually calling `InetAddress.getCanonicalHostName` twice for every creation of a `TaskManagerLocation` instance. This does not look right. I think it should be fine to make the look up

Re: StatsD metric name prefix change for task manager after upgrading to Flink 1.11

2020-10-15 Thread Chesnay Schepler
The TaskExecutor host being exposed is directly wired to what the RPC system uses for addresses, which may have changed due to FLINK-15911 (NAT support). If the problem is purely about the periods in the IP, then I would suggest to create a custom reporter that extends the StatsDReporter and
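
A sketch of the suggested reporter extension; it assumes filterCharacters() is the hook Flink applies to name and scope components when building metric identifiers (the CharacterFilter mechanism in Flink 1.11), so verify against your version:

  import org.apache.flink.metrics.statsd.StatsDReporter;

  // Sketch: rewrite the dots of an IP-valued host segment so StatsD does not
  // split it into extra name components (10.0.0.1 becomes 10-0-0-1).
  public class DashedHostStatsDReporter extends StatsDReporter {
      @Override
      public String filterCharacters(String input) {
          return super.filterCharacters(input).replace('.', '-');
      }
  }

The subclass would then be wired in via the metrics.reporter.<name>.class entry in flink-conf.yaml.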

Re: Re: Loading external jars for UDF registration in Flink 1.11

2020-10-15 Thread amen...@163.com
A follow-up question: when using the thread context classloader, records are sent three times in duplicate. Is that caused by adding pipeline.classpaths? And could this way of configuring the env cause any other problems? best, amenhub From: amen...@163.com Sent: 2020-10-15 19:22 To: user-zh Subject: Re: Re: Loading external jars for UDF registration in Flink 1.11 Many thanks for your reply! For me this is a very good approach, but I have one question: can classes only be loaded with the system classloader? Because when I tried the system classloader, the interface the job jar exposes to external UDF

Re: Correct way to handle RedisSink exception

2020-10-15 Thread Chesnay Schepler
You will have to create a custom version of the redis connector that ignores such exceptions. On 10/15/2020 1:27 PM, Manas Kale wrote: Hi, I have a streaming application that pushes output to a redis cluster sink. I am using the Apache Bahir[1] Flink redis connector for this. I want to

Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Chesnay Schepler
I'm not aware of any Flink module bundling this class. Note that this class is also bundled in jersey-core (which is also on your classpath), so it appears that there is a conflict between this jar and your shaded one. Have you changed the Hadoop version you are using or how you provide them to

Correct way to handle RedisSink exception

2020-10-15 Thread Manas Kale
Hi, I have a streaming application that pushes output to a redis cluster sink. I am using the Apache Bahir[1] Flink redis connector for this. I want to handle the case when the redis server is unavailable. I am following the same pattern as outlined by them in [1]: FlinkJedisPoolConfig conf = new

Re: Re: Loading external jars for UDF registration in Flink 1.11

2020-10-15 Thread amen...@163.com
Many thanks for your reply! For me this is a very good approach, but I have one question: can classes only be loaded with the system classloader? Because when I tried the system classloader, the interface the job jar exposes for external UDF jars to implement threw a ClassNotFound exception, whereas pointing the classloader at the main class (which here should mean using the default thread context classloader) avoids the problem. Looking forward to your reply, thanks~ best, amenhub From: cxydeve...@163.com Sent: 2020-10-15 17:46 To: user-zh Subject: Re: Loading external jars for UDF registration in Flink 1.11

Re: Broadcasting control messages to a sink

2020-10-15 Thread aj
Hi Jaffe, I am also working on something similar type of a problem. I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written a job that creates a different stream for each event and fetches its schema from the

Re: Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread whh_960101
hi, I just modified your example [1], building a source table via from_elements and then applying my udfs: source.where(udf1(msg)=True).select("udf2(msg)").execute_insert('sink').get_job_client().get_job_execution_result().result() The printed results filter out exactly the data I want. But when the source table is built through the StreamTableEnvironment by connecting to Kafka and consuming from the message queue, i.e. the earlier style source =

Re: Aggregation scenario for a Kafka topic with incomplete fields

2020-10-15 Thread Kyle Zhang
group by id should be enough; for the other fields use last value or first value[1]. Also consider how to handle late-arriving data. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/functions/systemFunctions.html On Thu, Oct 15, 2020 at 5:01 PM 史 正超 wrote: > Hi all, I have a scenario: > a kafka topic has 4 fields: id, field2, field3, field4, where id is the unique identifier,
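
A sketch of that shape in SQL, wrapped in a Table API call; the table name kafka_source and the tableEnv variable are assumptions, and LAST_VALUE is the system function referenced in [1]:

  // Sketch: collapse partial messages into one row per id with LAST_VALUE,
  // then count over the filled-in dimension.
  tableEnv.executeSql(
      "SELECT id, f2, f3, f4, COUNT(1) AS cnt FROM ("
          + "  SELECT id,"
          + "         LAST_VALUE(field2) AS f2,"
          + "         LAST_VALUE(field3) AS f3,"
          + "         LAST_VALUE(field4) AS f4"
          + "  FROM kafka_source GROUP BY id"
          + ") t GROUP BY id, f2, f3, f4");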

Is flink sql state queryable?

2020-10-15 Thread kandy.wang
I'd like to understand what is stored in Flink SQL state. Can it be queried with the DataStream queryable state API? And how would one query it; is the key required for the lookup? The need is simply to find out what the state actually contains.

Re: Loading external jars for UDF registration in Flink 1.11

2020-10-15 Thread cxydeve...@163.com
Our approach is to set the env's configuration via reflection, adding pipeline.classpaths. The code is as follows: public static void main(final String[] args) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); EnvironmentSettings settings =
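
The quoted code is cut off above; below is a hedged reconstruction of the reflection trick it describes. The private field name "configuration" is an implementation detail of Flink 1.11's StreamExecutionEnvironment, so treat it as an assumption:

  import java.lang.reflect.Field;
  import java.util.Collections;
  import org.apache.flink.configuration.Configuration;
  import org.apache.flink.configuration.PipelineOptions;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  // Sketch: reach into the env's Configuration and add the UDF jar to
  // pipeline.classpaths so it is shipped to the job's classloader.
  public final class UdfClasspath {
      public static void addUdfJar(StreamExecutionEnvironment env, String jarUrl) throws Exception {
          Field field = StreamExecutionEnvironment.class.getDeclaredField("configuration");
          field.setAccessible(true);
          Configuration conf = (Configuration) field.get(env);
          // Note: this overwrites any previously configured classpath entries.
          conf.set(PipelineOptions.CLASSPATHS, Collections.singletonList(jarUrl));
      }
  }

Usage would be something like UdfClasspath.addUdfJar(env, "file:///opt/udfs/my-udf.jar") before building the pipeline.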

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread DONG, Weike
Hi Till and community, By the way, initially I resolved the IPs several times but results returned rather quickly (less than 1ms, possibly due to DNS cache on the server), so I thought it might not be the DNS issue. However, after debugging and logging, it is found that the lookup time exhibited

Re: Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread whh_960101
@udf(input_types=DataTypes.STRING(), result_type=DataTypes.BOOLEAN())
def udf1(msg):  # udf1 simply filters for the 'error' keyword in the log
    if msg is None:
        return ''
    msg_dic = json.loads(msg.strip())
    log = msg_dic.get('log').lower()
    if 'error' in log or 'fail' in log:

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread DONG, Weike
Hi Till and community, Increasing `kubernetes.jobmanager.cpu` in the configuration makes this issue alleviated but not disappeared. After adding DEBUG logs to the internals of *flink-runtime*, we have found the culprit is inetAddress.getCanonicalHostName() in
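
A small probe of the kind used in the debugging described above, timing the reverse lookup directly (illustrative only):

  import java.net.InetAddress;

  // Sketch: measure how long the DNS reverse lookup behind
  // InetAddress#getCanonicalHostName takes for this host.
  public final class ReverseLookupProbe {
      public static void main(String[] args) throws Exception {
          InetAddress address = InetAddress.getLocalHost();
          long start = System.nanoTime();
          String canonical = address.getCanonicalHostName(); // blocks on the PTR lookup
          long millis = (System.nanoTime() - start) / 1_000_000;
          System.out.println(canonical + " resolved in " + millis + " ms");
      }
  }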

Aggregation scenario for a Kafka topic with incomplete fields

2020-10-15 Thread 史 正超
Hi all, I have a scenario: a kafka topic has 4 fields: id, field2, field3, field4, where id is the unique identifier. The problem is that not every message carries the full set of fields; only id is always present. I need to aggregate with id, field2, field3, field4 as one dimension. For example, given the following kafka messages: {"id": 1, "field2":"b"} {"id": 1, "field3":"c", "field4":"d"} then counting by the dimension, count(1) (group by id, field2, field3, field4)

Re: PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread Xingbo Huang
Hi, we have an example covering your case [1]; you need to check what the logic of your udf1 looks like. [1] https://github.com/apache/flink/blob/release-1.11/flink-python/pyflink/table/tests/test_udf.py#L67 Best, Xingbo whh_960101 wrote on Thu, Oct 15, 2020 at 2:30 PM: > Hello, when I use pyflink my code is as follows, and I have this problem: > > > source = st_env.from_path('source') >

Re: How to print the contents of a PyFlink Table object for debugging

2020-10-15 Thread Xingbo Huang
Hi, if you want to output the table's results there are two convenient ways: 1. table.to_pandas() 2. use the print connector, see [1]. And if you are interested in pyflink, take a look at this doc[2]; it will help you get started quickly. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/print.html [2]

How to print the contents of a PyFlink Table object for debugging

2020-10-15 Thread whh_960101
Hello, when I use pyflink my code is as follows, and I have this problem: source = st_env.from_path('source') # st_env is a StreamTableEnvironment, source is the Kafka source # udf1 (input_types=DataTypes.STRING(), result_type=DataTypes.BOOLEAN()) table = source.select("msg").where(udf1(msg)=True) When single-step debugging, the output of print(table) is not the table's contents. Does pyflink have a method to convert a Table into a printable format

Re: what are the datasets used in the flink sql document?

2020-10-15 Thread Piotr Nowojski
Hi, The examples in the documentation are not fully functional. They assume (like in this case), that you would have an already predefined table orders, with the required fields. As I mentioned before, there are working examples available and you can also read the documentation on how to register

Re: How long does the Kafka table connector retain data

2020-10-15 Thread Xiao Xu
Flink does not retain the data; the data is persisted in Kafka, and Flink reads it from Kafka by offset. You can configure the retention time in Kafka. marble.zh...@coinflex.com.INVALID wrote on Wed, Oct 14, 2020 at 4:37 PM: > Hello, how long is data received through the kafka table > > connector retained on the Flink side? I don't see such a setting in the parameter list. If it is retained too long, memory will be exhausted; for example, I only want to keep half an hour, and earlier data can be purged. > > > > -- > Sent from:
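
For reference, that retention is a Kafka topic setting rather than a Flink option; keeping roughly half an hour of data corresponds to a topic configuration like the following (value in milliseconds):

  retention.ms=1800000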

Hourly event-time window aggregation (original subject mis-encoded)

2020-10-15 Thread (sender name mis-encoded)
Hi, All … using Flink to aggregate data hour by hour (e.g. the 12:00~12:59 window) … in Flink, with env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); and .timeWindow(Time.hours(1))
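
A sketch of the setup being described, combined with the AggregateFunction suggested in the reply elsewhere in this digest; the input type, key, and watermark bound are illustrative:

  import java.time.Duration;
  import org.apache.flink.api.common.eventtime.WatermarkStrategy;
  import org.apache.flink.api.common.functions.AggregateFunction;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.streaming.api.TimeCharacteristic;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.api.windowing.time.Time;

  // Sketch: count events per key in 1-hour event-time tumbling windows.
  public class HourlyCount {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
          env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
          env.fromElements(Tuple2.of("k", 1_000L), Tuple2.of("k", 2_000L)) // (key, event timestamp)
              .assignTimestampsAndWatermarks(
                  WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                      .withTimestampAssigner((e, ts) -> e.f1))
              .keyBy(e -> e.f0)
              .timeWindow(Time.hours(1)) // e.g. the 12:00~12:59 window
              .aggregate(new AggregateFunction<Tuple2<String, Long>, Long, Long>() {
                  @Override public Long createAccumulator() { return 0L; }
                  @Override public Long add(Tuple2<String, Long> e, Long acc) { return acc + 1; }
                  @Override public Long getResult(Long acc) { return acc; }
                  @Override public Long merge(Long a, Long b) { return a + b; }
              })
              .print();
          env.execute("hourly count");
      }
  }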

Streaming File Sink cannot generate _SUCCESS tag files

2020-10-15 Thread highfei2011
Hi, everyone! I am currently experiencing a problem with the bucketing policy when sinking to HDFS with the Streaming File Sink's BucketAssigner after consuming Kafka data with Flink 1.11.2: the _SUCCESS tag file is not generated by default. I have added the following to the configuration val

Re: Blob server: file not found

2020-10-15 Thread janick
Also hit the same problem on 1.9.3. OP, did you solve it? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: PyFlink :: Bootstrap UDF function

2020-10-15 Thread Sharipov, Rinat
Hi Dian! Thx a lot for your reply, it's very helpful for us. On Thu, Oct 15, 2020 at 04:30, Dian Fu wrote: > Hi Rinat, > > It's called in single-thread fashion and so there is no need for the > synchronization. > > Besides, there is a pair of open/close methods in the ScalarFunction and > you could

PyFlink SQL with UDFs in both select and where: one UDF has no effect

2020-10-15 Thread whh_960101
Hello, when I use pyflink my code is as follows, and I have this problem: source = st_env.from_path('source') # st_env is a StreamTableEnvironment, source is the Kafka source # udf1 is added only in the where clause (input_types=DataTypes.STRING(), result_type=DataTypes.BOOLEAN()) table =

Re: Processing single events for minimum latency

2020-10-15 Thread Piotr Nowojski
No problem :) Piotrek On Thu, Oct 15, 2020 at 08:18, Pankaj Chand wrote: > Thank you for the quick and informative reply, Piotrek! > > On Thu, Oct 15, 2020 at 2:09 AM Piotr Nowojski > wrote: > >> Hi Pankaj, >> >> Yes, you can trigger a window per each element, take a look at the Window >>

Re: Broadcasting control messages to a sink

2020-10-15 Thread Piotr Nowojski
Hi Julian, I think the problem is that the BroadcastProcessFunction and the SinkFunction will be executed by separate operators, so they won't be able to share state. If you cannot split your logic into two parts, I think you will have to work around this problem differently. 1. Rely on operator chaining and
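
The message is truncated above; one shape the first workaround can take is folding the write path into the broadcast operator itself, so records and control messages meet in one operator. A sketch under that assumption, where Record, SchemaUpdate, and ParquetWriterWrapper stand in for the job's own (hypothetical) types:

  import org.apache.flink.api.common.state.MapStateDescriptor;
  import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
  import org.apache.flink.util.Collector;

  // Sketch: handle schema updates and writes in one BroadcastProcessFunction,
  // so no state needs to be shared with a separate SinkFunction.
  public class SchemaAwareWriter extends BroadcastProcessFunction<Record, SchemaUpdate, Void> {

      static final MapStateDescriptor<String, String> SCHEMA_STATE =
          new MapStateDescriptor<>("schema", String.class, String.class);

      private transient ParquetWriterWrapper writer; // hypothetical writer handle

      @Override
      public void processBroadcastElement(SchemaUpdate update, Context ctx, Collector<Void> out) throws Exception {
          ctx.getBroadcastState(SCHEMA_STATE).put("current", update.schemaJson);
          writer.reopenWith(update.schemaJson); // hypothetical: swap the output schema
      }

      @Override
      public void processElement(Record record, ReadOnlyContext ctx, Collector<Void> out) throws Exception {
          writer.write(record); // write directly instead of forwarding to a sink
      }
  }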

Re: Processing single events for minimum latency

2020-10-15 Thread Pankaj Chand
Thank you for the quick and informative reply, Piotrek! On Thu, Oct 15, 2020 at 2:09 AM Piotr Nowojski wrote: > Hi Pankaj, > > Yes, you can trigger a window per each element, take a look at the Window > Triggers [1]. > > Flink is always processing all records immediately. The only things that >

Re: Processing single events for minimum latency

2020-10-15 Thread Piotr Nowojski
Hi Pankaj, Yes, you can trigger a window per each element; take a look at the Window Triggers [1]. Flink is always processing all records immediately. The only things that can delay processing elements are: - buffering elements in the operator's state (vide WindowOperator) - buffer-timeout (but
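
For reference, the per-element firing mentioned above can be expressed with a count trigger of one; the finite input and one-minute window here are only illustrative:

  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.api.windowing.time.Time;
  import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

  // Sketch: fire the window for every arriving element instead of only at
  // the end of the window.
  public class PerElementWindow {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
          env.fromElements(Tuple2.of("k", 1), Tuple2.of("k", 2))
              .keyBy(t -> t.f0)
              .timeWindow(Time.minutes(1))
              .trigger(CountTrigger.of(1)) // emit an updated result per element
              .sum(1)
              .print();
          env.execute("per-element trigger");
      }
  }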