Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, Sorry, I need to correct my comment on using the Kafka ingress / egress with the Harness. That is actually doable, by adding an extra dependency on `statefun-flink-distribution` in your Harness program. That pulls in all the other dependencies required by the Kafka ingress / egress,

Re: Flink Window with multiple trigger condition

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, To achieve what you have in mind, I think what you have to do is to use a processing time window of 30 mins, and have a custom trigger that matches the "start" event in the `onElement` method and returns TriggerResult.FIRE_AND_PURGE. That way, the window fires either when the processing time
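A minimal sketch of such a trigger, assuming events arrive as plain strings naming the event type (the class name and event representation are illustrative, not from the thread):

```java
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Fires and purges as soon as a "start" event is seen; otherwise lets the
// processing-time window fire on its own timer at the end of the window.
public class StartEventTrigger extends Trigger<String, TimeWindow> {

    @Override
    public TriggerResult onElement(String element, long timestamp,
                                   TimeWindow window, TriggerContext ctx) throws Exception {
        if ("start".equals(element)) {
            return TriggerResult.FIRE_AND_PURGE;
        }
        // Fall back to the normal end-of-window firing.
        ctx.registerProcessingTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteProcessingTimeTimer(window.maxTimestamp());
    }
}
```

It would be attached via something like `.window(TumblingProcessingTimeWindows.of(Time.minutes(30))).trigger(new StartEventTrigger())`.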

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
This in general is not a good idea, as the state you query using queryable state within a job does not provide any consistency guarantees at all. Would it be possible to have some trigger that emits the state of the windows, and to join the states downstream? In general, that is a better approach for

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, That in general is not a good idea, given the problem you mentioned as well as the fact that the state you query within the same job using queryable state does not provide any consistency guarantees. When it comes to "querying state from another operator", it is a hint that your use

Re: Performance impact of many open windows at the same time

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi Joe, The main effect this should have is more state to be kept until the windows can be fired (and state purged). This would of course increase the time it takes to checkpoint the operator. I'm not sure if there will be significant runtime per-record impact caused by how windows are

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Are you getting an exception from running the Harness? The Harness should already have the required configurations, such as the parent-first classloading config. Otherwise, if you would like to add your own configuration, use the `withConfiguration` method on the `Harness` class. On Fri, May 22,
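A sketch of what that could look like, assuming `withConfiguration` accepts a ConfigOption/value pair (the exact signature may differ by statefun version, and the option value shown is purely illustrative):

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.statefun.flink.harness.Harness;

public class HarnessRunner {
    public static void main(String[] args) throws Exception {
        // Assumption: withConfiguration takes a ConfigOption plus its value;
        // check the Harness javadocs of your statefun version.
        ConfigOption<String> parentFirst =
            ConfigOptions.key("classloader.parent-first-patterns.additional")
                .stringType()
                .noDefaultValue();

        new Harness()
            .withConfiguration(parentFirst, "org.apache.kafka;com.google.protobuf") // illustrative value
            .start();
    }
}
```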

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Sorry, forgot to cc user@ as well in the last reply. On Fri, May 22, 2020 at 12:01 PM Tzu-Li (Gordon) Tai wrote: > As an extra note, the utilities you will find in `statefun-e2e-tests`, > such as the `StatefulFunctionsAppsContainers` is not yet intended for users. > This however was previously

Local test of flink 1.10 hbase sql create table: the object's conf is null after byte deserialization, so the HBase cluster cannot be connected

2020-05-21 Thread shao.hongxiao
Below is my program's SQL: val hbaseTable = """ |CREATE TABLE pvuv_sink( |user_id varchar, |f ROW |) WITH ( |'connector.type' = 'hbase', |'connector.version' = '1.4.3', |'connector.table-name' = 'test_shx', |

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, On Fri, May 22, 2020 at 7:19 AM Boris Lublinsky <boris.lublin...@lightbend.com> wrote: > Also, where do I put flink-conf.yaml in Idea to add the additional required > config parameter: > > classloader.parent-first-patterns.additional: >

Re: Local test of flink 1.10 hbase sql create table: the object's conf is null after byte deserialization, so the HBase cluster cannot be connected

2020-05-21 Thread Leonard Xu
Hi hongxiao, Please do not send user questions to d...@flink.apache.org; d...@flink.apache.org is used for development discussion and, for convenience, accepts English only. dev-subscr...@flink.apache.org

Re: Password problem when defining an index pointing to elasticsearch in the sql client

2020-05-21 Thread Yangze Guo
1.11 has already hit feature freeze, so this feature can land in 1.12 at the earliest. If you are in a hurry, take a look at the DataStream API's ElasticSearchSink, which supports secure authentication; you could also implement a TableSink yourself. Best, Yangze Guo On Fri, May 22, 2020 at 9:59 AM Rui Li wrote: > > Hi, this is not supported yet, but there is a PR working on it: https://github.com/apache/flink/pull/11822 > > On Wed, May 20, 2020 at 4:10 PM naturalfree wrote:
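A sketch of what that DataStream-API route could look like with basic auth, assuming the flink-connector-elasticsearch7 dependency (host, index, and credentials are placeholders):

```java
import java.util.Collections;
import java.util.List;

import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.Requests;

public class SecureEsSink {
    public static ElasticsearchSink<String> build() {
        List<HttpHost> hosts =
            Collections.singletonList(new HttpHost("es-host", 9200, "http"));

        // Index each element as a one-field document (index name is a placeholder).
        ElasticsearchSink.Builder<String> builder = new ElasticsearchSink.Builder<>(
            hosts,
            (ElasticsearchSinkFunction<String>) (element, ctx, indexer) ->
                indexer.add(Requests.indexRequest()
                    .index("my-index")
                    .source(Collections.singletonMap("data", element))));

        // Secure authentication: hand basic-auth credentials to the underlying REST client.
        builder.setRestClientFactory(restClientBuilder -> {
            BasicCredentialsProvider credentials = new BasicCredentialsProvider();
            credentials.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("user", "password"));
            restClientBuilder.setHttpClientConfigCallback(
                http -> http.setDefaultCredentialsProvider(credentials));
        });
        return builder.build();
    }
}
```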

Re: kerberos integration with flink

2020-05-21 Thread Yangze Guo
Hi Nick, From my understanding, if you configure the "security.kerberos.login.keytab", Flink will add the AppConfigurationEntry of this keytab to all the apps defined in "security.kerberos.login.contexts". If you define "java.security.auth.login.config" at the same time, Flink will also keep

Local test of flink 1.10 hbase sql create table: the object's conf is null after byte deserialization, so the HBase cluster cannot be connected

2020-05-21 Thread shao.hongxiao
At the job graph stage, in HBaseRowInputFormat.java: this.conf = {Configuration@4841} "Configuration: core-default.xml, core-site.xml, hbase-default.xml, hbase-site.xml" quietmode = true allowNullValueProperties = false resources = {ArrayList@4859} size = 2 finalParameters = {Collections$SetFromMap@4860}

Reading files under an HDFS directory with a regex in Flink

2020-05-21 Thread 阿华田
input_data = "hdfs://localhost:9002/tmp/match_bak/%s*[0-9]" % ('2018-07-16') result = sc.textFile(input_data) Can Flink read files under an HDFS directory with a regex the way Spark can? My tests so far suggest it cannot; if it is not supported, which version will support it at the earliest? | | 王志华 | | a15733178...@163.com | Signature customized by Netease Mail Master

Re: How do I get the IP of the master and slave files programmatically in Flink?

2020-05-21 Thread Yangze Guo
Hi Felipe, I see your problem. IIUC, if you use AbstractUdfStreamOperator, you could indeed get all the configurations (including what you defined in flink-conf.yaml) through "AbstractUdfStreamOperator#getRuntimeContext().getTaskManagerRuntimeInfo().getConfiguration()". However, I guess it is not
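A sketch of that accessor chain inside a custom operator; only the getter chain comes from the message above, and the config key is hypothetical:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// A pass-through operator that reads the TaskManager configuration at open().
public class ConfigAwareOperator<T> extends AbstractUdfStreamOperator<T, MapFunction<T, T>>
        implements OneInputStreamOperator<T, T> {

    public ConfigAwareOperator(MapFunction<T, T> udf) {
        super(udf);
    }

    @Override
    public void open() throws Exception {
        super.open();
        Configuration conf =
            getRuntimeContext().getTaskManagerRuntimeInfo().getConfiguration();
        // e.g. read an option you placed in flink-conf.yaml (key is hypothetical)
        String controllerHost = conf.getString("my.controller.host", "localhost");
        System.out.println("controller host: " + controllerHost);
    }

    @Override
    public void processElement(StreamRecord<T> element) throws Exception {
        output.collect(element);
    }
}
```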

[no subject]

2020-05-21 Thread 王立杰

Re: Password problem when defining an index pointing to elasticsearch in the sql client

2020-05-21 Thread Rui Li
Hi, this is not supported yet, but there is a PR working on it: https://github.com/apache/flink/pull/11822 On Wed, May 20, 2020 at 4:10 PM naturalfree wrote: > I defined an index pointing to ES in the flink sql client config file and found there is no property for setting a username and password. Does the current es connector support secure authentication? > > | | > naturalfree > | > | > Email: naturalf...@126.com > | > > Signature customized by Netease Mail Master -- Best regards! Rui Li

Stateful functions Harness

2020-05-21 Thread Boris Lublinsky
Hi, I am trying to run https://github.com/apache/flink-statefun/tree/master/statefun-examples/statefun-greeter-example locally using

kerberos integration with flink

2020-05-21 Thread Nick Bendtner
Hi guys, Is there any difference in providing kerberos config to the flink jvm using this method in the flink configuration? env.java.opts: -Dconfig.resource=qa.conf -Djava.library.path=/usr/mware/flink-1.7.2/simpleapi/lib/

Flink writing to Kafka with EOS semantics: Kafka reports an error

2020-05-21 Thread yanlishuai
Hi All, I am writing from Flink to Kafka with EOS semantics, but ran into the following error; I hope someone who has seen this can help. 0-05-21 16:52:15,057 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source (1/1) (f65b2869d898a050238c53f9fbc9573b) switched from DEPLOYING to RUNNING. 2020-05-21 16:52:15,062 INFO
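The actual stack trace is truncated above, so as general context only: enabling exactly-once in the Kafka producer usually looks like the sketch below (topic, servers, and schema are placeholders; a frequent cause of errors with EXACTLY_ONCE is the producer's transaction.timeout.ms exceeding the broker's transaction.max.timeout.ms):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class EosProducer {
    public static FlinkKafkaProducer<String> build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        // Must not exceed the broker's transaction.max.timeout.ms (15 min by default).
        props.setProperty("transaction.timeout.ms", "900000");

        return new FlinkKafkaProducer<>(
            "my-topic",
            new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
            props,
            FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
    }
}
```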

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Annemarie Burger
Hi, So what I meant was that I have a keyed stream, and from each thread/keygroup/PU I want to query the state of the other threads/keygroups/PUs. Does anybody have any experience with this? I'm currently working on it, and the main problem seems to be that the Queryable State Client requires

Re: flink proctime error

2020-05-21 Thread 了不起的盖茨比
Thanks everyone, I will go and study the official docs some more. -- Original -- From: Jingsong Li

Re: flink proctime error

2020-05-21 Thread 了不起的盖茨比
So it is a virtual column, and if later computations need it, it needs a watermark. Yes, I tested this case and it works. -- Original -- From: Jingsong Li

Re: flink proctime error

2020-05-21 Thread 了不起的盖茨比
My initial idea was to use proctime as proctime() in the source table so that there is such a column, then assign that time to a timestamp(3) column of the sink table, so it can be used directly when grouping. -- Original -- From: Benchao Li

Re: Stream Iterative Matching

2020-05-21 Thread Guowei Ma
Hi Marc, 1. I think you should first choose which type of window you want to use (Tumbling/Sliding/Session). From your description, the session window may not suit your case because there is no gap. 2. >>> how this would work in practise or how to handle the case where timers fire for

Re: flink proctime error

2020-05-21 Thread Jingsong Li
Hi, - proctime is a virtual column. - rowtime is a column with real data. It looks like you need to define rowtime in sink_table, for example like this: CREATE TABLE sink_table ( ip VARCHAR, proctime timestamp(3), WATERMARK FOR proctime AS proctime ) Best, Jingsong Lee On Thu, May 21, 2020 at 9:17 PM Benchao Li wrote: >

Re: flink proctime error

2020-05-21 Thread Benchao Li
Judging from the SQL you provided, you did a window aggregation directly on sink_table, but sink_table does not define a time attribute. (Is that a typo? Should the window aggregation be on source_table?) 了不起的盖茨比 <573693...@qq.com> wrote on Thu, May 21, 2020 at 9:08 PM: > error: Window aggregate can only be defined over a time attribute column, > but TIMESTAMP(3) encountered. > If on sink_table

flink proctime error

2020-05-21 Thread 了不起的盖茨比
error: Window aggregate can only be defined over a time attribute column, but TIMESTAMP(3) encountered. If a watermark is defined on sink_table, the source is written into the sink, and I then group by, this error appears. CREATE TABLE source_table ( sip VARCHAR, proctime as

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Yun Tang
Hi Annemarie, Actually, I do not know what exactly PU means in your thread. If you mean the task manager, then although I haven't tried it, I think we might be able to query state within the same job. Maybe you could give it a try. In general, we would initialize two states in the same operator so that they

Re: Flink Window with multiple trigger condition

2020-05-21 Thread aj
A session window is defined on a gap of inactivity; I do not have that requirement. Starting the window only on the "*search event*" is a part I will tackle later. Let's say in the first phase I want to start the window on any event that appears for that user. For example: *Scenario -1* t1 -

Re: Flink on Kubernetes

2020-05-21 Thread Yang Wang
Hi Ivan Yang, #1. If a TaskManager crashes exceptionally and there are running tasks on it, it cannot join back gracefully. Whether the full job will be restarted depends on the failover strategies[1]. #2. Currently, when new TaskManagers join the Flink cluster, the running Flink job

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Annemarie Burger
Hi, Thanks for your response! What if I'm using regular state instead of windowState; is there any way to query this state of a PU from another PU in the same Flink job? Best, Annemarie -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread jimandlice
OK, thanks. | | jimandlice | | Email: jimandl...@163.com | Signature is customized by Netease Mail Master On May 21, 2020 at 19:42, Jingsong Li wrote: 1.11 has not been released yet; the docs are still being written. Best, Jingsong Lee On Thu, May 21, 2020 at 7:33 PM jimandlice wrote: > For 1.11, can you provide a demo? > > > > > | | > jimandlice > | > | >

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread Jingsong Li
1.11 has not been released yet; the docs are still being written. Best, Jingsong Lee On Thu, May 21, 2020 at 7:33 PM jimandlice wrote: > For 1.11, can you provide a demo? > > > > > | | > jimandlice > | > | > Email: jimandl...@163.com > | > > Signature is customized by Netease Mail Master > > On May 21, 2020 at 19:31, Jingsong Li wrote: > > After writing, do we still need a script to load the data into Hive? > -

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread jimandlice
For 1.11, can you provide a demo? | | jimandlice | | Email: jimandl...@163.com | Signature is customized by Netease Mail Master On May 21, 2020 at 19:31, Jingsong Li wrote: > After writing, do we still need a script to load the data into Hive? - Writing with DataStream: yes. - Writing with the 1.11 Table layer: with some configuration it automatically adds partitions to the hive metastore. Best, Jingsong Lee On Thu, May 21, 2020 at 7:11 PM

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread Jingsong Li
> After writing, do we still need a script to load the data into Hive? - Writing with DataStream: yes. - Writing with the 1.11 Table layer: with some configuration it automatically adds partitions to the hive metastore. Best, Jingsong Lee On Thu, May 21, 2020 at 7:11 PM jimandlice wrote: > After writing, do we still need a script to load the data into Hive? > > > > > | | > jimandlice > | > | > Email: jimandl...@163.com > | > > Signature is customized by Netease Mail

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread jimandlice
After writing, do we still need a script to load the data into Hive? | | jimandlice | | Email: jimandl...@163.com | Signature is customized by Netease Mail Master On May 21, 2020 at 15:02, Jingsong Li wrote: It looks like the problem is caused by a dependency mismatch between your client and the server; check the client's jars? Best, Jingsong Lee On Thu, May 21, 2020 at 2:57 PM 阿华田 wrote: > public static void main(String[]

Performance impact of many open windows at the same time

2020-05-21 Thread Joe Malt
Hi all, I'm looking into what happens when messages are ingested with timestamps far into the future (e.g. due to corruption or a wrong clock at the sender). I'm aware of the effect on watermarking, but another thing I'm concerned about is the performance impact of the extra windows this will

Re: Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread Jingsong Li
> In the blink planner, can data be converted to DataSet[Row] by connecting to hive? No. The community is working on BoundedStream, which brings the DataSet functionality onto streams. Best, Jingsong Lee On Thu, May 21, 2020 at 6:45 PM 张锴 wrote: > I'll take a look > > Jeff Zhang wrote on Thu, May 21, 2020 at 4:54 PM: > > You can write scala code in zeppelin, which supports hive; see this video, > >

Re: Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread 张锴
I'll take a look. Jeff Zhang wrote on Thu, May 21, 2020 at 4:54 PM: > You can write scala code in zeppelin, which supports hive; see this video, > https://www.bilibili.com/video/BV1Te411W73b?p=10 > > You can also discuss in this DingTalk group: 30022475 > > Jingsong Li wrote on Thu, May 21, 2020 at 4:43 PM: > > > Hi, > > > > Sorry, the hive connector no longer supports the old planner in the current version, > > but the scala shell still defaults to the old planner. > >

Re: Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread 张锴
May I ask: in the blink planner, can data be converted to DataSet[Row] by connecting to hive? Jingsong Li wrote on Thu, May 21, 2020 at 4:43 PM: > Hi, > > Sorry, the hive connector no longer supports the old planner in the current version, > but the scala shell still defaults to the old planner. > > Best, > Jingsong Lee > > On Thu, May 21, 2020 at 3:24 PM 张锴 wrote: > > > I've pasted the exact steps and error message below; please help me figure out how to resolve this. I'm not sure whether it is a bug. > > >

Re: Re: Questions about out-of-memory issues when using the rocksdb backend

2020-05-21 Thread Congxian Qiu
Hi, you need to remove `getCheckpointBackend()`. Best, Congxian hdxg1101300...@163.com wrote on Thu, May 21, 2020 at 4:24 PM: > I did set it to true: > > env.setStateBackend( > new RocksDBStateBackend("hdfs://beh/user/dc_cbss/flink5/checkpoints" , > true) > .getCheckpointBackend() > ); > > > hdxg1101300...@163.com > > From: Congxian

Re: Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread Jeff Zhang
You can write scala code in zeppelin, which supports hive; see this video: https://www.bilibili.com/video/BV1Te411W73b?p=10 You can also discuss in this DingTalk group: 30022475 Jingsong Li wrote on Thu, May 21, 2020 at 4:43 PM: > Hi, > > Sorry, the hive connector no longer supports the old planner in the current version, > but the scala shell still defaults to the old planner. > > Best, > Jingsong Lee > > On Thu, May 21, 2020 at 3:24 PM 张锴

Re: Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread Jingsong Li
Hi, Sorry, the hive connector no longer supports the old planner in the current version, but the scala shell still defaults to the old planner. Best, Jingsong Lee On Thu, May 21, 2020 at 3:24 PM 张锴 wrote: > I've pasted the exact steps and error message below; please help me figure out how to resolve this. I'm not sure whether it is a bug. > > scala> import org.apache.flink.table.catalog.hive.HiveCatalog > import

Re: Re: Questions about out-of-memory issues when using the rocksdb backend

2020-05-21 Thread hdxg1101300...@163.com
I did set it to true: env.setStateBackend( new RocksDBStateBackend("hdfs://beh/user/dc_cbss/flink5/checkpoints" , true) .getCheckpointBackend() ); hdxg1101300...@163.com From: Congxian Qiu Sent: 2020-05-21 15:58 To: user-zh Subject: Re: Questions about out-of-memory issues when using the rocksdb backend Hi, judging from the error stack, the HeapStateBackend is in use, so it is definitely not

Re: Stream Iterative Matching

2020-05-21 Thread ba
Hi Guowei, Thank you for your reply. Are you able to give some detail on how that would work with the per window state you linked? I'm struggling to see how the logic would work. I guess something like a session window on a keyed stream (keyed by sensor ID). Timers would fire 90 seconds after

Re: Questions about out-of-memory issues when using the rocksdb backend

2020-05-21 Thread Congxian Qiu
Hi, judging from the error stack, the HeapStateBackend is in use, so it is definitely not the RocksDBStateBackend. Change your code to `env.setStateBackend(new RocksDBStateBackend("hdfs://beh/user/dc_cbss/flink5/checkpoints" , true));` and try again. Best, Congxian hdxg1101300...@163.com wrote on Thu, May 21, 2020 at 3:38 PM: > Here is a screenshot of the memory analysis >
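To spell out the fix (the path is taken from the thread; the explanation of getCheckpointBackend() is my reading of its behavior):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Wrong: getCheckpointBackend() returns only the backend that RocksDB delegates
        // checkpoint storage to, so the job ends up without RocksDB state at all:
        // env.setStateBackend(new RocksDBStateBackend(
        //     "hdfs://beh/user/dc_cbss/flink5/checkpoints", true).getCheckpointBackend());

        // Right: register the RocksDBStateBackend itself (true = incremental checkpoints).
        env.setStateBackend(new RocksDBStateBackend(
            "hdfs://beh/user/dc_cbss/flink5/checkpoints", true));
    }
}
```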

Re: Questions about out-of-memory issues when using the rocksdb backend

2020-05-21 Thread hdxg1101300...@163.com
Here is a screenshot of the memory analysis: https://blog.csdn.net/xiaosannimei/article/details/106259140 hdxg1101300...@163.com From: hdxg1101300...@163.com Sent: 2020-05-21 15:10 To: user-zh Subject: Questions about out-of-memory issues when using the rocksdb backend Hello, I am using flinkSQL on flink 1.10 to do a multi-table join over unbounded streams and ran into an out-of-memory problem. I have a few questions: 1. About using rocksdb, my setting is specified in the code as follows

Following the official flink-shell instructions, I hit an unresolvable error: Only BatchTableSource and InputFormatTableSource are supported in BatchTableEnvironment.

2020-05-21 Thread 张锴
I've pasted the exact steps and error message below; please help me figure out how to resolve this. I'm not sure whether it is a bug. scala> import org.apache.flink.table.catalog.hive.HiveCatalog import org.apache.flink.table.catalog.hive.HiveCatalog scala> val hiveCatalog = new HiveCatalog("hive", "mydatabase", "/opt/hive2.3.3/conf", "2.3.3"); hiveCatalog:

Re: How do I get the IP of the master and slave files programmatically in Flink?

2020-05-21 Thread Felipe Gutierrez
Hi all, I would like to have the IP of the JobManager, not the Task Executors. Let me explain why. I have an operator (my own operator that extends AbstractUdfStreamOperator) that sends messages to, and receives messages from, a global controller. So, regardless of which TaskManager these operator instances are

Questions about out-of-memory issues when using the rocksdb backend

2020-05-21 Thread hdxg1101300...@163.com
Hello, I am using flinkSQL on flink 1.10 to do a multi-table join over unbounded streams and ran into an out-of-memory problem. I have a few questions: 1. About using rocksdb, my setting is specified in the code as follows: env.setStateBackend(new RocksDBStateBackend("hdfs://beh/user/dc_cbss/flink5/checkpoints" , true).getCheckpointBackend()); But in the jobmanager startup log I see the following info: Using application-defined state backend: File

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread Jingsong Li
It looks like the problem is caused by a dependency mismatch between your client and the server; check the client's jars? Best, Jingsong Lee On Thu, May 21, 2020 at 2:57 PM 阿华田 wrote: > public static void main(String[] args) throws Exception { > // initialize job parameters > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); > Job job = Job.getInstance(); >

Flink on Kubernetes

2020-05-21 Thread Ivan Yang
Hi, I have set up Flink 1.9.1 on Kubernetes on AWS EKS with one job manager pod and 10 task manager pods, one pod per EC2 instance. The job runs fine. After a while, for some reason, one pod (task manager) crashed, then the pod restarted. After that, the job got into a bad state. All the parallelisms

Re: How can Flink read files under HDFS with a regex

2020-05-21 Thread 阿华田
public static void main(String[] args) throws Exception { // initialize job parameters ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); Job job = Job.getInstance(); // custom input to read HDFS HadoopInputFormat hadoopIF = new HadoopInputFormat( new TextInputFormat(), LongWritable.class, Text.class,
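The snippet is cut off above; for reference, a sketch completing the same approach. Hadoop's FileInputFormat expands glob patterns (such as * and [0-9]), which covers the pattern-matching use case from the related thread; the path shown is illustrative:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class GlobRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Job job = Job.getInstance();
        HadoopInputFormat<LongWritable, Text> hadoopIF =
            new HadoopInputFormat<>(new TextInputFormat(), LongWritable.class, Text.class, job);
        // Glob patterns (not full regexes) are expanded by Hadoop's FileSystem,
        // so this picks up all matching files under the directory:
        TextInputFormat.addInputPath(job,
            new Path("hdfs://localhost:9002/tmp/match_bak/2018-07-16*[0-9]"));
        DataSet<Tuple2<LongWritable, Text>> input = env.createInput(hadoopIF);
        input.print();
    }
}
```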

Flink Window with multiple trigger condition

2020-05-21 Thread aj
Hello All, I am getting a lot of user events in a stream. There are different types of events; now I want to build some aggregation metrics for the user by grouping events into buckets. My condition for windowing is: 1. Start the window for the user when event_name *"search"* arrives for the