Re: Question about Scala Case Class and List in Flink

2020-01-15 Thread Timo Walther
Hi, Reg. 1: Scala case classes are supported in the Scala-specific version of the DataStream API. If you are using case classes in the Java API, you will get the INFO below because the Java API uses pure reflection extraction for analyzing POJOs. The Scala API tries to analyze Scala classes
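
As an aside, a minimal sketch of the kind of class the Java API's reflection-based extraction can analyze as a POJO (the class and field names here are made-up examples, not taken from the thread): a public no-argument constructor plus public fields (or getters/setters) of supported types.

    // Hypothetical POJO that Flink's Java-API TypeExtractor can analyze via reflection.
    public class SensorReading {
        public String id;        // public fields of supported types
        public long timestamp;
        public double value;

        public SensorReading() {}  // no-arg constructor is required for POJO extraction

        public SensorReading(String id, long timestamp, double value) {
            this.id = id;
            this.timestamp = timestamp;
            this.value = value;
        }
    }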

Filter with large key set

2020-01-15 Thread Jin Yi
Hi there, I have the following use case: a key set, say [A,B,C,], with around 10M entries; the type of the entries can be one of the types in BasicTypeInfo, e.g. String, Long, Integer, etc., and each message looks like below: message: { header: A body: {} } I would like to use Flink to

Re: Re: MiniCluster question

2020-01-15 Thread tison
Yes, MiniCluster starts the JM and TM inside the same process; it is a cluster mainly intended for testing. Standalone means there is no resource management framework such as YARN involved; the TMs are started manually by the user, and it is a cluster that can be used in production. Best, tison. 郑 洁锋 wrote on Thu, Jan 16, 2020 at 2:39 PM: > So can I understand it as a local cluster, which starts a flink cluster by itself (simulating a distributed environment on the current machine), so there is no need to deploy a separate flink cluster > > > zjfpla...@hotmail.com > > From:

Re: Re: MiniCluster question

2020-01-15 Thread 郑 洁锋
So can I understand it as a local cluster, which starts a flink cluster by itself (simulating a distributed environment on the current machine), so there is no need to deploy a separate flink cluster? zjfpla...@hotmail.com From: tison Sent: 2020-01-16 14:29 To: user-zh Subject: Re: Re: MiniCluster question You are mixing several concepts together; MiniCluster

Re: Re: MiniCluster question

2020-01-15 Thread tison
You are mixing several concepts together. MiniCluster is itself a cluster, an internal deployment mode, and a concept parallel to standalone. I cannot tell what you are trying to achieve, but as for your problem, as I said above, you did not call start after new-ing the MiniCluster. I am not sure whether that is the effect you want, though; MiniCluster and standalone are two parallel things. Best, tison. 郑 洁锋 wrote on Thu, Jan 16, 2020 at 2:27 PM: > Because I saw it was standalone, and the official flink documentation also lists a standalone deployment mode, I deployed according to that mode and then tried it > >

Re: Re: MiniCluster question

2020-01-15 Thread 郑 洁锋
Because I saw it was standalone, and the official flink documentation also lists a standalone deployment mode, I deployed according to that mode and then tried it. zjfpla...@hotmail.com From: 郑 洁锋 Sent: 2020-01-16 14:24 To: user-zh Subject: Re: Re: MiniCluster question This is the complete code up to startup: public class

Re: Re: MiniCluster question

2020-01-15 Thread 郑 洁锋
This is the complete code up to startup: public class ClusterClientFactory { public static ClusterClient createClusterClient(Options launcherOptions) throws Exception { String mode = launcherOptions.getMode(); if(mode.equals(ClusterMode.standalone.name())) { return

Re: MiniCluster question

2020-01-15 Thread Jin Yi
Hi, you can refer to org.apache.flink.streaming.api.environment.LocalStreamEnvironment::execute public JobExecutionResult execute(String jobName) throws Exception { // transform the streaming program into a JobGraph StreamGraph streamGraph = getStreamGraph(); streamGraph.setJobName(jobName);

Re: Re: MiniCluster question

2020-01-15 Thread tison
MiniCluster miniCluster = new MiniCluster(configBuilder.build()); miniCluster.start(); MiniClusterClient clusterClient = new MiniClusterClient(config, miniCluster) ; Best, tison. tison wrote on Thu, Jan 16, 2020 at 1:30 PM: > It has nothing to do with the cluster > Best, > tison. > > > tison wrote on Thu, Jan 16, 2020 at 1:30 PM: > >> 1.
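
A minimal, self-contained sketch of the pattern tison shows above, assuming the Flink 1.9-era MiniCluster/MiniClusterConfiguration builder API; as noted later in the thread, MiniCluster is an internal class, so this is illustration rather than a recommended dependency.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.minicluster.MiniCluster;
    import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;

    public class MiniClusterStartExample {
        public static void main(String[] args) throws Exception {
            MiniClusterConfiguration miniClusterConfig = new MiniClusterConfiguration.Builder()
                    .setConfiguration(new Configuration())
                    .setNumTaskManagers(1)
                    .setNumSlotsPerTaskManager(1)
                    .build();

            MiniCluster miniCluster = new MiniCluster(miniClusterConfig);
            miniCluster.start();   // the call missing from the original code
            // ... submit a job, e.g. through a MiniClusterClient, then shut down ...
            miniCluster.close();
        }
    }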

Re: Re: MiniCluster question

2020-01-15 Thread tison
It has nothing to do with the cluster. Best, tison. tison wrote on Thu, Jan 16, 2020 at 1:30 PM: > 1. This is an internal implementation class; documentation is an issue, but it is best not to depend on it directly > > 2. ? Where is the code snippet in your screenshot from? As far as I can see it is created and used by your own code > > Best, > tison. > > > 郑 洁锋 wrote on Thu, Jan 16, 2020 at 1:18 PM: > >> Is there any documentation for MiniCluster? I could not find related content under the deployment modes in the official docs. >> I started the flink standalone cluster via bin/start-cluster.sh >> >> >>

Re: Re: MiniCluster question

2020-01-15 Thread tison
1. This is an internal implementation class; documentation is an issue, but it is best not to depend on it directly 2. ? Where is the code snippet in your screenshot from? As far as I can see it is created and used by your own code Best, tison. 郑 洁锋 wrote on Thu, Jan 16, 2020 at 1:18 PM: > Is there any documentation for MiniCluster? I could not find related content under the deployment modes in the official docs. > I started the flink standalone cluster via bin/start-cluster.sh > > > > zjfpla...@hotmail.com > > From:

Re: Re: MiniCluster question

2020-01-15 Thread 郑 洁锋
Is there any documentation for MiniCluster? I could not find related content under the deployment modes in the official docs. I started the flink standalone cluster via bin/start-cluster.sh zjfpla...@hotmail.com From: tison Sent: 2020-01-16 12:39 To: user-zh Subject: Re: MiniCluster question You need to start your MiniCluster

Re: Flink Sql Join, how to clear the sql join state?

2020-01-15 Thread Benchao Li
Hi LakeShen, Maybe "Idle State Retention Time"[1] may help in your case. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html#idle-state-retention-time LakeShen 于2020年1月16日周四 上午10:15写道: > Hi community,now I am use flink sql inner join in my
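
A short sketch of the idle-state-retention setting Benchao points to, assuming the Flink 1.9-era StreamQueryConfig API from the linked page (newer releases expose the same setting on TableConfig); the table names a and b are placeholders for whatever tables the join actually uses.

    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.StreamQueryConfig;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public class IdleStateRetentionExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

            // keep state for idle keys at least 12h and at most 24h before it is cleaned up
            StreamQueryConfig qConfig = tableEnv.queryConfig();
            qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));

            // a and b are assumed to be registered tables; the config is applied when emitting
            Table joined = tableEnv.sqlQuery("SELECT * FROM a INNER JOIN b ON a.id = b.id");
            tableEnv.toRetractStream(joined, Row.class, qConfig).print();

            env.execute("idle-state-retention-example");
        }
    }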

Re: MiniCluster question

2020-01-15 Thread tison
You need to start your MiniCluster (x Best, tison. 郑 洁锋 wrote on Thu, Jan 16, 2020 at 11:38 AM: > An error is thrown while running the MiniCluster code: > > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for

MiniCluster question

2020-01-15 Thread 郑 洁锋
An error is thrown while running the MiniCluster code: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "main" java.lang.IllegalStateException:

Re: How to handle startup for mandatory config parameters?

2020-01-15 Thread Yang Wang
Hi John, Most of the config options have default values. However, you still need to specify some required fields, for example the taskmanager resource related options. If you do not specify any of them, an exception will be thrown on the client side like the following. Exception in thread "main"
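
For the application-level parameters John asks about (as opposed to Flink's own options), one common pattern is to fail fast with ParameterTool; a minimal sketch, where the parameter name kafka.brokers is just an illustrative assumption.

    import org.apache.flink.api.java.utils.ParameterTool;

    public class StartupConfigCheck {
        public static void main(String[] args) {
            ParameterTool params = ParameterTool.fromArgs(args);
            // getRequired(...) throws a RuntimeException if the key is missing,
            // so the job fails on the client side before anything is deployed
            String brokers = params.getRequired("kafka.brokers");
            System.out.println("kafka.brokers = " + brokers);
        }
    }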

Flink Sql Join, how to clear the sql join state?

2020-01-15 Thread LakeShen
Hi community, I am now using Flink SQL inner join in my code. I saw in the Flink documentation that a Flink SQL inner join will keep both sides of the join input in Flink’s state forever. As a result, the HDFS state files become very large. Is there any way to clear the SQL join state? Thanks for your reply.

Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error .

2020-01-15 Thread Yun Tang
Hi, the root cause is a checkpoint error caused by failing to send data to Kafka during 'preCommit'. The right solution is to avoid unsuccessful sends to Kafka in the first place, which is more within Kafka's scope. If you cannot ensure the health of Kafka and its client and do not require exactly-once semantics, you can pass
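
The reply above is truncated; purely as an assumption about one possible mitigation (not necessarily the option Yun Tang goes on to suggest), occasional checkpoint failures can be tolerated up to a limit via CheckpointConfig, for example:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class TolerateCheckpointFailures {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000);
            // allow up to 3 checkpoint failures before the job itself is failed
            env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
            // ... build the pipeline and call env.execute(...) ...
        }
    }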

Re: PubSub source throwing grpc errors

2020-01-15 Thread Richard Deurwaarder
Hi Itamar and Till, Yes, this actually looks a lot worse than it is, fortunately. From what I understand this means: something has not released or properly shut down a grpc client, and the library likes to inform you about this. I would definitely expect to see this if the job crashes at the

Re: Fail to deploy flink on k8s in minikube

2020-01-15 Thread Jin Yi
Hi Jary, From the Flink website, it supports the Flink Job Cluster deployment strategy on Kubernetes: https://github.com/apache/flink/blob/release-1.9/flink-container/kubernetes/README.md#deploy-flink-job-cluster Best Eleanore On Wed, Jan 15, 2020 at 3:18 AM Jary Zhen wrote: > Thanks to

How to handle startup for mandatory config parameters?

2020-01-15 Thread John Smith
Hi, so I have no problem reading config from resource files or anything like that... But my question is about how we handle mandatory fields. 1- If a mandatory field is missing during startup... Do we just "log" it and call System.exit()? 2- If we do log it, where does the log end up, the task

Re: PubSub source throwing grpc errors

2020-01-15 Thread Till Rohrmann
Hi Itamar, could you share a bit more detail about the serialization problem? Which class is not serializable and where does it originate from? Cheers, Till On Tue, Jan 14, 2020 at 9:47 PM Itamar Syn-Hershko < ita...@bigdataboutique.com> wrote: > Thanks! > > I was able to track this down.

Re: Flink task node shut it self off.

2020-01-15 Thread John Smith
Hi, so far it seems stable. On Mon, 6 Jan 2020 at 14:16, John Smith wrote: > So I increased all the jobs to 1 minute checkpoint... I let you know how > it goes... Or if I need to rethink gluster lol > > On Sat., Jan. 4, 2020, 9:27 p.m. John Smith, > wrote: > >> It seems to have happened again...

Re: When I use flink 1.9.1 and produce data to Kafka 1.1.1, The streamTask checkpoint error .

2020-01-15 Thread jose farfan
Hi, I have the same issue. BR Jose On Thu, 9 Jan 2020 at 10:28, ouywl wrote: > Hi all: > When I use flink 1.9.1 and produce data to Kafka 1.1.1, the error happens > as in log-1, the code is: > > input.addSink( > new FlinkKafkaProducer( >

Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

2020-01-15 Thread ??????
Hi all, the related issue: https://issues.apache.org/jira/browse/FLINK-15573 As the title says, what I want to do is to let `FieldRefrence` use Unicode as its default charset (or maybe as an optional charset which can be configured). According to `PlannerExpressionParserImpl`,

RE: Table API: Joining on Tables of Complex Types

2020-01-15 Thread Hailu, Andreas
Dawid, this approach looks promising. I'm able to flatten out my Avro records into Rows and run simple queries atop of them. I've got a question - when I register my Rows as a table, I see the following log providing a warning: 2020-01-14 17:16:43,083 [main] INFO TypeExtractor - class
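
If that TypeExtractor notice matters in this setup, one possible way to avoid it (an assumption for illustration, not something taken from this thread) is to give the flattened rows explicit RowTypeInfo instead of relying on reflection; the field names and types below are made up.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.types.Row;

    public class ExplicitRowTypeInfo {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // declare the Row schema explicitly instead of letting reflection guess it
            RowTypeInfo rowType = new RowTypeInfo(
                    new TypeInformation<?>[]{Types.STRING, Types.LONG},
                    new String[]{"name", "count"});

            List<Row> flattened = Arrays.asList(Row.of("a", 1L), Row.of("b", 2L));
            DataSet<Row> rows = env.fromCollection(flattened, rowType);

            rows.print();  // the DataSet now carries RowTypeInfo and can be registered as a table
        }
    }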

Re: Custom File Sink using EventTime and defined custom file name for parquet file

2020-01-15 Thread Kostas Kloudas
Oops, sorry for not sending the reply to everyone and thanks David for reposting it here. Great to hear that you solved your issue! Kostas On Wed, Jan 15, 2020 at 1:57 PM David Magalhães wrote: > > Sorry, I only saw the replies today. > > Regarding my previous email, > >> Still, there is

Re: Custom File Sink using EventTime and defined custom file name for parquet file

2020-01-15 Thread David Magalhães
Sorry, I only saw the replies today. Regarding my previous email, Still, there is something missing in this solution to close a window > with a given timeout, so it can write the last events into the sink if no > more events are sent. I've fixed this using a custom trigger, val flag =
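
David's actual trigger is not shown (the preview cuts off at `val flag =`); purely as a rough Java sketch of the general idea, an event-time trigger that also fires on a processing-time timeout could look like this.

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    // Fires at the end of the event-time window, or after timeoutMillis of processing time,
    // whichever comes first. Pending processing-time timers are not tracked here for brevity.
    public class EventTimeTriggerWithTimeout extends Trigger<Object, TimeWindow> {
        private final long timeoutMillis;

        public EventTimeTriggerWithTimeout(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
            ctx.registerEventTimeTimer(window.maxTimestamp());
            ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + timeoutMillis);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.FIRE_AND_PURGE; // timeout hit: flush whatever is buffered
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return time == window.maxTimestamp() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) {
            ctx.deleteEventTimeTimer(window.maxTimestamp());
        }
    }

It would be attached to the windowed stream with something like .window(...).trigger(new EventTimeTriggerWithTimeout(60_000)).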

Re: Slots Leak Observed when

2020-01-15 Thread Till Rohrmann
Hi, have you tried one of the latest Flink versions to see whether the problem still exists? I'm asking because there are some improvements which allow for slot reconciliation between the TaskManager and the JobMaster [1]. As a side note, the community is no longer supporting Flink 1.6.x. For

Re: Fail to deploy flink on k8s in minikube

2020-01-15 Thread Jary Zhen
Thanks to YangWang and 刘建刚, this message is helpful for me too. Besides, which Flink versions can be deployed on k8s? On Mon, 13 Jan 2020 at 13:51, 刘建刚 wrote: > Thank you for your help. > > Yang Wang wrote on Mon, Jan 13, 2020 at 12:53 PM: > >> Hi, Jiangang >> >> Glad to hear that you are looking to run Flink on

Re: How Flink read files from local filesystem

2020-01-15 Thread Tillman Peng
You can use env.readTextFile(path), which accepts a path to a directory and reads all files in that directory, producing a record for each line. on 2020/1/15 17:58, Soheil Pourbafrani wrote: Suppose we have a Flink single node cluster with multiple slots and some input files exist in local file
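
A minimal sketch of that suggestion (the directory path is just an example); note that with only a local filesystem, the path must be readable from every machine that runs a source subtask.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ReadLocalDirectory {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // readTextFile accepts a directory and emits one String record per line of each file in it
            DataStream<String> lines = env.readTextFile("file:///data/input");
            lines.print();
            env.execute("read-local-directory");
        }
    }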

Re: [DISCUSS] decentralized scheduling strategy is needed

2020-01-15 Thread HuWeihua
Hi, Andrey Thanks for your response. I have checked this Jira ticket and I think it can work in standalone mode, where the TaskManagers have been started before tasks are scheduled. But we are currently running flink on yarn in per-job cluster mode. I noticed that this issue has already been raised. I

Re: [DISCUSS] decentralized scheduling strategy is needed

2020-01-15 Thread Chesnay Schepler
This is a known issue that will be fixed in 1.9.2/1.10.0; see https://issues.apache.org/jira/browse/FLINK-12122 . On 15/01/2020 10:07, HuWeihua wrote: Hi, All We encountered some problems during the upgrade from Flink 1.5 to Flink 1.9. Flink's scheduling strategy has changed. Flink 1.9

How Flink read files from local filesystem

2020-01-15 Thread Soheil Pourbafrani
Hi, Suppose we have a Flink single node cluster with multiple slots and some input files exist in the local file system. In this case, where we have no distributed file system to dedicate each file's blocks to taskmanagers, how will Flink read the file? Will all the task managers open the file

Re: Help request: possible duplicate computation in a stream join scenario

2020-01-15 Thread Ren Xie
Thanks! I will look into it. JingsongLee wrote on Wed, Jan 15, 2020 at 11:57 AM: > Hi ren, > > Blink's deduplication feature should be able to match your requirement. [1] > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#deduplication > > Best, > Jingsong Lee > > >

Reply: Re: Re: Re: Help request: flink kafka source job reports an error when deployed to a cluster

2020-01-15 Thread Others
lib ---- From: "JingsongLee" https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html#usage -- From:Others <41486...@qq.com Send

Reply: Help request: flink kafka source job reports an error when deployed to a cluster

2020-01-15 Thread Others
lib ---- From: "AS" https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies put the jar into flink's lib directory, then try it. On Jan 15, 2020,

Two questions: sideoutput and state in sql

2020-01-15 Thread izual
1. SideOutput: following the docs https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/side_output.html and the unit test code in SideOutputITCase.scala, I implemented an identical example. However, it fails at runtime with: Caused by: java.lang.IllegalArgumentException: OutputTag must not be null. I understand the error itself is expected, because val outputTag = OutputTag[String]("side-output")

Re: StreamingFileSink doesn't close multipart uploads to s3?

2020-01-15 Thread Kostas Kloudas
Hi Ken, Jingsong and Li, Sorry for the late reply. As Jingsong pointed out, upon calling close() the StreamingFileSink does not commit the in-progress/pending files. The reason for this is that the close() method of any UDF including sink functions is called on both normal termination and

Reply: Help request: flink kafka source job reports an error when deployed to a cluster

2020-01-15 Thread AS
Hi: the kafka factory is missing. See https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies and put the connector jar into flink's lib directory, then try it. On Jan 15, 2020 at 14:59, Others<41486...@qq.com>

Re: Re: Re: Help request: flink kafka source job reports an error when deployed to a cluster

2020-01-15 Thread JingsongLee
+user-zh -- From:JingsongLee Send Time: Wed, Jan 15, 2020 16:05 To:Others <41486...@qq.com> Subject: Re: Re: Re: Help request: flink kafka source job reports an error when deployed to a cluster Yes. Another way is to use the classpath option in [1] to add multiple jars. BTW, please include user-zh when replying. Best, Jingsong Lee [1]