flink new source api, kafka部分对kafka-client的版本要求。

2021-11-10 Thread yidan zhao
如题,当前新的kafaksouce貌似对kafka-client版本做了依赖,比如代码KafkaPartitionSplitReader.acquireAndSetStoppingOffsets方法中用到的 consumer.committed(partitionsStoppingAtCommitted) .forEach( (tp, offsetAndMetadata) -> { Preconditions.checkNotNull(

Re:Re: Re: Re: 公司数据密文,实现group by和join

2021-11-10 Thread liuyehan
您好! 不可以,因为这种加密算法有点特别, 比如id是密文的话,相同id的密文也不一样的,不管你这个密文的明文是不是一样,它的密文形成的字符串肯定不一样; 谢谢. 在 2021-11-11 12:43:48,"yidan zhao" 写道: >如果是group by id,不论id是明文还是密文,相同id的密文肯定也一样,直接group by 密文id不可以吗? > >godfrey he 于2021年11月1日周一 下午3:22写道: > >> 上传的图片没法显示,通过图床工具或纯文本方式重新发一遍 >> >> lyh1067341434

Re: 回复:Re:回复: flink sql消费kafka各分区消息不均衡问题

2021-11-10 Thread yidan zhao
不清楚你说的“作业”是啥,作业多,作业少,你是多个作业吗? 我感觉你是讲subtask数多少估计,如果TM的压力完全是由于flink导致,那应该就是你slot分配在TM不均衡导致。 考虑设置 cluster.evenly-spread-out-slots: true 试试。 casel.chen 于2021年11月1日周一 上午10:48写道: > 写入数据看过是均衡的,没有问题。消费端位点差别挺大,积压情况大部分分区都很小,少数个别分区积压很大,达到数十万级别。跟TM负载有关吗? > > > > > > > > > > > > > > > > > > 在 2021-10-31

Re: Re: Re: 公司数据密文,实现group by和join

2021-11-10 Thread yidan zhao
如果是group by id,不论id是明文还是密文,相同id的密文肯定也一样,直接group by 密文id不可以吗? godfrey he 于2021年11月1日周一 下午3:22写道: > 上传的图片没法显示,通过图床工具或纯文本方式重新发一遍 > > lyh1067341434 于2021年11月1日周一 上午10:42写道: > > > 您好! > > > > 这样好像还是不行,因为group by id ,id还是密文字符串,还是会把id当成字符串处理,所以还是不能正确分组; > > 为了更清楚表达,下面为图示: > > > > 谢谢您! > > > > > > > >

flinksql 写 hive ,orc格式,应该支持下压缩。

2021-11-10 Thread yidan zhao
如题,有支持压缩的方法吗当前,看文档没找到应该。

Re: flinkSQL写hive表,timestamp-pattern设置,分区是yyyyMMdd而不是yyyy-MM-dd的情况怎么搞。

2021-11-10 Thread Jingsong Li
Thanks! +1 to pattern Best, Jingsong On Wed, Nov 10, 2021 at 7:52 PM yidan zhao wrote: > > 我在jira回复了下,我感觉还是能配置化好一些,那个liwei貌似现在加了个basicDate这个太单一了。 > > Jingsong Li 于2021年11月4日周四 下午12:18写道: > > > 你可以自定义个partition.time-extractor.class来自己解析 > > > >

Re: flink广播流

2021-11-10 Thread yidan zhao
合理做法是open中把最初一波配置流加载好,然后广播流只是增量部分数据。 Yuepeng Pan 于2021年11月8日周一 上午10:11写道: > > > > Hi, 俊超. > 如果你指的是数据流必须在接受到一个或者多个ddl数据流才能够继续解析的话,那么你可以在ddl流到达算子之前,将数据流存入liststate,当接收到ddl类型的数据流元素后,先解析或处理 > liststate中的数据,而后继续处理当前与后续的来自数据流的元素。 > 也可以使用上述方式达到 ‘使用广播流的方式来提前加载mysql表结构的变化’ 的逻辑效果。 > >[1].

Re: MongoDB sink

2021-11-10 Thread Yun Tang
Hi, 具体问题建议直接在相关ticket上进行讨论,邮件列表上可能相关人士没有注意到。 祝好 唐云 From: 不许人间见白头 Sent: Wednesday, November 10, 2021 22:28 To: user-zh Subject: MongoDB sink 你好, 请问一下,关于New feature: FLINK-24477 预计什么时候创建PR呢?

Re:Re:Re: flink 1.10 sql 读写 hive 2.1.0

2021-11-10 Thread liuyehan
您好! 再次补充下:报错的那一行代码是TableEnvironment tableEnv = TableEnvironment.create(settings); 直接运行的话,会报错; Exception in thread "main" org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a suitable table factory for 'org.apache.flink.table.delegation.ExecutorFactory' in the

Flink SQL build-in function questions.

2021-11-10 Thread JIN FENG
hi all, I met two problems when I use FlinkSQL. 1. Is there any plan to support bit operation functions ? Currently there is some jira mentioned about this, https://issues.apache.org/jira/browse/FLINK-14990 , https://issues.apache.org/jira/browse/FLINK-12451 But It seems that it hasn't been

Re:Re: flink 1.10 sql 读写 hive 2.1.0

2021-11-10 Thread liuyehan
不好意思,这个是全部的报错; Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at

Re: flink 1.10 sql 读写 hive 2.1.0

2021-11-10 Thread yidan zhao
这报错信息没几句,光代码看不出来啥的。 liuyehan 于2021年11月11日周四 上午9:54写道: > 您好! > > > 感谢您百忙之中抽空看我邮件; > 目前问题: > 使用看flink 1.10官网 hive部分,出现了Exception in thread "main" > java.lang.IncompatibleClassChangeError: Implementing class > at java.lang.ClassLoader.defineClass1(Native Method) > at

flink 1.10 sql 读写 hive 2.1.0

2021-11-10 Thread liuyehan
您好! 感谢您百忙之中抽空看我邮件; 目前问题: 使用看flink 1.10官网 hive部分,出现了Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) pom.xml: org.apache.flink

flink 1.10 sql 读写 hive 2.1.0

2021-11-10 Thread liuyehan
您好! 感谢您百忙之中抽空看我邮件; 目前问题: 使用看flink 1.10官网 hive部分,出现了Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) pom.xml: org.apache.flink

unaligned checkpoint metric

2021-11-10 Thread Kelvin Fann
hi all, how do I interpret the checkpointAlignmentTime value when i am using unaligned checkpointing since there is no alignment? thanks, kelvin

Re: Re: Dependency injection for TypeSerializer?

2021-11-10 Thread Thomas Weise
Thanks for the feedback! @Seth Your suggestion should work, I yet have to try it out. However relying on undocumented behavior (return CompatibleAsIs and its serializer will never be used) would make me hesitant to adopt it as permanent solution. @Arvid There is no issue with constructing

Re: How to express the datatype of sparksql collect_list(named_struct(...))inflinksql?

2021-11-10 Thread Timo Walther
There are multiple ways of having a more generic UDF. I will use pseudo code here: // supports any input def eval(@DataTypeHint(inputGroup = ANY) Object o): String = { } // or you use no annotations at all and simply define a strategy // default input strategy is wildcard def eval(Map[Row,

Re: Upgrade from 1.13.1 to 1.13.2/1.13.3 failing

2021-11-10 Thread Sweta Kalakuntla
Thank you for your response. On Tue, Nov 9, 2021 at 6:55 AM Dawid Wysakowicz wrote: > Hey Sweta, > > Sorry I did not get back to you earlier. > > Could you explain how do you do the upgrade? Do you try to upgrade your > cluster through HA services (e.g. zookeeper)? Meaning you bring down the >

Re: Kafka Source Recovery Behavior

2021-11-10 Thread Mason Chen
Hi all, Any update on this? Best, Mason On Sat, Oct 30, 2021 at 5:56 AM Arvid Heise wrote: > This seems to be a valid concern but I'm not deep enough to clearly say > that this is indeed a bug. @renqschn could you > please double-check? > > On Thu, Oct 28, 2021 at 8:39 PM Mason Chen wrote:

Flink docker on k8s job submission timeout

2021-11-10 Thread dhanesh arole
Hello all, We are trying to run a Flink job in standalone mode using the official docker image on k8s. As per this documentation we have created our custom docker

MongoDB sink

2021-11-10 Thread 不许人间见白头
你好, 请问一下,关于New feature: FLINK-24477 预计什么时候创建PR呢?

Re: select records using JDBC with parameters

2021-11-10 Thread Sigalit Eliazov
Thanks alot it was really related to different versions. I have one more question about this solution the select statement returns list of results i see that when retrieving data we activate row mapper which handles only one row at a time and return PCollection of that row do i have a way to

Re: Dependency injection for TypeSerializer?

2021-11-10 Thread Seth Wiesman
Yes I did, thanks for sending it back :) Copying my previous reply for the ML: Hey Thomas, > > You are correct that there is no way to inject dynamic information into > the TypeSerializer configured from the TypeSerializerSnapshot, but that > should not be a problem for your use case. > > The

Re: flinkSQL写hive表,timestamp-pattern设置,分区是yyyyMMdd而不是yyyy-MM-dd的情况怎么搞。

2021-11-10 Thread yidan zhao
我在jira回复了下,我感觉还是能配置化好一些,那个liwei貌似现在加了个basicDate这个太单一了。 Jingsong Li 于2021年11月4日周四 下午12:18写道: > 你可以自定义个partition.time-extractor.class来自己解析 > > Flink应该搞个对应的partition.time-extractor.kind来默认支持你的需求。 > 建了个JIRA: https://issues.apache.org/jira/browse/FLINK-24758 > > Best, > Jingsong > > On Thu, Nov 4,

Re: Datastream processing on AWS with Python model

2021-11-10 Thread Dian Fu
Hi Des, Regarding kinesis datastream source: currently it still hasn't supported kinesis source natively in PyFlink DataStream API, however, you could use the Kinesis Table API & SQL connectors [1] and then convert the Table to DataStream [2] if you want to work with PyFlink DataStream API.

Datastream processing on AWS with Python model

2021-11-10 Thread Desmond F
Hi all, About our use case - We have many clients connected via websockets through api gateway on AWS, these clients submit events of various types periodically, each event contains a session_id (generated by the client), the session ends when there's no activity for a specified duration of

Re: Access to GlobalJobParameters From DynamicTableSourceFactory

2021-11-10 Thread Krzysztof Chmielewski
Well, the use case is rather simple, just pass a JVM parameter to the Table Source's underlying code and the property should not be exposed to Table Definition. Like something that end user should not change but its driven via application properties. For example you are running SQL that should

Re: JVM cluster not firing event time window

2021-11-10 Thread Caizhi Weng
Hi! Is the number of partitions of your Kafka topic smaller than the number of parallelisms of the job? If yes some parallelism will be idle and will not emit watermarks unless you set idleness for them. See [1]. I guess the original behavior of Flink 1.12 is not the expected behavior but I

Re: Dependency injection for TypeSerializer?

2021-11-10 Thread Arvid Heise
Hi Thomas, Could you add a sketch of your preferred solution? From what I gathered, you have all the information available in your main (probably misunderstood that), so what's keeping you from adding the TypeSerializer as a field to your UDF? On Tue, Nov 9, 2021 at 11:42 AM Krzysztof

Re: Pyflink PyPi build - scala 2.12 compatibility

2021-11-10 Thread Kamil ty
Thank you for the clarification. This was very helpful! Kind regards Kamil On Wed, 10 Nov 2021, 02:26 Dian Fu, wrote: > Hi Kamil, > > You are right that it comes with JAR packages of scala 2.11 in the PyFlink > package as it has to select one version of JARs to bundle, either 2.11 or > 2.12.

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-10 Thread Fabian Paul
Hi Dongwon, Thanks for sharing the logs and the metrics screenshots with us. Unfortunately, I think we need more information to further isolate the problem therefore I have a couple of suggestions. 1. Since you already set up PromQL can you also share the JVM memory statics i.e. DirectMemory

How to specify slot task sharing group for a task manager?

2021-11-10 Thread Morten Gunnar Bjørner Lindeberg
Hi :) I am trying the fine-grained resource management feature in Flink 1.14, hoping it can enable assigning certain operators to certain TaskManagers. The sample code in https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/finegrained_resource/ -shows how to define the