Flink batch mode does not sort by event timestamp

2022-04-21 Thread Han You
I have a custom flink Source, and I have a SerializableTimestampAssigner that assigns event timestamps to records emitted by the source. The source may emit records out of order because of the nature of the underlying data storage, however with BATCH mode, I expect Flink to sort these records

RE: DebeziumAvroDeserializationSchema

2022-04-21 Thread lan tran
Wonder if this is a bug or not but if I use AvroRowDeserializationSchema,In PyFlink the error still occure ?py4j.protocol.Py4JError: An error occurred while calling None.org.apache.flink.formats.avro.AvroRowDeserializationSchema. Trace:org.apache.flink.api.python.shaded.py4j.Py4JException:

RE: AvroRowDeserializationSchema

2022-04-21 Thread lan tran
Wonder if this is a bug or not but if I use AvroRowDeserializationSchema,In PyFlink the error still occure ?py4j.protocol.Py4JError: An error occurred while calling None.org.apache.flink.formats.avro.AvroRowDeserializationSchema. Trace:org.apache.flink.api.python.shaded.py4j.Py4JException:

JobManager doesn't bring up new TaskManager during failure recovery

2022-04-21 Thread Zheng, Chenyu
Hi developers! I got a strange bug during failure recovery of Flink. It seems the JobManager doesn't bring up new TaskManager during failure recovery. Some logs and information of the Flink job are pasted below. Can you take a look and give me some guidance? Thank you so much! Flink version:

JobManager doesn't bring up new TaskManager during failure recovery

2022-04-21 Thread Zheng, Chenyu
Hi developers! I got a strange bug during failure recovery of Flink. It seems the JobManager doesn't bring up new TaskManager during failure recovery. Some logs and information of the Flink job are pasted below. Can you take a look and give me some guidance? Thank you so much! Flink version:

JobManager doesn't bring up new TaskManager during failure recovery

2022-04-21 Thread Zheng, Chenyu
Hi developers! I got a strange bug during failure recovery of Flink. It seems the JobManager doesn't bring up new TaskManager during failure recovery. Some logs and information of the Flink job are pasted below. Can you take a look and give me some guidance? Thank you so much! Flink version:

UUID on TableAPI

2022-04-21 Thread lan tran
Hi team,Currently, I want to use savepoints in Flink. However, one of the things that I concern is that is there any way we can set the UUID while using Table API (SQL API) ? If not, does it has any mechanism to know that when we start the Flink again, it will know that it was that UUID

DebeziumAvroDeserializationSchema

2022-04-21 Thread lan tran
Hi team,Currently, I did not see this functions in PyFlink, therefore any suggestion on using this on PyFlink ?Best,Quynh. Sent from Mail for Windows 

[ANNOUNCE] Apache Kyuubi (Incubating) released 1.5.1-incubating

2022-04-21 Thread Fu Chen
Hi all, The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.5.1-incubating has been released! Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark and

How to dynamically modify the schema information of a table

2022-04-21 Thread 草莓
The following is the Java code @Test public void test(){ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); DataStream dataStream = env.fromElements("Alice", "Bob", "John"); Schema.Builder

Re: Huge number of GAX-Threads in Flink

2022-04-21 Thread huweihua
Hi, Which connector are you using? Is the increase in the number of threads gradual or is it related to task failover? It would be helpful if you could provide a thread dump. > 2022年4月22日 上午2:12,SHREEKANT ANKALA 写道: > > Hi, Can somebody please help with this issue? This is really blocking us

Re: Integration Test for Kafka Streaming job

2022-04-21 Thread Alexey Trenikhun
Thank you for information ! From: Farouk Sent: Thursday, April 21, 2022 1:14:00 AM To: Aeden Jameson Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Integration Test for Kafka Streaming job Hi I would recommend to use kafka-junit5 from salesforce

Exception Handling in ElasticsearchSink

2022-04-21 Thread Rion Williams
Hi all, I've recently been encountering some issues that I've noticed in the logs of my Flink job that handles writing to an Elasticsearch index. I was hoping to leverage some of the metrics that Flink exposes (or piggyback on them) to update metric counters when I encounter specific kinds of

RE: Huge number of GAX-Threads in Flink

2022-04-21 Thread SHREEKANT ANKALA
Hi, Can somebody please help with this issue? This is really blocking us in Production. If needed, I can provide the thread dump as well. Thanks, Shreekant A. Sent from Mail for Windows From: SHREEKANT ANKALA

Jobmanager trying to be registered for Zombie Job

2022-04-21 Thread Peter Schrott
Hi Flink-Users, I am not sure if this does something to my cluster or not. But since updating to Flink 1.15 (atm rc4) I find the following logs: INFO: Registering job manager ab7db9ff0ebd26b3b89c3e2e56684...@akka.tcp:// fl...@flink-jobmanager-xxx.com:40015/user/rpc/jobmanager_2 for job

flink cdc 时间格式和时区问题

2022-04-21 Thread casel.chen
我在使用flink cdc 2.2.0获取mysql数据变更, mysqlSource设置了 .serverTimeZone("Asia/Shanghai") 发现mysql timestamp 类型的数据在mysql workbench里显示的是 "2021-06-24 16:26:47",通过JsonDebeziumDeserializationSchema解析后得到的json string串是

Re: RocksDB efficiency and keyby

2022-04-21 Thread Yun Tang
Hi Trystan, You can use async-profiler[1] to detect the CPU stack within RocksDB to see what happened, maybe you can try to enable partitioned index & filters[2] if the call stack is occupied by loading index or filter block. [1] https://github.com/jvm-profiling-tools/async-profiler [2]

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-21 Thread Alexis Sarda-Espinosa
Hello, I enabled some of the RocksDB metrics and I noticed some additional things. After changing the configuration YAML, I restarted the cluster with a savepoint, and I can see that it only used 5.6MB on disk. Consequently, after the job switched to running state, the new checkpoints were

web ui中能查看到job失败的原因吗?

2022-04-21 Thread weishishuo...@163.com
我提交一个postgresql cdc 同步数据到 mysql jdbc sink的job,过了一会儿就失败了,点击job的链接,web ui界面的状态是FAILED,但是异常信息不明确 ``` 2022-04-21 17:30:50 org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at

Re: How to define event time and Watermark for intermediary joined table in FlinkSQL?

2022-04-21 Thread Shengkai Fang
Hi, The watermark of the join operator is the minimum of the watermark of the input streams. ``` JoinOperator.watermark = min(left.watermark, right.watermark); ``` I think it's enough for most cases. Could you share more details about the logic in the UDF getEventTimeInNS? I think the better

flink添加表的comment信息无效

2022-04-21 Thread 草莓
Java代码如下: @Test public void test(){ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); DataStream dataStream = env.fromElements("Alice", "Bob", "John"); Schema.Builder builder =

Re: flink-connector和flink-sql-connector的区别

2022-04-21 Thread Shengkai Fang
hi sql jar 往往是 shade 了相关的依赖,而 普通的 jar 则不带有相关的依赖。正如名字所说,在 table api/sql 的情况下建议使用 sql jar,datastream 建议使用 普通的jar。 Best, Shengkai weishishuo...@163.com 于2022年4月21日周四 16:52写道: > >

flink-connector和flink-sql-connector的区别

2022-04-21 Thread weishishuo...@163.com
cdc项目中每种connector都分成flink-connector-xxx和flink-sql-connector-xxx,比如flink-connector-mysql-cdc和flink-sql-connector-mysql-cdc,这两个的区别是什么呢?在什么场景下用前者,什么场景下用后者? weishishuo...@163.com

Re: Integration Test for Kafka Streaming job

2022-04-21 Thread Farouk
Hi I would recommend to use kafka-junit5 from salesforce https://github.com/salesforce/kafka-junit On top of that, you can use org.apache.flink.runtime.minicluster.TestingMiniCluster Your stack should be complete. Cheers Le jeu. 21 avr. 2022 à 07:10, Aeden Jameson a écrit : > I've had

RE: Kubernetes killing TaskManager - Flink ignoring taskmanager.memory.process.size

2022-04-21 Thread Schwalbe Matthias
Hi Dan, Assuming from previous mails that you are using RocksDb … this could have to do with the glibc bug [1][2] … I’m never sure in which setting this is already been taken care of … However your situation is very typical with glibc as allocator underneath RocksDb and giving more memory won’t

RE: Restore Job from CheckPoint in IntelliJ IDE - MiniCluster

2022-04-21 Thread Schwalbe Matthias
Hi Kostas, Did you give setting execution.savepoint.path a try? You can set the property on local environment by means of env.configure(...). This work for me ... (didn't try yet on Flink 1.15) Thias [1]

Re: Kubernetes killing TaskManager - Flink ignoring taskmanager.memory.process.size

2022-04-21 Thread Yang Wang
Could you please configure a bigger memory to avoid OOM and use NMTracker[1] to figure out the memory usage categories? [1]. https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html Best, Yang Dan Hill 于2022年4月21日周四 07:42写道: > Hi. > > I upgraded to Flink v1.14.4

Re: Problems with PrometheusReporter

2022-04-21 Thread Chesnay Schepler
Please check the logs for warnings. It could be that a metric registered by a job is throwing exceptions. On 20/04/2022 18:45, Peter Schrott wrote: Hi kuweiha, Just to confirm, you tried with 1.15 - none of the rcs are working for me? This port is definitely free as it was already used on

AvroRowDeserializationSchema

2022-04-21 Thread lan tran
Hi team, I want to implement AvroRowDeserializationSchema when consume data from Kafka, however from the documentation, I did not understand what are avro_schema_string and record_class ? I would be great if you can give me the example on this (I only have the example on Java, however, I was doing