Re: presto s3p checkpoints and local stack

2021-01-28 Thread Arvid Heise
Hi Marco, ideally you solve everything with IAM roles, but you can also use credentials providers such as EnvironmentVariableCredentialsProvider[1]. The key should be s3.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider Remember to put the respective jar into th

Re: Flink SQL and checkpoints and savepoints

2021-01-28 Thread Dan Hill
I went through a few of the recent Flink Forward videos and didn't see solutions to this problem. It sounds like some companies have solutions but they didn't talk about them in enough detail to do something similar. On Thu, Jan 28, 2021 at 11:45 PM Dan Hill wrote: > Is this savepoint recovery

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-01-28 Thread Yu Li
Thanks Xintong for being our release manager and everyone else who made the release possible! Best Regards, Yu On Fri, 29 Jan 2021 at 15:05, Xintong Song wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.10.3, which is the third bugfix release for th

Re: Flink SQL and checkpoints and savepoints

2021-01-28 Thread Dan Hill
Is this savepoint recovery issue also true with the Flink Table API? I'd assume so. Just doublechecking. On Mon, Jan 18, 2021 at 1:58 AM Timo Walther wrote: > I would check the past Flink Forward conference talks and blog posts. A > couple of companies have developed connectors or modified exi

Re: Publish heartbeat messages in all Kafka partitions

2021-01-28 Thread Arvid Heise
Hi Alexey, I don't see a way to do it with one message in FlinkKafkaProducer. So you have to multiply the heartbeat yourself. I'd imagine the easiest way would be to query the number of partitions of the output topic (it's static in Kafka) in the heartbeat producer and already produce as all recor

Re: Deduplicating record amplification

2021-01-28 Thread Arvid Heise
Hi Rex, there cannot be any late event in processing time by definition (maybe on a quantum computer?), so you should be fine. The timestamp of records in processing time is monotonously increasing. Best, Arvid On Fri, Jan 29, 2021 at 1:14 AM Rex Fenley wrote: > Switching to TumblingProcessin

[ANNOUNCE] Apache Flink 1.10.3 released

2021-01-28 Thread Xintong Song
The Apache Flink community is very happy to announce the release of Apache Flink 1.10.3, which is the third bugfix release for the Apache Flink 1.10 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-28 Thread Xintong Song
The ZK client side uses 15s connection timeout and 60s session timeout in Flink. There's nothing similar to a heartbeat interval configured, which I assume is up to ZK's internal implementation. These things have not changed in FLink since at least 2017. If both ZK client and server complain about

Re: Initializing broadcast state

2021-01-28 Thread Guowei Ma
Hi Nick Following is an example(could not run but just to explain the idea). I use the `KeyedBroadcastProcessFunction` because I saw your code use the keyedstate. private static class StatefulFunctionWithKeyedStateAccessedOnBroadcast extends KeyedBroadcastProcessFunction { private sta

Configuring ephemeral storage limits when using Native Kubernetes

2021-01-28 Thread Emilien Kenler
Hello, I'm trying to run Flink on Kubernetes, and I recently switched from lyft/flinkk8soperator to the Flink Native Kubernetes deployment mode. I have a long running job, that I want to deploy (using application mode), and after a few hours, I noticed the deployment was disappearing. After a q

Publish heartbeat messages in all Kafka partitions

2021-01-28 Thread Alexey Trenikhun
Hello, We need to publish heartbeat messages in all topic partitions. Is possible to produce single message and then somehow broadcast it to all partitions from FlinkKafkaProducer? Or only way that message source knows number of existing partitions and sends 1 message to 1 partition? Thanks, Al

Re: Deduplicating record amplification

2021-01-28 Thread Rex Fenley
Switching to TumblingProcessingTimeWindows seems to have solved that problem. For my own understanding, this won't have any "late" and therefore dropped records right? We cannot blindly drop a record from the aggregate evaluation, it just needs to take all the records it gets in a window and proce

flink checkpoints adjustment strategy

2021-01-28 Thread Marco Villalobos
I am kind of stuck in determining how large a checkpoint interval should be. Is there a guide for that? If a timeout time is 10 minutes, we time out, what is a good strategy for adjusting that? Where is a good starting point for a checkpoint? How shall they be adjusted? We often see checkpoint

Flink on Kubernetes, Task/Job Manager Recycles

2021-01-28 Thread Julian Cardarelli (CA)
Hello - I am running some testing with flink on Kubernetes. Every let's say five to ten days, all the jobs disappear from running jobs. There's nothing under completed jobs, and there's no record of the submitted jar files in the cluster. In some manner or another, it is almost like going into

[Stateful Functions] Problems with Protobuf Versions

2021-01-28 Thread Jan Brusch
Hi, I have a bit of a strange problem: I can't get a Statefun Application to Compile or Run (Depending on the exact Protobuf version) with a Protobuf version newer than 3.3.0. I have had this problem over multiple project setups and multiple versions of Flink Statefun with Java8. Protobuf 3.

Re: Deduplicating record amplification

2021-01-28 Thread Rex Fenley
It looks like it wants me to call assignTimestampsAndWatermarks but I already have a timer on my window which I'd expect everything entering this stream would simply be aggregated during that window .window(TumblingEventTimeWindows.of(Time.seconds(1))) On Thu, Jan 28, 2021 at 12:59 PM Rex Fenley

Re: Deduplicating record amplification

2021-01-28 Thread Rex Fenley
I think I may have been affected by some late night programming. Slightly revised how I'm using my aggregate val userDocsStream = this.tableEnv .toRetractStream(userDocsTable, classOf[Row]) .keyBy(_.f1.getField(0)) val compactedUserDocsStream = userDocsStream .window(TumblingEventTimeWindows.of(Ti

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-28 Thread Lu Niu
After checking the log I found the root cause is zk client timeout on TM: ``` 2021-01-25 14:01:49,600 WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 40020ms for sessionid 0x404f9ca531a5d6f 2021-01-25 14:01:49,610 INF

question on checkpointing

2021-01-28 Thread Marco Villalobos
Is it possible that checkpointing times out due to an operator taking too long? Also, does windowing affect the checkpoint barriers?

Re: Timers not firing until stream end

2021-01-28 Thread Pilgrim Beart
Chesnay, 1) Correct, I'd like the timeout event (generated at eventTime==1000) to appear in its correct time sequence in the output, i.e. before eventTime exceeds 1000. It's great that Flink can deal with out-of-orderness, but I didn't expect it to spontaneously create it (especially with paralleli

AW: Stateful Functions - accessing the state aside of normal processing

2021-01-28 Thread Stephan Pelikan
Hi Gordon, If operating on checkpoints instead of savepoints this might be OK. But since this is not in the current scope I digged into Flink docs and found the "queryable state" (https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/queryable_state.html#querying-state).

Re: presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
Is it possible to use an environmental credentials provider? On Thu, Jan 28, 2021 at 8:35 AM Arvid Heise wrote: > Hi Marco, > > afaik you don't need HADOOP_HOME or core-site.xml. > > I'm also not sure from where you got your config keys. (I guess from the > Presto page, which probably all work i

Very slow recovery from Savepoint

2021-01-28 Thread Yordan Pavlov
Hello there, I am trying to find the solution for a problem we are having in our Flink setup related to very slow recovery from a Savepoint. I have searched in the mailing list, found a somewhat similar problem, the bottleneck there was the HD usage, but I am not seeing this in our case. Here is a

Connect to schema registry via SSL

2021-01-28 Thread Laurent Exsteens
Hello, I'm trying to us Flink SQL (on Ververica Platform, so no other options than pure Flink SQL) to read confluent avro messages from Kafka, when the schema registry secured via SSL. Would you know what are the correct properties to setup in the kafka consumer config? The following options wor

Re: presto s3p checkpoints and local stack

2021-01-28 Thread Arvid Heise
Hi Marco, afaik you don't need HADOOP_HOME or core-site.xml. I'm also not sure from where you got your config keys. (I guess from the Presto page, which probably all work if you remove hive., maybe we should also support that) All keys with prefix s3 or s3p (and fs.s3, fs.s3p) are routed towards

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Arvid Heise
> > Regarding the try catch block Sorry I meant the try catch in SensorMessageToSensorTimeSeriesFunction. Also, just to be clear, does disabling restart make it easier for you to > debug? > Yes the log will be quite small then. Currently, it's just repeating the same things a couple of times. B

Re: HybridMemorySegment index out of bounds exception

2021-01-28 Thread Timo Walther
FYI: Yuval and I scheduled a call to investigate this serialization issue remotely on Monday. If you have any idea by looking at the code beforehand, let us know. On 28.01.21 16:57, Yuval Itzchakov wrote: Hi Timo, The code example I posted doesn't really match the code that is causing this

presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
Hi, I got s3a working on localstack. The missing piece of information from Flink documentation seems to be that the system requires a HADOOP_HOME and core-site.xml. Flink documentation states that s3p (presto) should be used for file checkpointing into s3. I am using RocksDB, which I assume also

Re: HybridMemorySegment index out of bounds exception

2021-01-28 Thread Yuval Itzchakov
Hi Timo, The code example I posted doesn't really match the code that is causing this issue. I tried to extend it a bit but couldn't make the reproduction work there. I am no longer using the serialized strings, but registering the custom serializers with the runtime during bootstrap and overridin

Re: HybridMemorySegment index out of bounds exception

2021-01-28 Thread Timo Walther
This is helpful information. So I guess the problem must be in the flink-table module and not in flink-core. I will try to reserve some time tomorrow to look into the code again. How did you express RawType(Array[String])? Again with fully serialized type string? Could it be related to https:/

Re: HybridMemorySegment index out of bounds exception

2021-01-28 Thread Yuval Itzchakov
Hi Timo, I tried replacing it with an ordinary ARRAY DataType, which doesn't reproduce the issue. If I use a RawType(Array[String]), the problem still manifests, so I assume it's not directly related to a Kryo serialization of the specific underlying type (io.circe.Json), but something in the way

Re: HybridMemorySegment index out of bounds exception

2021-01-28 Thread Timo Walther
Hi Yuval, we should definitely find the root cause of this issue. It helps if the exception happens frequently to nail down the problem. Have you tried to replace the JSON object with a regular String? If the exception is gone after this change. I believe it must be the serialization and not

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Marco Villalobos
Regarding the try catch block, it rethrows the exception. Here is the code: catch (RuntimeException e) { logger.error("Error in timer.", e); throw e; } That would be okay, right? Also, just to be clear, does disabling restart make it easier for you to debug? On Thu, Jan 28, 2021 at 1:17 AM Arv

Question

2021-01-28 Thread Abu Bakar Siddiqur Rahman Rocky
Hi, Is there any library to use and remember the apache flink snapshot? Thank you -- Regards, Abu Bakar Siddiqur Rahman

Re: Timers not firing until stream end

2021-01-28 Thread Chesnay Schepler
I'm not sure I see the problem in your output. For any given key the timestamps are in order, and the events where devices are offline seem to occur at the right time. Is it just that you'd like the following line to occur earlier in the output? {"ts":1000,"id":"2","is_online":false,"log":"

Cannot access state from a empty taskmanager - using kubernetes

2021-01-28 Thread Daniel Peled
Hi, We have followed the instructions in the following link ""Enabling Queryable State" with kubernetes: https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/kubernetes.html#enabling-queryable-state *When the replicas of the task-manager pods is 1 we g

Problem restirng state

2021-01-28 Thread Shridhar Kulkarni
All, We are getting the exception, copied at the end of this post. The exception is thrown when a new flink job is submitted; when Flink tries to restore the previous state. Environment: Flink version: 1.10.1 State persistence: Hadoop 3.3 Zookeeper 3.5.8 Parallelism: 4 The code i

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-28 Thread Sebastián Magrí
Applied that parameter and that seems to get me some progress here. I still get the shade overlapping classes warning, but I get the PostgreSQLTableFactory in the merged table.factories.Factory service file. However, now on runtime the application fails to find the debezium source function class

Apache Flink Job Manager High CPU with Couchbase

2021-01-28 Thread VINAYA KUMAR BENDI
Hello, We work in a multinational company that produces diesel engines and is working on an IoT platform to analyze engine performance based on sensor data. We are using Flink for deploying analytics stream processing jobs. We recently integrated these jobs with Couchbase (serving as a Cache) an

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-28 Thread Jark Wu
Hi Sebastián, Could you try to add combine.children="append" attribute to the transformers configuration? You can also see the full shade plugin configuration here [1]. Best, Jark [1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/#transform-table-connectorformat-re

Re: JobManager seems to be leaking temporary jar files

2021-01-28 Thread Chesnay Schepler
Code-wise it appears that thing have gotten simpler and we can use use a URLClassLoader within PackagedProgram. We probably won't get around a dedicated close() method on the PackagedProgram. I think in FLINK-21164 I think have identified the right places to issue this call within the jar ha

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-28 Thread Sebastián Magrí
Hi Jark! Please find the full pom file attached. Best Regards, On Thu, 28 Jan 2021 at 03:21, Jark Wu wrote: > Hi Sebastián, > > I think Dawid is right. > > Could you share the pom file? I also tried to > package flink-connector-postgres-cdc with ServicesResourceTransformer, and > the Factory f

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Arvid Heise
Also could you please provide the jobmanager log? It could also be that the underlying failure is somewhere else. On Thu, Jan 28, 2021 at 10:17 AM Arvid Heise wrote: > Hi Marco, > > In general, sending a compressed log to ML is totally fine. You can > further minimize the log by disabling restar

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Arvid Heise
Hi Marco, In general, sending a compressed log to ML is totally fine. You can further minimize the log by disabling restarts. I looked into the logs that you provided. 2021-01-26 04:37:43,280 INFO org.apache.flink.runtime.taskmanager.Task >[] - Attempting to cancel task forward f

Re: Timers not firing until stream end

2021-01-28 Thread Pilgrim Beart
Scratch that - your WatermarkStrategy DOES work (when I implement it correctly!). Well, almost: As you can see below (code pushed to repo), the Timer events are still appearing somewhat late in the stream - 4 events late in this case. It may be just good-enough for my purposes, though it will make

Re: Timers not firing until stream end

2021-01-28 Thread Pilgrim Beart
Chesnay, I cannot reproduce this - I've tried the approaches you suggest, but nothing I've done makes the timers fire at the correct time in the stream - they only fire when the stream has ended. If you have an EventTime example where they fire at the right time in the stream, I'd love to see it. O

Re: flink slot communication

2021-01-28 Thread Piotr Nowojski
Hi, Yes Dawid is correct. Communications between two tasks on the same TaskManager are not going through the network, but via "local" channel (`LocalInputChannel`). It's still serialising and deserializing the data, but there are no network overheads, and local channels have only half of the memor