Re: Regarding state access in UDF

2021-06-30 Thread Kai Fu
Hi Ingo, Thank you for your advice, we've not tried it yet, we just thought it may work that way, but now it seems not then. We'll see how it could match our use case with the AggregateFunction interface. On Thu, Jul 1, 2021 at 1:57 PM Ingo Bürk wrote: > Hi Kai, > > CheckpointedFunction is not

How to calculate how long an event stays in flink?

2021-06-30 Thread xm lian
Hello community, I would like to know how long it takes for an event to flow through the whole Flink pipeline, that consumes from Kafka and sinks to Redis. My current idea is, for each event: 1. calculate a start_time in source (timestamp field of [metadata]( https://ci.apache.org/projects/flink

Re: Regarding state access in UDF

2021-06-30 Thread Ingo Bürk
Hi Kai, CheckpointedFunction is not an interface meant to be used with UDFs (in the Table API / SQL sense[1]), but is rather an interface for DataStream API[2]; the term "user-defined function" has a different meaning there. Did you actually try it to see if it works? I'd be surprised it it did.

Re: Converting Table API query to Datastream API

2021-06-30 Thread Le Xu
Excellent! I'll give it a try. Thanks! Le On Wed, Jun 30, 2021 at 10:14 PM JING ZHANG wrote: > Hi Le, > AFAIK, there are following ways to look deeper into the job, hope it helps. > Before executing the job, > 1. Use `explain` statement to explain the logical and optimized plans of a > query[1

Re: Regarding state access in UDF

2021-06-30 Thread Kai Fu
Hi Ingo, Thank you for the reply, we actually need more fine-grained control on the states in UDF. Per investigation, we found that the states can be simply created/accessed via implementing `CheckpointedFunction` interface, please advise if there is any side-effect by doing that. On Wed, Jun 30,

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Zhu Zhu
Hi Shilpa, JobType was introduced in 1.13. So I guess the cause is that the client which creates and submit the job is still 1.12.2. The client generates a outdated job graph which does not have its JobType set and resulted in this NPE problem. Thanks, Zhu Austin Cawley-Edwards 于2021年7月1日周四 上午1

Re: Converting Table API query to Datastream API

2021-06-30 Thread JING ZHANG
Hi Le, AFAIK, there are following ways to look deeper into the job, hope it helps. Before executing the job, 1. Use `explain` statement to explain the logical and optimized plans of a query[1]. 2. Use `ExecutionEnvironment#getExecutionPlan` to print the execution plan, And use `visualizer` to visua

OutOfMemory Failure on Savepoint

2021-06-30 Thread Abhishek SP
Hello, I am observing a failure whenever I trigger a savepoint on my Flink Application which otherwise runs without issues The app is deployed via AWS KDA(Kubernetes) with 256 KPU(6 Task managers with 43 slots each. 1 KPU = 1 vCPU, 4GB Memory, and 50GB Diskspace. It uses RocksDB backend) The sav

Re: Converting Table API query to Datastream API

2021-06-30 Thread Le Xu
Thanks -- Is there a way to quickly visualize the Stream operator DAG generated by the TableAPI/SQL queries? Le On Tue, Jun 29, 2021 at 9:34 PM JING ZHANG wrote: > Hi Le, > link > > is >

Re: RocksDB MapState debugging key serialization

2021-06-30 Thread Thomas Breloff
Thanks Yuval. Indeed it was a serialization issue. I followed the instructions in the docs to set up a local test environment with RocksDB that I was able to set a breakpoint in and step through. I discovered that my key was not properly registered with the Kryo serializer and the default Fie

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Austin Cawley-Edwards
Hi Shilpa, Thanks for reaching out to the mailing list and providing those logs! The NullPointerException looks odd to me, but in order to better guess what's happening, can you tell me a little bit more about what your setup looks like? How are you deploying, i.e., standalone with your own manife

Re: RocksDB MapState debugging key serialization

2021-06-30 Thread Yuval Itzchakov
Here is what the documentation on RocksDBStateBackend says: The EmbeddedRocksDBStateBackend holds in-flight data in a RocksDB database that is (per default) stored in the T

RocksDB MapState debugging key serialization

2021-06-30 Thread Thomas Breloff
Hello, I am having trouble with a Flink job which is configured using a RocksDB state backend. Tl;dr: How can I debug the key serialization for RocksDB MapState for a deployed Flink job? Details: When I “put” a key/value pair into a MapState, and then later try to “get” using a key which has

Re: Job Recovery Time on TM Lost

2021-06-30 Thread Lu Niu
Thanks Gen! cc flink-dev to collect more inputs. Best Lu On Wed, Jun 30, 2021 at 12:55 AM Gen Luo wrote: > I'm also wondering here. > > In my opinion, it's because the JM can not confirm whether the TM is lost > or it's a temporary network trouble and will recover soon, since I can see > in the

Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Shilpa Shankar
Hello, We have a flink session cluster in kubernetes running on 1.12.2. We attempted an upgrade to v 1.13.1, but the jobmanager pods are continuously restarting and are in a crash loop. Logs are attached for reference. How do we recover from this state? Thanks, Shilpa 2021-06-30 16:03:25,965 ER

Re: Exception in snapshotState suppresses subsequent checkpoints

2021-06-30 Thread tao xiao
Hi team, Does anyone have a clue? On Mon, Jun 28, 2021 at 3:27 PM tao xiao wrote: > My job is very simple as you can see from the code I pasted. I simply > print out the number to stdout. If you look at the log the number continued > to print out after checkpoint 1 which indicated no back press

Re: Protobuf + Confluent Schema Registry support

2021-06-30 Thread Austin Cawley-Edwards
Hi Vishal, I don't believe there is another way to solve the problem currently besides rolling your own serializer. For the Avro + Schema Registry format, is this Table API format[1] what you're referring to? It doesn't look there have been discussions around adding a similar format for Protobuf

Re: Regarding state access in UDF

2021-06-30 Thread Ingo Bürk
Hi Kai, AggregateFunction and TableAggregateFunction are both stateful UDF interfaces. This should cover most scenarios given where they would be used. If you need more fine-grained control you can also always drop down into the DataStream API (using #toDataStream) and work there. Table API / SQL

Regarding state access in UDF

2021-06-30 Thread Kai Fu
Hi team, We've a use case that needs to create/access state in UDF, while per the documentation and UDF interface

Protobuf + Confluent Schema Registry support

2021-06-30 Thread Vishal Surana
Using the vanilla kafka producer, I can write protobuf messages to kafka while leveraging schema registry support as well. A flink kafka producer requires us to explicity provide a serializer which converts the message to a producerrecord containing the serialized bytes of the message. We can't

Re: Recommended way to submit a SQL job via code without getting tied to a Flink version?

2021-06-30 Thread Stephan Ewen
Hi Sonam! To answer this, let me quickly provide some background on the two ways flink deployments / job submissions work. See also here for some background: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/overview/#deployment-modes What is common in all setups is tha

Re: NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-30 Thread Matthias Pohl
Dependending on the build system used, you could check the dependency tree, e.g. for Maven it would be `mvn dependency:tree -Dincludes=org.apache.parquet` Matthias On Wed, Jun 30, 2021 at 8:40 AM Thomas Wang wrote: > Thanks Matthias. Could you advise how I can confirm this in my environment? >

Re: Job Recovery Time on TM Lost

2021-06-30 Thread Gen Luo
I'm also wondering here. In my opinion, it's because the JM can not confirm whether the TM is lost or it's a temporary network trouble and will recover soon, since I can see in the log that akka has got a Connection refused but JM still sends a heartbeat request to the lost TM until it reaches hea

Re: How can I tell if a record in a bounded job is the last record?

2021-06-30 Thread Yik San Chan
Hi Paul, Thanks for the suggestion, this sounds like a nice solution. I will give it a shot. Best, Yik San On Wed, Jun 30, 2021 at 2:26 PM Paul Lam wrote: > Hi Yik San, > > Maybe you could use watermark to trigger the last flush. Source operations > will emit MAX_WATERMARK to trigger all the t

Re: Session cluster configmap removal

2021-06-30 Thread Yang Wang
Hi Sweta, After FLINK-20695[1] is resolved, once the job in session cluster reached to the globally terminal state(FAILED, CENCELED), the HA related ConfigMap will be deleted. So could you please have a try with Flink version 1.13.1, 1.12.5? However, we still have some residual ConfigMaps(e.g. r

Re: Yarn Application Crashed?

2021-06-30 Thread Piotr Nowojski
You are welcome :) Piotrek śr., 30 cze 2021 o 08:34 Thomas Wang napisał(a): > Thanks Piotr. This is helpful. > > Thomas > > On Mon, Jun 28, 2021 at 8:29 AM Piotr Nowojski > wrote: > >> Hi, >> >> You should still be able to get the Flink logs via: >> >> > yarn logs -applicationId application_16