Re: Spike in checkpoint start delay every 15 minutes

2022-06-16 Thread Hangxiang Yu
Are the 4th "checkpointed size" and "checkpoint duration" bigger than the others? If so, I guess it's related to the RocksDB flush, which may delay the next checkpoint. Best, Hangxiang. On Fri, Jun 17, 2022 at 2:31 PM Hangxiang Yu wrote: > Is the 4th "checkpointed size" and "checkpoint durat

Re: Flink, JSON, and JSONSchemas

2022-06-16 Thread Shengkai Fang
Hi. > *1. Is there some easy way to use deserialized JSON in DataStream without case classes or POJOs?* Could you explain what you expected? Do you mean you just want to register a DataType that can bridge the received bytes to POJO objects? I am not sure whether the current RAW type[1]

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Dian Fu
>> This error generally occurs in jobs where there are transfers between Table and datastream. AFAIK, this issue should have already been fixed, see https://issues.apache.org/jira/browse/FLINK-26920 and https://issues.apache.org/jira/browse/FLINK-23133 for more details. Regards, Dian On Fri, Jun

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
Hi John, Because I can't see your code, I can only provide some possible reasons for this error: 1. This error generally occurs in jobs where there are transfers between Table and datastream. But given that you said you just used the sql + python udf, this shouldn't be the case. 2. The default val

Flink, JSON, and JSONSchemas

2022-06-16 Thread Andrew Otto
At the Wikimedia Foundation, we use JSON and JSONSchemas for our events in Kafka. There are hundreds of these schemas and topics in Kafka. I'd like to provide library level integration between our 'Event Platform' JSON data and Flink. My main goal: *No case classes or POJOs.* The JSONSchemas s

Re: Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Peter Westermann
We run a standalone Flink cluster in session mode (but we usually only run one job per cluster; session mode just fits better with our deployment workflow than application mode). We trigger hourly savepoints and also use savepoints to stop a job and then restart with a new version of the jar. I

Re: Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Chesnay Schepler
OK, that shouldn't happen. I couldn't find anything wrong in the code so far; will continue trying to reproduce it. If this happens, does it persist indefinitely for a particular triggerId, or does it reappear later on again? Are you only ever triggering a single savepoint for a given job? Are

Re: Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Chesnay Schepler
Are there any log messages from the CompletedOperationCache in the logs? On 16/06/2022 16:54, Chesnay Schepler wrote: There is an expected case where this might happen: if too much time has elapsed since the savepoint was completed (default 5 minutes; controlled by rest.async.store-duration)

Re: Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Peter Westermann
If it happens, it happens immediately. Once we receive the triggerId from /jobs/:jobid/stop or /jobs/:jobid/savepoints, we poll /jobs/:jobid/savepoints/:triggerid every second until the status is no longer IN_PROGRESS. Peter Westermann Analytics Software Architect
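The polling described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual client: the names `fetch_savepoint_status` and `wait_for_savepoint`, the base URL, and the `max_polls` limit are hypothetical, and the assumption that the response JSON carries the state under `status.id` should be checked against the REST API documentation for the Flink version in use.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_savepoint_status(base_url: str, job_id: str, trigger_id: str) -> str:
    """GET /jobs/:jobid/savepoints/:triggerid and return the status id.

    Assumption: the response body looks like {"status": {"id": "..."}, ...}.
    """
    url = f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["status"]["id"]


def wait_for_savepoint(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,      # the thread mentions polling every second
    max_polls: int = 300,         # hypothetical upper bound, not from the thread
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns something other than IN_PROGRESS."""
    for _ in range(max_polls):
        status = fetch_status()
        if status != "IN_PROGRESS":
            return status
        sleep(interval_s)
    raise TimeoutError(f"savepoint still IN_PROGRESS after {max_polls} polls")
```

A caller would pass a closure such as `lambda: fetch_savepoint_status("http://localhost:8081", job_id, trigger_id)`, where the host and port are placeholders. Any HTTP error status is raised by `urllib` as `urllib.error.HTTPError` and propagates to the caller rather than being retried, so a failed triggerId lookup would surface as an exception on the first poll.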

Re: Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Chesnay Schepler
There is an expected case where this might happen: if too much time has elapsed since the savepoint was completed (default 5 minutes; controlled by rest.async.store-duration). Did this happen earlier than that? On 16/06/2022 15:53, Peter Westermann wrote: We recently upgraded one of our Flink

Re: Flink Shaded dependencies and extending Flink APIs

2022-06-16 Thread Andrew Otto
Hi all, thanks for the responses. > Create a module let's say "wikimedia-event-utilities-shaded" This actually doesn't help me, as wikimedia-event-utilities is used as an API by non-Flink stuff too, so I don't want to use the shaded ObjectNode in the API params. > Another solution is that you can

Sporadic issues with savepoint status lookup in Flink 1.15

2022-06-16 Thread Peter Westermann
We recently upgraded one of our Flink clusters to version 1.15.0 and are now seeing sporadic issues when stopping a job with a savepoint via the REST API. This happens for /jobs/:jobid/savepoints and /jobs/:jobid/stop: The job finishes with a savepoint but the triggerId returned from the REST API

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
Hi Xingbo, Yes, there are a number of temporary views being created, where each is being created using SQL (CREATE TEMPORARY VIEW ...) rather than explicit calls to the Table and DataStream APIs. Is this a good pattern or are there caveats I should be aware of please? Many thanks, John

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
Hi John, Does your job logic include conversion between Table and DataStream? For example, methods such as `create_temporary_view(path: str, data_stream: DataStream): -> Table` are used. Best, Xingbo On Thu, Jun 16, 2022 at 18:31, John Tipper wrote: > Hi Xingbo, > > I’m afraid I can’t share my code but Flin

Re: Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions

2022-06-16 Thread Yang Wang
Could you please try with high availability enabled[1]? If HA is enabled, the internal jobmanager rpc service will not be created. Instead, the TaskManager retrieves the JobManager address via HA services and connects to it via pod ip. [1]. https://github.com/apache/flink-kubernetes-operator/

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
Hi Xingbo, I’m afraid I can’t share my code but Flink is 1.13. The main Flink code is running inside Kinesis on AWS so I cannot change the version. Many thanks, John On 16 Jun 2022, at 10:37, Xingbo Huang wrote: Hi John, Could you provide the code snippet and the vers

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread Xingbo Huang
Hi John, Could you provide the code snippet and the version of pyflink you used? Best, Xingbo On Thu, Jun 16, 2022 at 17:05, John Tipper wrote: > Hi all, > > I'm trying to run a PyFlink unit test to test some PyFlink SQL and where > my code uses a Python UDF. I can't share my code but the test case is > si

The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
Hi all, I'm trying to run a PyFlink unit test to test some PyFlink SQL where my code uses a Python UDF. I can't share my code but the test case is similar to the code here: https://github.com/apache/flink/blob/f8172cdbbc27344896d961be4b0b9cdbf000b5cd/flink-python/pyflink/testing/test_case_

Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions

2022-06-16 Thread Elisha, Moshe (Nokia - IL/Kfar Sava)
Hello, We are launching Flink deployments using the Flink Kubernetes Operator on a Kubernetes cluster with Istio and mTLS enabled. We found that the TaskManager is unable to communicate with the JobManager on the jobman