Re: How is proctime represented?

2021-02-18 Thread Rex Fenley
Reading the documentation you posted again after posting this question, it does sound like it's simply a placeholder that only gets filled in when used by an operator. Then again, that's still not exactly what it says, so I only feel 70% confident that's what is happening. On Thu, Feb 18, 2021

How to use ProcessWindowFunction in pyflink?

2021-02-18 Thread Hongyuan Ma
Greetings, I am a newbie to pyflink. I want to be able to use ProcessWindowFunction in a Tumble window, and finally output 0 or more lines. I have checked the DataStream API and Table API of pyflink, but have not found a complete example. pyflink's DataStream API does not seem to implement windo

Re: How is proctime represented?

2021-02-18 Thread Chesnay Schepler
Could you check whether this answers your question? https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/timely-stream-processing.html#notions-of-time-event-time-and-processing-time On 2/19/2021 7:29 AM, Rex Fenley wrote: Hello, When using PROCTIME() in CREATE DDL for a source

Re: How do I increase number of db connections of the Flink JDBC Connector?

2021-02-18 Thread Chesnay Schepler
Every worker uses exactly 1 connection, so in order to increase the number of connections you must indeed increase the worker parallelism. On 2/19/2021 6:51 AM, Li Peng wrote: Hey folks, I'm trying to use Flink to write high throughput incoming data to a SQL db using the JDBC Connector as desc
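A minimal DataStream-style sketch of that idea, assuming a PostgreSQL target and a hypothetical Order type (table, column and URL names are illustrative only): each parallel sink subtask opens its own connection, so raising the sink parallelism is what raises the connection count.

import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JdbcSinkParallelismExample {

    // Hypothetical record type for illustration.
    public static class Order {
        public long id;
        public double amount;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Order> orders = env.fromElements(new Order()); // placeholder source

        orders.addSink(
                JdbcSink.sink(
                        "INSERT INTO orders (id, amount) VALUES (?, ?)",
                        (stmt, order) -> {
                            stmt.setLong(1, order.id);
                            stmt.setDouble(2, order.amount);
                        },
                        JdbcExecutionOptions.builder()
                                .withBatchSize(1000)      // batch writes to reduce round trips
                                .withBatchIntervalMs(200)
                                .build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                .withUrl("jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
                                .withDriverName("org.postgresql.Driver")
                                .build()))
                // Each parallel sink subtask opens its own connection,
                // so a parallelism of 4 results in 4 DB connections.
                .setParallelism(4);

        env.execute("jdbc-sink-parallelism-sketch");
    }
}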

Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler
No, Job-/TaskManager metrics cannot be tagged with the job name. The reason is that this only makes sense for application clusters (as opposed to session clusters), but we don't differentiate between the two when it comes to metrics. On 2/19/2021 3:59 AM, bat man wrote: I meant the Flink job name

How is proctime represented?

2021-02-18 Thread Rex Fenley
Hello, When using PROCTIME() in CREATE DDL for a source, is the proctime attribute a timestamp generated at the time of row ingestion at the source and then forwarded through the graph execution, or is the proctime attribute a placeholder that says "fill me in with a timestamp" once it's being used di
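For reference, a minimal sketch of how PROCTIME() typically appears as a computed column in a DDL (table name, columns and the datagen connector are illustrative only): the attribute is not read from the source but evaluated, against the local wall clock, by whichever operator accesses it.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ProctimeDdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // proc_time is a computed column; it is not part of the stored row
        // but materialized when an operator evaluates it.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  id BIGINT," +
                "  amount DOUBLE," +
                "  proc_time AS PROCTIME()" +
                ") WITH (" +
                "  'connector' = 'datagen'" +   // hypothetical connector choice
                ")");

        // A processing-time window materializes the timestamp where it is used.
        tEnv.executeSql(
                "SELECT id, TUMBLE_START(proc_time, INTERVAL '1' MINUTE) AS w_start, SUM(amount) " +
                "FROM orders " +
                "GROUP BY id, TUMBLE(proc_time, INTERVAL '1' MINUTE)")
            .print();
    }
}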

How do I increase number of db connections of the Flink JDBC Connector?

2021-02-18 Thread Li Peng
Hey folks, I'm trying to use flink to write high throughput incoming data to a SQL db using the JDBC Connector as described here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/jdbc.html However, after enabling this, my data consumption rate slowed down to a crawl. After d

Re: Tag flink metrics to job name

2021-02-18 Thread bat man
I meant the Flink job name. I’m using the below reporter - metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter Is there any way to tag job names to the TaskManager and JobManager metrics? Thanks, Hemant On Fri, 19 Feb 2021 at 12:40 AM, Chesnay Schepler wrote: > When

java.io.IOException: Could not create storage directory for BLOB store in '/tmp'

2021-02-18 Thread wheatdog liou
Hi, I am new to Flink and was following Flink with docker-compose and encountered this error. I used the session-cluster docker-compose.yml template from the documen

Re: Best practices around checkpoint intervals and sizes?

2021-02-18 Thread Chesnay Schepler
A lower checkpoint interval (== more checkpoints / time) will consume more resources and hence can affect the job performance. It ultimately boils down to how much latency you are willing to accept when a failure occurs and data has to be re-processed (more checkpoints => less data). How long
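As a small illustration of tuning that trade-off (the interval values below are arbitrary placeholders, not recommendations):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointIntervalExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60s: shorter intervals mean less data to re-process
        // after a failure, but more checkpointing overhead during normal operation.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        // Leave room between checkpoints for actual processing.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);
        env.getCheckpointConfig().setCheckpointTimeout(120_000);

        env.fromElements(1L, 2L, 3L).print(); // placeholder pipeline

        env.execute("checkpoint-interval-sketch");
    }
}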

Re: latency related to the checkpointing mode EXACTLY ONCE

2021-02-18 Thread Chesnay Schepler
Yes, if you are only reading committed data then it will take at least the checkpoint interval for the data to be available to downstream consumers. On 2/18/2021 6:17 PM, Tan, Min wrote: Hi, We use the checkpointing mode EXACTLY ONCE for some of our flink jobs. I wonder how the chec
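A minimal sketch of the consumer side, assuming a Kafka source and hypothetical broker/topic names: with isolation.level set to read_committed, records written by an exactly-once upstream job only become visible after that job's checkpoint, and hence its Kafka transaction commit, completes.

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ReadCommittedLatencyExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The upstream (producing) job commits its Kafka transactions on checkpoint
        // completion, so with a 60s interval there, committed records become visible
        // to this consumer roughly once per minute.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // hypothetical broker
        props.setProperty("group.id", "downstream-job");
        props.setProperty("isolation.level", "read_committed"); // only see committed transactions

        env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
           .print();

        env.execute("read-committed-latency-sketch");
    }
}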

Re: Sharding of Operators

2021-02-18 Thread Chesnay Schepler
When you change the parallelism then keys are re-distributed across operator instances. However, this re-distribution is limited to the set maxParallelism (set via the ExecutionConfig), which by default is 128 if no operators exceeded the parallelism on the first submission. This cannot
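A small sketch of fixing the maximum parallelism explicitly on the first submission (the values are placeholders); keyed state can then be rescaled up to, but never beyond, this number of key groups.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MaxParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Fix the number of key groups on the very first submission;
        // later rescaling of keyed state is only possible up to this value.
        env.setMaxParallelism(720);
        env.setParallelism(8);

        env.fromElements("a", "b", "c")
           .keyBy(value -> value)
           .print();

        env.execute("max-parallelism-sketch");
    }
}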

Re: Tag flink metrics to job name

2021-02-18 Thread Chesnay Schepler
When you say "job_name", are you referring to the Prometheus concept of jobs, or the one of Flink? Which of Flink's Prometheus reporters are you using? On 2/17/2021 7:37 PM, bat man wrote: Hello there, I am using Prometheus to push metrics to Prometheus and then use Grafana for visualization.

latency related to the checkpointing mode EXACTLY ONCE

2021-02-18 Thread Tan, Min
Hi, We use the checkpointing mode EXACTLY ONCE for some of our Flink jobs. I wonder how the checkpoint configuration, especially its checkpoint interval, is related to the end-to-end latency. We need to set read_committed for the Kafka consumers. Does this lead to a latency from one Flink job

Re: [Flink SQL] FLOOR(timestamp TO WEEK) not working

2021-02-18 Thread Timo Walther
Hi Sebastián, which Flink version are you using? And which precision do the timestamps have? This looks clearly like a bug to me. We should open an issue in JIRA. Regards, Timo On 18.02.21 16:17, Sebastián Magrí wrote: While using said function in a query I'm getting a query compilation err

Re: Compile time checking of SQL

2021-02-18 Thread Timo Walther
Hi Sebastián, what do you consider as compile time? If you mean some kind of SQL editor, you could take a look at Ververica platform (the community edition is free): https://www.ververica.com/blog/data-pipelines-with-flink-sql-on-ververica-platform Otherwise Flink SQL is always validated at
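One possible sketch of catching invalid SQL before deployment, e.g. in a CI unit test, assuming the referenced table schemas can be registered up front (the datagen table here is only a stand-in): explainSql parses, validates and plans the statement without executing it, so an invalid query fails at that point.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlValidationCheck {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Register the schemas the queries depend on (datagen keeps it self-contained).
        tEnv.executeSql(
                "CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH ('connector' = 'datagen')");

        // Throws an exception on parse/validation errors instead of running the query.
        String plan = tEnv.explainSql("SELECT id, SUM(amount) FROM orders GROUP BY id");
        System.out.println(plan);
    }
}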

[Flink SQL] FLOOR(timestamp TO WEEK) not working

2021-02-18 Thread Sebastián Magrí
While using said function in a query I'm getting a query compilation error saying that there's no applicable method for the given arguments. The parameter types displayed in the error are org.apache.flink.table.data.TimestampData, org.apache.flink.table.data.TimestampData. And there's no overload

Re: Using INTERVAL parameters for UDTF

2021-02-18 Thread Patrick Angeles
Opened an issue here: https://issues.apache.org/jira/browse/CALCITE-4504 Cheers On Thu, Feb 18, 2021 at 5:33 AM Timo Walther wrote: > Hi Patrick, > > thanks for reaching out to us and investigating the problem. Could you > open an issue in the Calcite project? I think it would be nice to solve

Re: Flink’s Kubernetes HA services - NOT working

2021-02-18 Thread Till Rohrmann
Hi Omer, could you share a bit more of the logs with us? I would be interested in what has happened before "Stopping DefaultLeaderRetrievalService" is logged. One problem you might run into is FLINK-20417. This problem should be fixed with Flink 1.12.2. [1] https://issues.apache.org/jira/browse/F

Compile time checking of SQL

2021-02-18 Thread Sebastián Magrí
Is there any way to check SQL strings at compile time? -- Sebastián Ramírez Magrí

Re: Using INTERVAL parameters for UDTF

2021-02-18 Thread Timo Walther
Hi Patrick, thanks for reaching out to us and investigating the problem. Could you open an issue in the Calcite project? I think it would be nice to solve it on both the Calcite and Flink side. Thanks, Timo On 18.02.21 06:02, Patrick Angeles wrote: NVM. Found the actual source on Calcite tr

Re: Which type serializer is being used?

2021-02-18 Thread David Anderson
You can use TypeInformation#of(MyClass.class).createSerializer() to determine which serializer Flink will use for a given type. Best, David On Wed, Feb 17, 2021 at 7:12 PM Sudharsan R wrote: > Hi, > I would like to find out what types are being serialized with which > serializer. Is there an ea
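A small self-contained sketch along those lines, using a hypothetical MyClass; note that createSerializer takes the job's ExecutionConfig, since settings such as Kryo registrations can influence which serializer is chosen.

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;

public class SerializerCheck {

    // Hypothetical type to inspect.
    public static class MyClass {
        public int id;
        public String name;
    }

    public static void main(String[] args) {
        TypeInformation<MyClass> typeInfo = TypeInformation.of(MyClass.class);
        // The ExecutionConfig should match the one used by the actual job
        // if you want to see exactly the serializer that job will use.
        TypeSerializer<MyClass> serializer = typeInfo.createSerializer(new ExecutionConfig());
        System.out.println(typeInfo + " -> " + serializer.getClass().getName());
    }
}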

Re: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-18 Thread Jan Oelschlegel
By using the DataStream API with the same business logic I'm getting no dropped events. From: Jan Oelschlegel Sent: Wednesday, 17 February 2021 19:18 To: user Subject: Kafka SQL Connector: dropping events if more partitions than source tasks Hi, I have a question regarding Flink SQL connec