Re: Question About Histograms

2022-04-04 Thread Prasanna kumar
Anil, Flink histograms are actually summaries. You need to override the provided Prometheus Histogram class to write the values into different buckets in Prometheus. Then you can write Prometheus queries to calculate different quantiles accordingly. Checkpointing the histograms is not a recommended opti
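
To illustrate why buckets help here where summaries do not: bucket counts are additive, so counts from many reporting windows can be summed and a quantile estimated afterwards. Below is a minimal Python sketch of that idea, using linear interpolation inside the bucket that contains the target rank. The function names, bucket bounds, and counts are made up for illustration; this is not the Flink or Prometheus implementation, and it assumes finite upper bounds (no +Inf bucket).

```python
from bisect import bisect_left


def merge_buckets(windows):
    """Sum cumulative bucket counts from several windows (same bounds assumed)."""
    return [sum(counts) for counts in zip(*windows)]


def estimate_quantile(q, upper_bounds, cumulative_counts):
    """Estimate the q-quantile from cumulative bucket counts.

    upper_bounds[i] is the inclusive upper bound of bucket i;
    cumulative_counts[i] is the number of observations <= upper_bounds[i].
    The lowest bucket is assumed to start at 0.
    """
    total = cumulative_counts[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(cumulative_counts, rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    prev = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - prev
    if in_bucket == 0:
        return upper_bounds[i]
    # Linear interpolation within the selected bucket.
    return lower + (upper_bounds[i] - lower) * (rank - prev) / in_bucket


if __name__ == "__main__":
    bounds = [100.0, 250.0, 500.0, 1000.0]  # hypothetical latency buckets (ms)
    day1 = [50, 80, 95, 100]                # hypothetical cumulative counts
    day2 = [10, 20, 30, 100]
    print(estimate_quantile(0.95, bounds, merge_buckets([day1, day2])))
```

The estimate is only as precise as the bucket layout, so the bounds need to be chosen with the expected value range in mind.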

Re: Flink SQL and data shuffling (keyBy)

2022-04-04 Thread Yaroslav Tkachenko
Hi Marios, Thank you, this looks very promising! On Mon, Apr 4, 2022 at 2:42 AM Marios Trivyzas wrote: > Hi again, > > Maybe you can use the > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-sink-keyed-shuffle > *table.exec.sink.keyed-shuffle* and set it t

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Ryan van Huuksloot
Sorry for the hassle. I ended up working with a colleague and found that the Kafka source had a single partition but the pipeline had a parallelism of 4, and no withIdleness was configured, so the watermark was set on the output but didn't propagate to the operator. Appreciate you both taking ti
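
The situation described above can be modelled in a few lines. This is a simplified Python simulation, not Flink code: a downstream operator takes the minimum watermark over its non-idle inputs, so three source subtasks that never receive data hold the watermark back until they are marked idle. The function names and the numbers are hypothetical, and the "all inputs idle" case is reduced to "no progress" for brevity.

```python
import math


def operator_watermark(input_watermarks, idle):
    """Combined watermark: minimum over inputs that are not marked idle."""
    active = [w for w, is_idle in zip(input_watermarks, idle) if not is_idle]
    if not active:
        return -math.inf  # simplification: no active input, no progress
    return min(active)


def window_fires(window_end, watermark):
    """An event-time window fires once the watermark reaches its max timestamp."""
    return watermark >= window_end - 1


if __name__ == "__main__":
    # One Kafka partition, source parallelism 4: only subtask 0 sees events.
    watermarks = [1000, -math.inf, -math.inf, -math.inf]

    no_idleness = operator_watermark(watermarks, [False, False, False, False])
    with_idleness = operator_watermark(watermarks, [False, True, True, True])

    print(window_fires(500, no_idleness))    # window stays open
    print(window_fires(500, with_idleness))  # window can close
```

In Flink itself the equivalent switch is the withIdleness setting on the WatermarkStrategy mentioned in the message; matching the source parallelism to the partition count is another way to avoid empty subtasks.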

WatermarkStrategy for IngestionTime

2022-04-04 Thread Xinbin Huang
Hi, Since TimeCharacteristic is deprecated, AFAIK: - TimeCharacteristic.ProcessingTime -> WatermarkStrategy.noWatermarks() - TimeCharacteristic.EventTime -> WatermarkStrategy.forBoundedOutOfOrderness() - TimeCharacteristic.IngestionTime -> ??? Do we have a built-in WatermarkStrategy e

Local recovery with 1.15 snapshot

2022-04-04 Thread Abdullah Alkawatrah
Hey, Local recovery introduced in 1.15 seems like a great feature. Unfortunately, I have not been able to make it work. I am trying this with a streaming pipeline that consumes events from Kafka topics and uses RocksDB for stateful operations. *Setup*: - Job manager: k8s deployment (ZK HA)

Question About Histograms

2022-04-04 Thread Anil K
Hi, I was doing some experimentation using histograms and had a few questions, mostly related to fault tolerance and restarts. I am looking for a way to calculate p95 over 30 days. Since histograms are pushed as a summary into Prometheus, I will not be able to do the aggregation for 30 days at Prometheus'

Re: Source with S3 bucket with millions ( billions ) of object ( small files )

2022-04-04 Thread Vishal Santoshi
Thanks for the clarification. My experiments have been in line with what you have suggested. Regards. On Mon, Apr 4, 2022 at 5:30 AM Arvid Heise wrote: > Hi Vishal, > > with readFile, files are first collected and then sorted [1]. The same is > true for the new FileSource. Here, you could

Re: Elasticsearch SQL connector with SSL

2022-04-04 Thread Yuheng Zhong
Hi Marios, Ravi, Thanks for the suggestions! On Mon, Apr 4, 2022 at 5:00 AM ☼ R Nair wrote: > Why bother repeating effort and code? > > https://prestodb.io/docs/current/connector/elasticsearch.html > > Best, > Ravi > > On Mon, 4 Apr 2022 at 04:42, Marios Trivyzas wrote: > >> >> Hi! >> >> I don

HOP_PROCTIME is returning null

2022-04-04 Thread Surendra Lalwani
Hi Team, HOP_PROCTIME in Flink version 1.13.6 is returning null, while in previous versions it used to output a time attribute. Any idea why this behaviour is observed? If anybody has an alternative, it would be highly appreciated. Example: HOP_PROCTIME(PROCTIME() , INTERVAL '30' SECOND, INTERVAL

Re: Why discard checkpoints on the job finished

2022-04-04 Thread Arvid Heise
Hi Wu, The basic idea of checkpoints is that they are fully managed by Flink and only used for fault tolerance. Hence, if a job is stopped, there should not be any checkpoint lingering around. Savepoints, on the other hand, are snapshots that are managed by the user and can be used to start a new jo

Re: [DataStream API] Watermarks not closing TumblingEventTimeWindow with Reduce function

2022-04-04 Thread Arvid Heise
This sounds like a bug indeed. Could you please create a ticket with a minimal test case? The workaround is probably to use #aggregate but it should be fixed nonetheless. On Fri, Apr 1, 2022 at 9:58 AM r pp wrote: > hi~ Can you send your full code ? > > Ryan van Huuksloot wrote on Thu, Mar 31, 2022 at 22:58:

[ANNOUNCE] Apache Flink Kubernetes Operator 0.1.0 released

2022-04-04 Thread Gyula Fóra
The Apache Flink community is very happy to announce the release of Apache Flink Kubernetes Operator 0.1.0. The Flink Kubernetes Operator allows users to manage their Apache Flink applications and their lifecycle through native k8s tooling like kubectl. This is the first release of this new commun

temporal join against versioned table

2022-04-04 Thread Fruzsina Piroska Nagy
Hi, I have a question about event time temporal joins against versioned tables. I am going through this training: https://github.com/ververica/sql-training/wiki/Introduction-to-SQL-on-Flink without the Flink SQL Clie

Re: Flink SQL and data shuffling (keyBy)

2022-04-04 Thread Marios Trivyzas
Hi again, Maybe you can use the https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-sink-keyed-shuffle *table.exec.sink.keyed-shuffle* and set it to *FORCE*, which will use the primary key column(s) to partition and distribute the data. On Fri, Apr 1, 2022 at 6:

Re: Source with S3 bucket with millions ( billions ) of object ( small files )

2022-04-04 Thread Arvid Heise
Hi Vishal, with readFile, files are first collected and then sorted [1]. The same is true for the new FileSource. Here, you could plug in your own Enumerator to output files in chunks, but then you need to continuously pull more and can't use batch mode. We are happy to receive any patch for that b

Re: Elasticsearch SQL connector with SSL

2022-04-04 Thread ☼ R Nair
Why bother repeating effort and code? https://prestodb.io/docs/current/connector/elasticsearch.html Best, Ravi On Mon, 4 Apr 2022 at 04:42, Marios Trivyzas wrote: > > Hi! > > I don't think you can do it "out-of-the-box" > You can check an elasticsearch example: > https://github.com/elastic/ela

Re: Elasticsearch SQL connector with SSL

2022-04-04 Thread Marios Trivyzas
Hi! I don't think you can do it "out-of-the-box" You can check an elasticsearch example: https://github.com/elastic/elasticsearch/issues/29799 and then you'd probably need to adjust the code at https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch7/src/main/jav

Submit jobs via Rest API and deploy Flink on a running Kubernetes cluster (Native way)

2022-04-04 Thread Burcu Gul POLAT EGRI
Dear all, I am trying to implement a REST client for Flink to send jobs via RESTful Flink services. I also want to integrate Flink and Kubernetes natively. I have decided to use "Application Mode" as the deployment mode, according to the Flink documentation. I have already implemented a job and pac

Re: Source with S3 bucket with millions ( billions ) of object ( small files )

2022-04-04 Thread Roman Grebennikov
Hi, in a unified stream/batch FileSource there is a processStaticFileSet() method to enumerate all the splits only once, and make Source complete when it's finished. As for my own experience using the processStaticFileSet with large s3 buckets, the enumeration seems to happen on the jobmanager

Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-04 Thread Arvid Heise
You should create a histogram over the keys of the records. If you see a skew, one way to go about it is to refine the key or split aggregations. For example, consider you want to count events per user and 2 users are actually bots spamming lots of events accounting for 50% of all events. Then, y
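
The rest of the message is cut off here, so the following is only one common reading of "split aggregations", sketched in plain Python rather than Flink: first build a histogram over the keys to spot the skew, then pre-aggregate hot keys under a (key, salt) pair and merge the partial counts in a second stage. All names, the number of salts, and the data are hypothetical.

```python
from collections import Counter


def key_histogram(keys):
    """Count how many records each key receives."""
    return Counter(keys)


def two_stage_count(keys, hot_keys, num_salts):
    """Count per key, spreading hot keys over num_salts partial aggregates."""
    seen = Counter()
    partial = Counter()
    # Stage 1: aggregate by (key, salt); hot keys rotate through the salts.
    for key in keys:
        salt = seen[key] % num_salts if key in hot_keys else 0
        seen[key] += 1
        partial[(key, salt)] += 1
    # Stage 2: merge the partial counts back to one count per key.
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count
    return partial, final


if __name__ == "__main__":
    # Two bots produce half of all events, as in the example above.
    events = ["bot1"] * 250 + ["bot2"] * 250 + [f"user{i}" for i in range(500)]

    hist = key_histogram(events)
    hot = {key for key, _count in hist.most_common(2)}

    partial, final = two_stage_count(events, hot, num_salts=5)
    print(max(partial.values()), final["bot1"])
```

The largest partial aggregate is much smaller than the hottest key's total, which is the point of the split: in a keyed stream the first stage spreads the hot key's load over several parallel instances, and only the small partial results meet in the second stage.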

Re: Unbalanced distribution of keyed stream to downstream parallel operators

2022-04-04 Thread Isidoros Ioannou
Hello Qingsheng, thank you very much for your answer. I will try to modify the key as you mentioned in your first assumption. In case the second assumption is also valid, what would you propose to remedy the situation? Try to experiment with different values of max parallelism? On Sat, Apr 2, 2022