Detecting when a job is "caught up"

2022-12-07 Thread Kurtis Walker via user
I’ve got a Flink job that uses a HybridSource. It reads a bunch of S3 files and then transitions to a Kafka topic, where it stays processing live data. Everything gets written to a warehouse where users build and run reports. It takes about 36 hours to read data from the beginning before it’s …
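A minimal sketch of that kind of HybridSource, assuming hypothetical bucket, topic, and broker names (the switchover timestamp would come from wherever the S3 archive ends):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;

// Bounded source: the historical files archived in S3 (hypothetical bucket).
FileSource<String> fileSource = FileSource
        .forRecordStreamFormat(new TextLineInputFormat(), new Path("s3://archive-bucket/events/"))
        .build();

// Unbounded source: the live Kafka topic, started roughly where the archive ends.
long switchoverTimestampMs = 1670371200000L; // hypothetical: where the S3 data stops
KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("events")
        .setStartingOffsets(OffsetsInitializer.timestamp(switchoverTimestampMs))
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

// Read all the S3 files first, then switch to the Kafka topic.
HybridSource<String> source = HybridSource.builder(fileSource)
        .addSource(kafkaSource)
        .build();

env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-then-kafka");
```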

Multiple sources ordering

2021-10-21 Thread Kurtis Walker
I have a case where my Flink job needs to consume multiple sources. I have a topic in Kafka where the order of consuming is important. Because the cost of S3 is much less than storage on Kafka, we have a job that sinks to S3; the topic in Kafka can then retain just 3 days’ worth of data. My …
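The archiving job described here could be a plain FileSink; a sketch with hypothetical paths, assuming the stream has already been read from the Kafka topic:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

// Continuously archive the Kafka topic to S3, so Kafka itself only needs a
// few days of retention. Rolling files on checkpoints keeps the output
// consistent with exactly-once checkpointing.
FileSink<String> archiveSink = FileSink
        .forRowFormat(new Path("s3://archive-bucket/events/"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(OnCheckpointRollingPolicy.build())
        .build();

kafkaStream.sinkTo(archiveSink); // kafkaStream: a DataStream<String> from the topic
```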

Re: Using s3 bucket for high availability

2021-06-09 Thread Kurtis Walker
Thank you, I figured it out. My IAM policy was missing some actions. Seems I needed to give it “*” for it to work. …
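The exact actions aren’t listed in the thread; a hypothetical, tighter-than-“*” starting point for the HA bucket (bucket name invented) would be along these lines:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::flink-ha-bucket",
        "arn:aws:s3:::flink-ha-bucket/*"
      ]
    }
  ]
}
```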

Re: Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
… specifying the region for the bucket? I’ve been searching the doc and haven’t found anything like that. …

Using s3 bucket for high availability

2021-06-08 Thread Kurtis Walker
Hello, I’m trying to set up my Flink native Kubernetes cluster with high availability. Here’s the relevant config: …
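The config itself is cut off in the archive; a typical native-Kubernetes HA setup backed by S3 looks roughly like this (cluster id and bucket are hypothetical):

```yaml
kubernetes.cluster-id: my-flink-cluster
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://flink-ha-bucket/recovery
```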

Re: Backpressure configuration

2021-04-29 Thread Kurtis Walker
… for it. From Roman Khachatryan’s reply: Hello Kurt, assuming that your sink is blocking, I would first make sure that it is not chained with the preceding operators …
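Breaking that chain is a one-line change; a sketch with hypothetical operator and sink names:

```java
// Run the blocking sink in its own task, so backpressure shows up between the
// sink and the upstream operators instead of being hidden inside one chain.
stream
        .map(new ToRowFunction())          // hypothetical upstream transformation
        .addSink(new GreenplumBulkSink())  // stands in for the custom bulk-load sink
        .disableChaining()
        .name("greenplum-bulk-load");
```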

Backpressure configuration

2021-04-29 Thread Kurtis Walker
Hello, I’m building a POC for a Flink app that loads data from Kafka into a Greenplum data warehouse. I’ve built a custom sink that will bulk-load a number of rows. I’m using a global window, triggering on number of records, to collect the rows for each sink. I’m finding that …
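A sketch of that batching pattern, with a hypothetical batch size and window/sink functions:

```java
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger;

// Collect records into fixed-size batches before handing them to the
// bulk-load sink. PurgingTrigger clears the window after each firing, so the
// global window's state does not grow without bound.
rows
        .windowAll(GlobalWindows.create())
        .trigger(PurgingTrigger.of(CountTrigger.of(10_000)))
        .apply(new CollectBatchFunction())    // hypothetical: rows -> one batch
        .addSink(new GreenplumBulkSink());    // hypothetical custom bulk-load sink
```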