Re: Issue with growing state/checkpoint size

2023-09-06 Thread Wiśniowski Piotr
Hi, Sorry for the late response - was busy with pretty similar problem. First let me clarify on the pipeline You use first. You have a pipeline with a step produces two stream outputs. They do flow thru some computation independently and then You join them by CoGroupByKey with session window

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Byron Ellis via user
Depends on why you're using a fan-out approach in the first place. You might actually be better off doing all the work at the same time. On Fri, Sep 1, 2023 at 6:43 AM Ruben Vargas wrote: > Ohh I see > > That makes sense. Wondering if there is an strategy for my use case, where > I have an ID

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Ruben Vargas
Ohh I see That makes sense. Wondering if there is an strategy for my use case, where I have an ID unique per pair of messages Thanks for all your help! On Fri, Sep 1, 2023 at 6:51 AM Sachin Mittal wrote: > Yes a very high and non deterministic cardinality can make the stored > state of join

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Sachin Mittal
Yes a very high and non deterministic cardinality can make the stored state of join operation unbounded. In my case we know the cardinality and it was not very high so we could go with a lookup based approach using redis to enrich the stream and avoid joins. On Wed, Aug 30, 2023 at 5:04 AM

Re: Issue with growing state/checkpoint size

2023-08-29 Thread Ruben Vargas
Thanks for the reply and the advice One more thing, Do you know if the key-space carnality impacts on this? I'm assuming it is, but the thing is for my case all the messages from the sources has a unique ID, that makes my key-space huge and is not on my control . On Tue, Aug 29, 2023 at 9:29 AM

Re: Issue with growing state/checkpoint size

2023-08-29 Thread Sachin Mittal
So for the smaller size of collection which does not grow with size for certain keys we stored the data in redis and instead of beam join in our DoFn we just did the lookup and got the data we need. On Tue, 29 Aug 2023 at 8:50 PM, Ruben Vargas wrote: > Hello, > > Thanks for the reply, Any

Re: Issue with growing state/checkpoint size

2023-08-29 Thread Ruben Vargas
Hello, Thanks for the reply, Any strategy you followed to avoid joins when you rewrite your pipeline? On Tue, Aug 29, 2023 at 9:15 AM Sachin Mittal wrote: > Yes even we faced the same issue when trying to run a pipeline involving > join of two collections. It was deployed using AWS KDA,

Re: Issue with growing state/checkpoint size

2023-08-29 Thread Sachin Mittal
Yes even we faced the same issue when trying to run a pipeline involving join of two collections. It was deployed using AWS KDA, which uses flink runner. The source was kinesis streams. Looks like join operations are not very efficient in terms of size management when run on flink. We had to

Issue with growing state/checkpoint size

2023-08-29 Thread Ruben Vargas
Hello I experimenting an issue with my beam pipeline I have a pipeline in which I split the work into different branches, then I do a join using CoGroupByKey, each message has its own unique Key. For the Join, I used a Session Window, and discarding the messages after trigger. I'm using Flink