Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Reuven Lax via dev
PCollections are usually persistent within a pipeline, so you can reuse them in other parts of the pipeline with no problem. There is no notion of state across pipelines; every pipeline is independent. If you want state across pipelines, you can write the PCollection out to a set of files which…

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Ravi Kapoor
On Wed, Oct 19, 2022 at 2:43 PM Ravi Kapoor wrote:
> I am talking about the batch context. Can we do checkpointing in batch mode
> as well? I am *not* looking for any failure or retry algorithm.
> The requirement is to simply materialize a PCollection which can be used
> across the jobs / within…

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Ravi Kapoor
I am talking about the batch context. Can we do checkpointing in batch mode as well? I am not looking for any failure or retry algorithm. The requirement is to simply materialize a PCollection which can be used across jobs / within the job in some view/temp table which is auto-deleted, I believe…

Re: Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Israel Herraiz via dev
I think that would be a Reshuffle, but only within the context of the same job (e.g. if there is a failure and a retry, the retry would start from the checkpoint created by the reshuffle). In Dataflow, …

Staging a PCollection in Beam | Dataflow Runner

2022-10-19 Thread Ravi Kapoor
Hi Team, Can we stage a PCollection or PCollection data? Let's say we want to avoid re-running expensive operations between two complex BQ tables time and again, and instead materialize the result in some temp view which will be deleted after the session. Is it possible to do that in the Beam pipeline? We can later use the temp…