Re: Proposal: Redis Stream Connector

2020-12-07 Thread Vincent Marquez
On Fri, Dec 4, 2020 at 12:28 PM Boyuan Zhang wrote: > Hi Vincent, > > 1. Just to be clear, for a streaming pipeline (say, on the dataflow >> runner)it will use the 'residual' result of the SplitRestriction >> (retrieved from trySplit) as the checkpoint, so if the pipeline is stopped >> due to an

Re: Proposal: Redis Stream Connector

2020-12-04 Thread Boyuan Zhang
Hi Vincent, 1. Just to be clear, for a streaming pipeline (say, on the dataflow > runner)it will use the 'residual' result of the SplitRestriction > (retrieved from trySplit) as the checkpoint, so if the pipeline is stopped > due to an error, then restarted with the same checkpoint it would resum

Re: Proposal: Redis Stream Connector

2020-12-04 Thread Vincent Marquez
Thank you for your help Boyuan. 1. Just to be clear, for a streaming pipeline (say, on the dataflow runner)it will use the 'residual' result of the SplitRestriction (retrieved from trySplit) as the checkpoint, so if the pipeline is stopped due to an error, then restarted with the same checkpoint

Re: Proposal: Redis Stream Connector

2020-11-30 Thread Boyuan Zhang
In Splittable DoFn, the trySplit[1] API in RestrictionTracker is for performing checkpointing, keeping primary as current restriction and returning residuals. In the DoFn, you can do Splittable DoFn initiated checkpoint by returning ProcessContinuation.resume()[2]. Beam programming guide[3] also ta

Re: Proposal: Redis Stream Connector

2020-11-29 Thread Vincent Marquez
Regarding checkpointing: I'm confused how the Splittable DoFn can make use of checkpoints to resume and not have data loss. Unlike the old API that had a very easy to understand method called 'getCheckpointMark' that allows me to return the completed work, I don't see where that is done with the

Re: Proposal: Redis Stream Connector

2020-11-26 Thread Ismaël Mejía
Just want to mention that we have been working with Vincent in the ReadAll implementation for Cassandra based on normal DoFn, and we expect to get it merged for the next release of Beam. Vincent is familiarized now with DoFn based IO composition, a first step towards SDF understanding. Vincent you

Re: Proposal: Redis Stream Connector

2020-11-19 Thread Boyuan Zhang
On Thu, Nov 19, 2020 at 1:29 PM Vincent Marquez wrote: > > > > On Thu, Nov 19, 2020 at 10:38 AM Boyuan Zhang wrote: > >> Hi Vincent, >> >> Thanks for your contribution! I'm happy to work with you on this when you >> contribute the code into Beam. >> > > Should I write up a JIRA to start? I have

Re: Proposal: Redis Stream Connector

2020-11-19 Thread Vincent Marquez
On Thu, Nov 19, 2020 at 10:38 AM Boyuan Zhang wrote: > Hi Vincent, > > Thanks for your contribution! I'm happy to work with you on this when you > contribute the code into Beam. > Should I write up a JIRA to start? I have access, I've already been in the process of contributing some big changes

Re: Proposal: Redis Stream Connector

2020-11-19 Thread Boyuan Zhang
Hi Vincent, Thanks for your contribution! I'm happy to work with you on this when you contribute the code into Beam. Another thing is that it would be preferable to use Splittable DoFn instead of using UnboundedSource to

Proposal: Redis Stream Connector

2020-11-19 Thread Vincent Marquez
Currently, Redis offers a streaming queue functionality similar to Kafka/Kinesis/Pubsub, but Beam does not currently have a connector for it. I've written an UnboundedSource connector that makes use of Redis Streams as a POC and it seems to work well. If someone is willing to work with me, I coul