Hello, For anyone looking to set up alerts for a Flink application, here is a good blog post by Ververica: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 So, for DynamoDB Streams we can set the alert on millisBehindLatest
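To make that concrete, the alert itself is just a threshold check on the millisBehindLatest value the consumer reports. A minimal sketch; the class name and the 60-second threshold are assumptions (tune the threshold to your latency SLO), and in practice you would express this same condition as a Prometheus alerting rule on the reported gauge rather than in application code:

```java
// Minimal sketch of an alert condition on millisBehindLatest.
// The 60s threshold is an assumed tolerance, not a Flink default;
// pick a value based on how far behind the stream head you can accept.
public class MillisBehindLatestAlert {
    static final long THRESHOLD_MS = 60_000L; // assumed tolerance: 1 minute

    // Fire an alert when the consumer has fallen further behind the
    // head of the stream than the configured tolerance.
    static boolean shouldAlert(long millisBehindLatest) {
        return millisBehindLatest > THRESHOLD_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(5_000L));   // within tolerance -> false
        System.out.println(shouldAlert(120_000L)); // two minutes behind -> true
    }
}
```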
Regards, Vinay Patil On Wed, Aug 7, 2019 at 2:24 PM Vinay Patil <vinay18.pa...@gmail.com> wrote: > Hi Andrey, > > Thank you for your reply. I understand that the checkpoints are gone when > the job is cancelled or killed; maybe configuring externalized checkpoints > will help here so that we can resume from there. > > My point was that if the job is terminated and the stream position is set to > TRIM_HORIZON, the consumer will start processing from the start. I was > curious to know if there is a configuration like kafka_group_id that we can > set for DynamoDB streams. > > Also, can you please let me know which metrics I should generate an > alert on in the case of DynamoDB Streams (I am sending the metrics to > Prometheus)? I see these metrics: > https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java > > > For example: in the case of Kafka we generate an alert when the consumer is > lagging behind. > > > Regards, > Vinay Patil > > > On Fri, Jul 19, 2019 at 10:40 PM Andrey Zagrebin <and...@ververica.com> > wrote: > >> Hi Vinay, >> >> 1. I would assume it works similarly to the Kinesis connector (correct me if >> wrong, people who actually developed it) >> 2. If you have activated just checkpointing, the checkpoints are gone if >> you externally kill the job. You might be interested in savepoints [1] >> 3. See the paragraph in [2] about Kinesis consumer parallelism >> >> Best, >> Andrey >> >> [1] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html >> [2] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kinesis.html#kinesis-consumer >> >> On Fri, Jul 19, 2019 at 1:45 PM Vinay Patil <vinay18.pa...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I am using this consumer for processing records from DynamoDB Streams; a >>> few questions on this: >>> >>> 1. 
How does checkpointing work with DynamoDB streams? Since this class is >>> extending FlinkKinesisConsumer, I am assuming it will start from the last >>> successful checkpoint in case of failure, right? >>> 2. Currently when I kill the pipeline and start it again, it reads all the >>> data from the start of the stream. Is there any configuration to avoid this >>> (apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to >>> group-id in Kafka? >>> 3. As these DynamoDB streams are separated into shards, what is the >>> recommended parallelism to be set for the source? Should it be a one-to-one >>> mapping? For example, if there are 3 shards, should the parallelism be 3? >>> >>> >>> Regards, >>> Vinay Patil >>> >>> >>> On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List >>> archive.] <ml+s1008284n23597...@n3.nabble.com> wrote: >>> >>>> Thank you so much Fabian! >>>> >>>> Will update the status in the JIRA. >>>> >>>> - >>>> Ying >>>> >>>> On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=0>> wrote: >>>> >>>> > Done! >>>> > >>>> > Thank you :-) >>>> > >>>> > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=1>>: >>>> > >>>> > > Thanks Fabian and Thomas. >>>> > > >>>> > > Please assign FLINK-4582 to the following username: >>>> > > *yxu-apache >>>> > > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache>* >>>> > > >>>> > > If needed I can get an ICLA or CCLA, whichever is proper. >>>> > > >>>> > > *Ying Xu* >>>> > > Software Engineer >>>> > > 510.368.1252 <+15103681252> >>>> > > [image: Lyft] <http://www.lyft.com/> >>>> > > >>>> > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=2>> wrote: >>>> > > >>>> > > > The user is yxu-lyft; Ying had commented on that JIRA as well. 
>>>> > > > >>>> > > > https://issues.apache.org/jira/browse/FLINK-4582 >>>> > > > >>>> > > > >>>> > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=3>> >>>> > wrote: >>>> > > > >>>> > > > > Hi Ying, >>>> > > > > >>>> > > > > Thanks for considering to contribute the connector! >>>> > > > > >>>> > > > > In general, you don't need special permissions to contribute to >>>> > Flink. >>>> > > > > Anybody can open Jiras and PRs. >>>> > > > > You only need to be assigned to the Contributor role in Jira to >>>> be >>>> > able >>>> > > > to >>>> > > > > assign an issue to you. >>>> > > > > I can give you these permissions if you tell me your Jira user. >>>> > > > > >>>> > > > > It would also be good if you could submit a CLA [1] if you plan >>>> to >>>> > > > > contribute a larger feature. >>>> > > > > >>>> > > > > Thanks, Fabian >>>> > > > > >>>> > > > > [1] https://www.apache.org/licenses/#clas >>>> > > > > >>>> > > > > >>>> > > > > 2018-07-30 10:07 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=4>>: >>>> > > > > >>>> > > > > > Hello Flink dev: >>>> > > > > > >>>> > > > > > We have implemented the prototype design and the initial PoC >>>> worked >>>> > > > > pretty >>>> > > > > > well. Currently, we plan to move ahead with this design in >>>> our >>>> > > > internal >>>> > > > > > production system. >>>> > > > > > >>>> > > > > > We are thinking of contributing this connector back to the >>>> flink >>>> > > > > community >>>> > > > > > sometime soon. May I request to be granted with a >>>> contributor >>>> > role? >>>> > > > > > >>>> > > > > > Many thanks in advance. 
>>>> > > > > > >>>> > > > > > *Ying Xu* >>>> > > > > > Software Engineer >>>> > > > > > 510.368.1252 <+15103681252> >>>> > > > > > [image: Lyft] <http://www.lyft.com/> >>>> > > > > > >>>> > > > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=5>> wrote: >>>> > > > > > >>>> > > > > > > Hi Gordon: >>>> > > > > > > >>>> > > > > > > Cool. Thanks for the thumb-up! >>>> > > > > > > >>>> > > > > > > We will include some test cases around the behavior of >>>> > re-sharding. >>>> > > > If >>>> > > > > > > needed we can double check the behavior with AWS, and see >>>> if >>>> > > > additional >>>> > > > > > > changes are needed. Will keep you posted. >>>> > > > > > > >>>> > > > > > > - >>>> > > > > > > Ying >>>> > > > > > > >>>> > > > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai < >>>> > > > > [hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=6> >>>> > > > > > > >>>> > > > > > > wrote: >>>> > > > > > > >>>> > > > > > >> Hi Ying, >>>> > > > > > >> >>>> > > > > > >> Sorry for the late reply here. >>>> > > > > > >> >>>> > > > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it >>>> seems >>>> > > like >>>> > > > > > this >>>> > > > > > >> should simply work. >>>> > > > > > >> >>>> > > > > > >> Regarding the resharding behaviour I mentioned in the >>>> JIRA: >>>> > > > > > >> I'm not sure if this is really a difference in behaviour. >>>> > > > Internally, >>>> > > > > if >>>> > > > > > >> DynamoDB streams is actually just working on Kinesis >>>> Streams, >>>> > then >>>> > > > the >>>> > > > > > >> resharding primitives should be similar. >>>> > > > > > >> The shard discovery logic of the Flink Kinesis Consumer >>>> assumes >>>> > > that >>>> > > > > > >> splitting / merging shards will result in new shards of >>>> > > increasing, >>>> > > > > > >> consecutive shard ids. 
As long as this is also the >>>> behaviour for >>>> > > > > > DynamoDB >>>> > > > > > >> resharding, then we should be fine. >>>> > > > > > >> >>>> > > > > > >> Feel free to start with the implementation for this, I >>>> think >>>> > > > > design-wise >>>> > > > > > >> we're good to go. And thanks for working on this! >>>> > > > > > >> >>>> > > > > > >> Cheers, >>>> > > > > > >> Gordon >>>> > > > > > >> >>>> > > > > > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=7>> wrote: >>>> > > > > > >> >>>> > > > > > >> > HI Gordon: >>>> > > > > > >> > >>>> > > > > > >> > We are starting to implement some of the primitives >>>> along this >>>> > > > path. >>>> > > > > > >> Please >>>> > > > > > >> > let us know if you have any suggestions. >>>> > > > > > >> > >>>> > > > > > >> > Thanks! >>>> > > > > > >> > >>>> > > > > > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <[hidden >>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=8>> >>>> > wrote: >>>> > > > > > >> > >>>> > > > > > >> > > Hi Gordon: >>>> > > > > > >> > > >>>> > > > > > >> > > Really appreciate the reply. >>>> > > > > > >> > > >>>> > > > > > >> > > Yes our plan is to build the connector on top of the >>>> > > > > > >> > FlinkKinesisConsumer. 
>>>> > > > > > >> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis >>>> > > > > > >> > > through the AmazonKinesis client, more specifically through the following >>>> > > > > > >> > > three function calls: >>>> > > > > > >> > > >>>> > > > > > >> > > - describeStream >>>> > > > > > >> > > - getRecords >>>> > > > > > >> > > - getShardIterator >>>> > > > > > >> > > >>>> > > > > > >> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient) >>>> > > > > > >> > > has already implemented similar calls, it is possible to use that client to >>>> > > > > > >> > > interact with the DynamoDB streams, and adapt the results from the DynamoDB >>>> > > > > > >> > > streams model to the Kinesis model. >>>> > > > > > >> > > >>>> > > > > > >> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient >>>> > > > > > >> > > <https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java> >>>> > > > > > >> > > does. The adapter client implements the AmazonKinesis client interface, >>>> > > > > > >> > > and is officially supported by AWS. Hence it is possible to replace the >>>> > > > > > >> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter >>>> > > > > > >> > > client when interacting with DynamoDB streams. 
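The client substitution described here is the classic adapter pattern. A stdlib-only sketch with every type name invented for illustration; in the real design the consumer codes against the AmazonKinesis interface and AmazonDynamoDBStreamsAdapterClient implements it on top of AmazonDynamoDBStreamsClient:

```java
import java.util.List;

// Stdlib-only sketch of the adapter idea: the DynamoDB Streams client has
// its own model, and an adapter exposes it through the Kinesis-shaped
// interface so the existing consumer code needs no changes.
public class AdapterSketch {
    // Stand-in for the Kinesis client interface the consumer codes against.
    interface KinesisLikeClient {
        List<String> getRecords(String shardIterator);
    }

    // Stand-in for the low-level DynamoDB Streams client with its own API.
    static class DynamoStreamsClient {
        List<String> getStreamRecords(String iterator) {
            return List.of("dynamo-record-1", "dynamo-record-2");
        }
    }

    // The adapter: implements the Kinesis-shaped interface, delegates to
    // the DynamoDB Streams client, and would translate between the models.
    static class DynamoStreamsAdapter implements KinesisLikeClient {
        private final DynamoStreamsClient delegate = new DynamoStreamsClient();

        @Override
        public List<String> getRecords(String shardIterator) {
            return delegate.getStreamRecords(shardIterator);
        }
    }

    public static void main(String[] args) {
        // The consumer only ever sees the Kinesis-shaped interface.
        KinesisLikeClient client = new DynamoStreamsAdapter();
        System.out.println(client.getRecords("iterator-0"));
    }
}
```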
The new >>>> > > > > > >> > > object can be a subclass of FlinkKinesisConsumer with a new name, e.g., >>>> > > > > > >> > > FlinkDynamoStreamConsumer. >>>> > > > > > >> > > >>>> > > > > > >> > > At best this could simply work. But we would like to hear if there are >>>> > > > > > >> > > other situations to take care of. In particular, I am wondering what the >>>> > > > > > >> > > *"resharding behavior"* mentioned in FLINK-4582 is. >>>> > > > > > >> > > >>>> > > > > > >> > > Thanks a lot! >>>> > > > > > >> > > >>>> > > > > > >> > > - >>>> > > > > > >> > > Ying >>>> > > > > > >> > > >>>> > > > > > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai < >>>> > > > > > >> > [hidden email] <http:///user/SendEmail.jtp?type=node&node=23597&i=9> >>>> > > > > > >> > > > wrote: >>>> > > > > > >> > > >>>> > > > > > >> > >> Hi! >>>> > > > > > >> > >> >>>> > > > > > >> > >> I think it would definitely be nice to have this feature. >>>> > > > > > >> > >> >>>> > > > > > >> > >> No actual previous work has been made on this issue, but AFAIK we >>>> > > > > > >> > >> should be able to build this on top of the FlinkKinesisConsumer. >>>> > > > > > >> > >> Whether this should live within the Kinesis connector module or an >>>> > > > > > >> > >> independent module of its own is still TBD. >>>> > > > > > >> > >> If you want, I would be happy to look at any concrete design proposals >>>> > > > > > >> > >> you have for this before you start the actual development efforts. 
>>>> > > > > > >> > >> >>>> > > > > > >> > >> Cheers, >>>> > > > > > >> > >> Gordon >>>> > > > > > >> > >> >>>> > > > > > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <[hidden >>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=10>> >>>> > > wrote: >>>> > > > > > >> > >> >>>> > > > > > >> > >> > Thanks Fabian for the suggestion. >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > *Ying Xu* >>>> > > > > > >> > >> > Software Engineer >>>> > > > > > >> > >> > 510.368.1252 <+15103681252> >>>> > > > > > >> > >> > [image: Lyft] <http://www.lyft.com/> >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske < >>>> > > > > > [hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=11>> >>>> > > > > > >> > >> wrote: >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > > Hi Ying, >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > I'm not aware of any effort for this issue. >>>> > > > > > >> > >> > > You could check with the assigned contributor in >>>> Jira >>>> > if >>>> > > > > there >>>> > > > > > is >>>> > > > > > >> > some >>>> > > > > > >> > >> > > previous work. >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > Best, Fabian >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=12>>: >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > > Hello Flink dev: >>>> > > > > > >> > >> > > > >>>> > > > > > >> > >> > > > We have a number of use cases which involves >>>> pulling >>>> > > data >>>> > > > > > from >>>> > > > > > >> > >> DynamoDB >>>> > > > > > >> > >> > > > streams into Flink. >>>> > > > > > >> > >> > > > >>>> > > > > > >> > >> > > > Given that this issue is tracked by Flink-4582 >>>> > > > > > >> > >> > > > < >>>> https://issues.apache.org/jira/browse/FLINK-4582>. 
>>>> > > > > >> > >> > > we would like to check if any prior work has been completed by the >>>> > > > > >> > >> > > community. We are also very interested in contributing to this effort. >>>> > > > > >> > >> > > Currently, we have a high-level proposal which is based on extending the >>>> > > > > >> > >> > > existing FlinkKinesisConsumer and making it work with DynamoDB streams >>>> > > > > >> > >> > > (via integrating with the AmazonDynamoDBStreams API). >>>> > > > > >> > >> > > >>>> > > > > >> > >> > > Any suggestion is welcome. Thank you very much. >>>> > > > > >> > >> > > >>>> > > > > >> > >> > > - >>>> > > > > >> > >> > > Ying >>>> >>>> ------------------------------ >>>> If you reply to this email, your message will be added to the discussion below: >>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Consuming-data-from-dynamoDB-streams-to-flink-tp22963p23597.html