Hello, For anyone looking to set up alerts for a Flink application, here is a good blog post by Ververica: https://www.ververica.com/blog/monitoring-apache-flink-applications-101 So, for DynamoDB Streams we can set the alert on millisBehindLatest
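To make that concrete, the alert itself is just a threshold check on the millisBehindLatest value the consumer reports. A minimal sketch; the class name and the 60-second threshold are assumptions (tune the threshold to your latency SLO), and in practice you would express this same condition as a Prometheus alerting rule on the reported gauge rather than in application code:

```java
// Minimal sketch of an alert condition on millisBehindLatest.
// The 60s threshold is an assumed tolerance, not a Flink default;
// pick a value based on how far behind the stream head you can accept.
public class MillisBehindLatestAlert {
    static final long THRESHOLD_MS = 60_000L; // assumed tolerance: 1 minute

    // Fire an alert when the consumer has fallen further behind the
    // head of the stream than the configured tolerance.
    static boolean shouldAlert(long millisBehindLatest) {
        return millisBehindLatest > THRESHOLD_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(5_000L));   // within tolerance -> false
        System.out.println(shouldAlert(120_000L)); // two minutes behind -> true
    }
}
```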
Regards, Vinay Patil On Wed, Aug 7, 2019 at 2:24 PM Vinay Patil <vinay18.pa...@gmail.com> wrote: > Hi Andrey, > > Thank you for your reply. I understand that the checkpoints are gone when > the job is cancelled or killed; maybe configuring externalized checkpoints > will help here so that we can resume from there. > > My point was that if the job is terminated and the stream position is set to > TRIM_HORIZON, the consumer will start processing from the start. I was > curious to know if there is a configuration like kafka_group_id that we can > set for DynamoDB streams. > > Also, can you please let me know which metrics I should generate an > alert on in the case of DynamoDB Streams (I am sending the metrics to > Prometheus)? I see these metrics: > https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java > > > For example: in the case of Kafka we generate an alert when the consumer is > lagging behind. > > > Regards, > Vinay Patil > > > On Fri, Jul 19, 2019 at 10:40 PM Andrey Zagrebin <and...@ververica.com> > wrote: > >> Hi Vinay, >> >> 1. I would assume it works similarly to the Kinesis connector (correct me if >> wrong, people who actually developed it) >> 2. If you have activated just checkpointing, the checkpoints are gone if >> you externally kill the job. You might be interested in savepoints [1] >> 3. See the paragraph in [2] about Kinesis consumer parallelism >> >> Best, >> Andrey >> >> [1] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html >> [2] >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kinesis.html#kinesis-consumer >> >> On Fri, Jul 19, 2019 at 1:45 PM Vinay Patil <vinay18.pa...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I am using this consumer for processing records from DynamoDB Streams; a >>> few questions on this: >>> >>> 1. 
How does checkpointing work with DynamoDB streams? Since this class is >>> extending FlinkKinesisConsumer, I am assuming it will start from the last >>> successful checkpoint in case of failure, right? >>> 2. Currently when I kill the pipeline and start it again, it reads all the >>> data from the start of the stream. Is there any configuration to avoid this >>> (apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to >>> group-id in Kafka? >>> 3. As these DynamoDB streams are separated into shards, what is the >>> recommended parallelism to be set for the source? Should it be a one-to-one >>> mapping? For example, if there are 3 shards, should the parallelism be 3? >>> >>> >>> Regards, >>> Vinay Patil >>> >>> >>> On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List >>> archive.] <ml+s1008284n23597...@n3.nabble.com> wrote: >>> >>>> Thank you so much Fabian! >>>> >>>> Will update the status in the JIRA. >>>> >>>> - >>>> Ying >>>> >>>> On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=0>> wrote: >>>> >>>> > Done! >>>> > >>>> > Thank you :-) >>>> > >>>> > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=1>>: >>>> > >>>> > > Thanks Fabian and Thomas. >>>> > > >>>> > > Please assign FLINK-4582 to the following username: >>>> > > *yxu-apache >>>> > > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache>* >>>> > > >>>> > > If needed I can get an ICLA or CCLA, whichever is proper. >>>> > > >>>> > > *Ying Xu* >>>> > > Software Engineer >>>> > > 510.368.1252 <+15103681252> >>>> > > [image: Lyft] <http://www.lyft.com/> >>>> > > >>>> > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=2>> wrote: >>>> > > >>>> > > > The user is yxu-lyft; Ying had commented on that JIRA as well. 
>>>> > > > >>>> > > > https://issues.apache.org/jira/browse/FLINK-4582 >>>> > > > >>>> > > > >>>> > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=3>> >>>> > wrote: >>>> > > > >>>> > > > > Hi Ying, >>>> > > > > >>>> > > > > Thanks for considering to contribute the connector! >>>> > > > > >>>> > > > > In general, you don't need special permissions to contribute to >>>> > Flink. >>>> > > > > Anybody can open Jiras and PRs. >>>> > > > > You only need to be assigned to the Contributor role in Jira to >>>> be >>>> > able >>>> > > > to >>>> > > > > assign an issue to you. >>>> > > > > I can give you these permissions if you tell me your Jira user. >>>> > > > > >>>> > > > > It would also be good if you could submit a CLA [1] if you plan >>>> to >>>> > > > > contribute a larger feature. >>>> > > > > >>>> > > > > Thanks, Fabian >>>> > > > > >>>> > > > > [1] https://www.apache.org/licenses/#clas >>>> > > > > >>>> > > > > >>>> > > > > 2018-07-30 10:07 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=4>>: >>>> > > > > >>>> > > > > > Hello Flink dev: >>>> > > > > > >>>> > > > > > We have implemented the prototype design and the initial PoC >>>> worked >>>> > > > > pretty >>>> > > > > > well. Currently, we plan to move ahead with this design in >>>> our >>>> > > > internal >>>> > > > > > production system. >>>> > > > > > >>>> > > > > > We are thinking of contributing this connector back to the >>>> flink >>>> > > > > community >>>> > > > > > sometime soon. May I request to be granted with a >>>> contributor >>>> > role? >>>> > > > > > >>>> > > > > > Many thanks in advance. 
>>>> > > > > > >>>> > > > > > *Ying Xu* >>>> > > > > > Software Engineer >>>> > > > > > 510.368.1252 <+15103681252> >>>> > > > > > [image: Lyft] <http://www.lyft.com/> >>>> > > > > > >>>> > > > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=5>> wrote: >>>> > > > > > >>>> > > > > > > Hi Gordon: >>>> > > > > > > >>>> > > > > > > Cool. Thanks for the thumb-up! >>>> > > > > > > >>>> > > > > > > We will include some test cases around the behavior of >>>> > re-sharding. >>>> > > > If >>>> > > > > > > needed we can double check the behavior with AWS, and see >>>> if >>>> > > > additional >>>> > > > > > > changes are needed. Will keep you posted. >>>> > > > > > > >>>> > > > > > > - >>>> > > > > > > Ying >>>> > > > > > > >>>> > > > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai < >>>> > > > > [hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=6> >>>> > > > > > > >>>> > > > > > > wrote: >>>> > > > > > > >>>> > > > > > >> Hi Ying, >>>> > > > > > >> >>>> > > > > > >> Sorry for the late reply here. >>>> > > > > > >> >>>> > > > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it >>>> seems >>>> > > like >>>> > > > > > this >>>> > > > > > >> should simply work. >>>> > > > > > >> >>>> > > > > > >> Regarding the resharding behaviour I mentioned in the >>>> JIRA: >>>> > > > > > >> I'm not sure if this is really a difference in behaviour. >>>> > > > Internally, >>>> > > > > if >>>> > > > > > >> DynamoDB streams is actually just working on Kinesis >>>> Streams, >>>> > then >>>> > > > the >>>> > > > > > >> resharding primitives should be similar. >>>> > > > > > >> The shard discovery logic of the Flink Kinesis Consumer >>>> assumes >>>> > > that >>>> > > > > > >> splitting / merging shards will result in new shards of >>>> > > increasing, >>>> > > > > > >> consecutive shard ids. 
As long as this is also the >>>> behaviour for >>>> > > > > > DynamoDB >>>> > > > > > >> resharding, then we should be fine. >>>> > > > > > >> >>>> > > > > > >> Feel free to start with the implementation for this, I >>>> think >>>> > > > > design-wise >>>> > > > > > >> we're good to go. And thanks for working on this! >>>> > > > > > >> >>>> > > > > > >> Cheers, >>>> > > > > > >> Gordon >>>> > > > > > >> >>>> > > > > > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=7>> wrote: >>>> > > > > > >> >>>> > > > > > >> > HI Gordon: >>>> > > > > > >> > >>>> > > > > > >> > We are starting to implement some of the primitives >>>> along this >>>> > > > path. >>>> > > > > > >> Please >>>> > > > > > >> > let us know if you have any suggestions. >>>> > > > > > >> > >>>> > > > > > >> > Thanks! >>>> > > > > > >> > >>>> > > > > > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <[hidden >>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=8>> >>>> > wrote: >>>> > > > > > >> > >>>> > > > > > >> > > Hi Gordon: >>>> > > > > > >> > > >>>> > > > > > >> > > Really appreciate the reply. >>>> > > > > > >> > > >>>> > > > > > >> > > Yes our plan is to build the connector on top of the >>>> > > > > > >> > FlinkKinesisConsumer. 
>>>> > > > > > >> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis >>>> > > > > > >> > > through the AmazonKinesis client, more specifically through the following >>>> > > > > > >> > > three function calls: >>>> > > > > > >> > > >>>> > > > > > >> > > - describeStream >>>> > > > > > >> > > - getRecords >>>> > > > > > >> > > - getShardIterator >>>> > > > > > >> > > >>>> > > > > > >> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient) >>>> > > > > > >> > > has already implemented similar calls, it is possible to use that client to >>>> > > > > > >> > > interact with the DynamoDB streams, and adapt the results from the DynamoDB >>>> > > > > > >> > > streams model to the Kinesis model. >>>> > > > > > >> > > >>>> > > > > > >> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient >>>> > > > > > >> > > <https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java> >>>> > > > > > >> > > does. The adapter client implements the AmazonKinesis client interface, >>>> > > > > > >> > > and is officially supported by AWS. Hence it is possible to replace the >>>> > > > > > >> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter >>>> > > > > > >> > > client when interacting with DynamoDB streams. 
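The client substitution described here is the classic adapter pattern. A stdlib-only sketch with every type name invented for illustration; in the real design the consumer codes against the AmazonKinesis interface and AmazonDynamoDBStreamsAdapterClient implements it on top of AmazonDynamoDBStreamsClient:

```java
import java.util.List;

// Stdlib-only sketch of the adapter idea: the DynamoDB Streams client has
// its own model, and an adapter exposes it through the Kinesis-shaped
// interface so the existing consumer code needs no changes.
public class AdapterSketch {
    // Stand-in for the Kinesis client interface the consumer codes against.
    interface KinesisLikeClient {
        List<String> getRecords(String shardIterator);
    }

    // Stand-in for the low-level DynamoDB Streams client with its own API.
    static class DynamoStreamsClient {
        List<String> getStreamRecords(String iterator) {
            return List.of("dynamo-record-1", "dynamo-record-2");
        }
    }

    // The adapter: implements the Kinesis-shaped interface, delegates to
    // the DynamoDB Streams client, and would translate between the models.
    static class DynamoStreamsAdapter implements KinesisLikeClient {
        private final DynamoStreamsClient delegate = new DynamoStreamsClient();

        @Override
        public List<String> getRecords(String shardIterator) {
            return delegate.getStreamRecords(shardIterator);
        }
    }

    public static void main(String[] args) {
        // The consumer only ever sees the Kinesis-shaped interface.
        KinesisLikeClient client = new DynamoStreamsAdapter();
        System.out.println(client.getRecords("iterator-0"));
    }
}
```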
The new >>>> > > > > > >> > > object can be a subclass of FlinkKinesisConsumer with a new name, e.g., >>>> > > > > > >> > > FlinkDynamoStreamConsumer. >>>> > > > > > >> > > >>>> > > > > > >> > > At best this could simply work. But we would like to hear if there are >>>> > > > > > >> > > other situations to take care of. In particular, I am wondering what the >>>> > > > > > >> > > *"resharding behavior"* mentioned in FLINK-4582 is. >>>> > > > > > >> > > >>>> > > > > > >> > > Thanks a lot! >>>> > > > > > >> > > >>>> > > > > > >> > > - >>>> > > > > > >> > > Ying >>>> > > > > > >> > > >>>> > > > > > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai < >>>> > > > > > >> > [hidden email] <http:///user/SendEmail.jtp?type=node&node=23597&i=9> >>>> > > > > > >> > > > wrote: >>>> > > > > > >> > > >>>> > > > > > >> > >> Hi! >>>> > > > > > >> > >> >>>> > > > > > >> > >> I think it would definitely be nice to have this feature. >>>> > > > > > >> > >> >>>> > > > > > >> > >> No actual previous work has been made on this issue, but AFAIK we >>>> > > > > > >> > >> should be able to build this on top of the FlinkKinesisConsumer. >>>> > > > > > >> > >> Whether this should live within the Kinesis connector module or an >>>> > > > > > >> > >> independent module of its own is still TBD. >>>> > > > > > >> > >> If you want, I would be happy to look at any concrete design proposals >>>> > > > > > >> > >> you have for this before you start the actual development efforts. 
>>>> > > > > > >> > >> >>>> > > > > > >> > >> Cheers, >>>> > > > > > >> > >> Gordon >>>> > > > > > >> > >> >>>> > > > > > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <[hidden >>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=10>> >>>> > > wrote: >>>> > > > > > >> > >> >>>> > > > > > >> > >> > Thanks Fabian for the suggestion. >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > *Ying Xu* >>>> > > > > > >> > >> > Software Engineer >>>> > > > > > >> > >> > 510.368.1252 <+15103681252> >>>> > > > > > >> > >> > [image: Lyft] <http://www.lyft.com/> >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske < >>>> > > > > > [hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=11>> >>>> > > > > > >> > >> wrote: >>>> > > > > > >> > >> > >>>> > > > > > >> > >> > > Hi Ying, >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > I'm not aware of any effort for this issue. >>>> > > > > > >> > >> > > You could check with the assigned contributor in >>>> Jira >>>> > if >>>> > > > > there >>>> > > > > > is >>>> > > > > > >> > some >>>> > > > > > >> > >> > > previous work. >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > Best, Fabian >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <[hidden email] >>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=12>>: >>>> > > > > > >> > >> > > >>>> > > > > > >> > >> > > > Hello Flink dev: >>>> > > > > > >> > >> > > > >>>> > > > > > >> > >> > > > We have a number of use cases which involves >>>> pulling >>>> > > data >>>> > > > > > from >>>> > > > > > >> > >> DynamoDB >>>> > > > > > >> > >> > > > streams into Flink. >>>> > > > > > >> > >> > > > >>>> > > > > > >> > >> > > > Given that this issue is tracked by Flink-4582 >>>> > > > > > >> > >> > > > < >>>> https://issues.apache.org/jira/browse/FLINK-4582>. 
>>>> > > > > >> > >> > > we would like to check if any prior work has been completed by the >>>> > > > > >> > >> > > community. We are also very interested in contributing to this effort. >>>> > > > > >> > >> > > Currently, we have a high-level proposal which is based on extending the >>>> > > > > >> > >> > > existing FlinkKinesisConsumer and making it work with DynamoDB streams >>>> > > > > >> > >> > > (via integrating with the AmazonDynamoDBStreams API). >>>> > > > > >> > >> > > >>>> > > > > >> > >> > > Any suggestion is welcome. Thank you very much. >>>> > > > > >> > >> > > >>>> > > > > >> > >> > > - >>>> > > > > >> > >> > > Ying >>>> >>>> ------------------------------ >>>> If you reply to this email, your message will be added to the discussion below: >>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Consuming-data-from-dynamoDB-streams-to-flink-tp22963p23597.html