Re: Flink Batch Processing

Timo Walther Tue, 29 Sep 2020 01:11:13 -0700

Hi Sunitha,

currently, not every connector can be mixed with every API. I agree thatit is confusing from time to time. The HBase connector is anInputFormat. DataSet, DataStream and Table API can work withInputFormats. The current Hbase input format might work best with TableAPI. If you like to use CEP API, you can use Table API(StreamTableEnvironment) to read from Hbase and call `toAppendStream`directly afterwards to further process in DataStream API. This worksalso for bounded streams thus you can do "batch" processing.


Regards,
Timo


On 29.09.20 09:56, Till Rohrmann wrote:

Hi Sunitha,

here is some documentation about how to use the Hbase sink with Flink[1, 2].

[1]https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html[2]https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html


Cheers,
Till

On Tue, Sep 29, 2020 at 9:16 AM s_penakalap...@yahoo.com<mailto:s_penakalap...@yahoo.com> <s_penakalap...@yahoo.com<mailto:s_penakalap...@yahoo.com>> wrote:


    Hi Piotrek,

    Thank you for the reply.

    Flink changes are good, However Flink is changing so much that we
    are unable to get any good implementation examples either on Flink
    documents or any other website.

    Using HBaseInputFormat I was able to read the data as a DataSet<>,
    now I see that DataSet would be deprecated.

    In recent release Flink 1.11.1 I see Blink planner, but I was not
    able to get one example on how to connect to HBase and read data. Is
    there any link I can refer to see some implementation of reading
    from HBase as bounded data using Blink Planner/DataStream API.

    Regards,
    Sunitha.



    On Monday, September 28, 2020, 07:12:19 PM GMT+5:30, Piotr Nowojski
    <pnowoj...@apache.org <mailto:pnowoj...@apache.org>> wrote:


    Hi Sunitha,

    First and foremost, the DataSet API will be deprecated soon [1] so I
    would suggest trying to migrate to the DataStream API. When using
    the DataStream API it doesn't mean that you can not work with
    bounded inputs - you can. Flink SQL (Blink planner) is in fact using
    DataStream API to execute both streaming and batch queries. Maybe
    this path would be easier?

    And about answering your question using the DataSet API - sorry, I
    don't know it :( I will try to ping someone who could help here.

    Piotrek

    [1]
    https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741

    pon., 28 wrz 2020 o 15:14 s_penakalap...@yahoo.com
    <mailto:s_penakalap...@yahoo.com> <s_penakalap...@yahoo.com
    <mailto:s_penakalap...@yahoo.com>> napisał(a):

        Hi All,

        Need your help in Flink Batch processing: scenario described below:

        we have multiple vehicles, we get data from each vehicle at a
        very high speed, 1 record per minute.
        thresholds can be set by the owner for each vehicle.

        Say: we have 3 vehicles, threshold is set for 2 vehicles.
        Vehicle 1, threshold 20 hours, allowedPetrolConsumption=15
        vehicle 2, threshold 35 hours, allowedPetrolConsumption=28
        vehicle 3  no threshold set by owner.

        All the vehicle data is stored in HBase tables. We have a
        scheduled Batch Job every day at 12 pm to check the status of
        vehicle movement and Petrol consumption against threshold and
        raise an alert (vehicle1 did not move for past 20 hours, vehicle
        2 consumed more petrol. )

        Since it is a Batch Job, I loaded all threshold data in one
        DataSet and HBase Data in another Dataset using HbaseInputFormat.

        What I am failing to figure out is:
        1> vehicle 1 is having threshold of 20 hours where as vehicle 2
        has threshold of 35 hours, I need to fetch data from Hbase for
        different scenario. Is there any better approach to get all data
        using one Hbase connection.
        2> how to apply alert on Dataset.  CEP pattern/ Match_recognize
        is allowed only on DataStream. Please help me with a simple
        example. (alert can be raised if count is zero or like petrol
        consumption is too high)


        I could not get any example for Dataset on google where an alert
        is raised. Kindly guide me if there is any better approach

        Regards,
        Sunitha.

Re: Flink Batch Processing

Reply via email to