Re: Realtime Data processing from HBase

Arvid Heise Tue, 05 Jan 2021 02:29:18 -0800

Hi Sunitha,

The current HBase connector only works continuously with Table API/SQL. If
you use the input format, it only reads the data once as you have found out.


What you can do is implement your own source that repeatedly polls HBase
and uses pagination or filters to fetch only new data. You would add the
last read offset to the checkpoint data of your source.
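The polling approach above can be sketched in plain Java. This is a minimal, hypothetical model, not a Flink or HBase API: `FakeTable` stands in for an HBase table, and rows are assumed to carry a monotonically increasing write timestamp that serves as the offset. Each poll reads only rows at or after the stored offset, and that offset is the single value that would go into the checkpoint. In a real Flink source, the scan would be an HBase `Scan` restricted with `setTimeRange`, and the offset would be kept in operator state via `CheckpointedFunction#snapshotState` / `initializeState`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of incremental polling with a checkpointable offset.
 * HYPOTHETICAL: FakeTable is an in-memory stand-in for an HBase table, keyed by
 * a write timestamp that is assumed to increase monotonically.
 */
public class PollingSourceSketch {

    /** In-memory stand-in for an HBase table: timestamp -> row value. */
    static final class FakeTable {
        private final TreeMap<Long, String> rows = new TreeMap<>();

        void put(long timestamp, String value) {
            rows.put(timestamp, value);
        }

        /** Rows with minInclusive <= timestamp < maxExclusive, in timestamp order. */
        List<Map.Entry<Long, String>> scan(long minInclusive, long maxExclusive) {
            return new ArrayList<>(
                rows.subMap(minInclusive, true, maxExclusive, false).entrySet());
        }
    }

    private final FakeTable table;
    // First timestamp that has NOT been emitted yet; this is the checkpointed offset.
    private long nextOffset;

    PollingSourceSketch(FakeTable table, long restoredOffset) {
        this.table = table;
        this.nextOffset = restoredOffset;
    }

    /** One polling round: emit only rows newer than the offset, then advance it. */
    List<String> poll(long upperBoundExclusive) {
        List<String> emitted = new ArrayList<>();
        if (nextOffset >= upperBoundExclusive) {
            return emitted; // empty range, nothing to scan
        }
        for (Map.Entry<Long, String> row : table.scan(nextOffset, upperBoundExclusive)) {
            emitted.add(row.getValue());
            nextOffset = row.getKey() + 1;
        }
        return emitted;
    }

    /** The value a real source would write into its checkpoint. */
    long snapshotOffset() {
        return nextOffset;
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        table.put(100L, "a");
        table.put(200L, "b");

        PollingSourceSketch source = new PollingSourceSketch(table, 0L);
        System.out.println(source.poll(1_000L)); // [a, b]
        long checkpoint = source.snapshotOffset(); // 201

        table.put(300L, "c");
        // Simulated restart: a new instance resumes from the checkpointed offset.
        PollingSourceSketch restored = new PollingSourceSketch(table, checkpoint);
        System.out.println(restored.poll(1_000L)); // [c]
    }
}
```

A continuous source must not return after the first scan: with the `SourceFunction` API the source is finished once `run()` returns, so a real implementation would loop (poll, emit, sleep) until `cancel()` is called, emitting records and advancing the offset while holding `ctx.getCheckpointLock()` so a checkpoint never sees a half-updated offset. Rows written with a timestamp older than the stored offset would be missed, which is why the monotonic-timestamp assumption matters.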

If you are using Flink 1.12, I'd strongly recommend using the new source
interface [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/sources.html

On Mon, Dec 28, 2020 at 6:43 AM s_penakalap...@yahoo.com <
s_penakalap...@yahoo.com> wrote:

> Thanks Deepak.
>
> Does this mean streaming from HBase is not possible using the current
> Streaming API?
>
> Also, could you shed some light on HBase checkpointing? I referred to
> the URL below to implement checkpointing; however, in that example a count
> is passed in the SourceFunction (SourceFunction<Long>). Is it possible to
> checkpoint based on the data we read from HBase?
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.html
>
> Regards,
> Sunitha.
>
> On Monday, December 28, 2020, 10:51:45 AM GMT+5:30, Deepak Sharma <
> deepakmc...@gmail.com> wrote:
>
>
> I would suggest another approach here:
> 1. Write a job that reads from HBase, checkpoints, and pushes the data to a
> broker such as Kafka.
> 2. A second Flink streaming job reads from Kafka and processes the
> data.
>
> With the concerns separated as above, maintaining the pipeline would be
> simpler.
>
> Thanks
> Deepak
>
> On Mon, Dec 28, 2020 at 10:42 AM s_penakalap...@yahoo.com <
> s_penakalap...@yahoo.com> wrote:
>
> Hi Team,
>
> Kindly help me with some input. I am using Flink 1.12.
>
> Regards,
> Sunitha.
>
> On Thursday, December 24, 2020, 08:34:00 PM GMT+5:30,
> s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote:
>
>
> Hi Team,
>
> I recently encountered a use case in my project, described below:
>
> My data source is HBase.
> We receive a huge volume of data at very high speed into HBase tables from
> the source system.
> We need to read from HBase, perform computations, and insert the results into PostgreSQL.
>
> I would like some input on the points below:
>
>    - Using the Flink streaming API, is continuous streaming possible from an
>    HBase database? I tried using RichSourceFunction and
>    StreamExecutionEnvironment and was able to read data, but the job stops
>    once all data is read from HBase. My requirement is that the job keeps
>    running and reads data as and when it arrives in the HBase table.
>    - If continuous streaming from HBase is supported, how can
>    checkpointing be done on HBase so that the job can be restarted from the
>    point where it aborted? I tried googling but had no luck. Please help
>    with a simple example or approach.
>    - If continuous streaming from HBase is not supported, what should the
>    alternative approach be: a batch job? (Our requirement is to process
>    real-time data from HBase, not to launch multiple ETL jobs.)
>
>
> Happy Christmas to all  :)
>
>
> Regards,
> Sunitha.
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


-- 

Arvid Heise | Senior Java Developer

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng
