The idea behind Spark Streaming is to process change events as they occur,
hence the suggestions above that rely on capturing change events with
Debezium.

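For reference, a minimal sketch (PySpark) of what the Structured Streaming leg
of such a pipeline could look like. The broker address, topic name, storage
paths and JSON field paths are placeholders for illustration, and it assumes
the spark-sql-kafka connector is on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, get_json_object

spark = SparkSession.builder.appName("mysql-cdc-to-hive").getOrCreate()

# Read raw Debezium change events from Kafka as they arrive
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")       # placeholder broker
       .option("subscribe", "dbserver1.inventory.orders")      # placeholder topic
       .option("startingOffsets", "earliest")
       .load())

# Each record value is a JSON change event; pull out the operation type and the
# post-change row image. If your Debezium converter wraps events in a "payload"
# envelope, the paths become "$.payload.op" / "$.payload.after".
events = raw.selectExpr("CAST(value AS STRING) AS json").select(
    get_json_object(col("json"), "$.op").alias("op"),        # c=insert, u=update, d=delete
    get_json_object(col("json"), "$.after").alias("after"),  # new row state as a JSON string
)

# Write micro-batches to storage in a Hive-readable format
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://my-bucket/orders_cdc/")         # placeholder path
         .option("checkpointLocation", "s3a://my-bucket/chk/")  # placeholder path
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
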
But you can also use a JDBC driver to connect Spark to relational databases
such as MySQL (as a batch read rather than a stream).

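If periodic batch loads are enough, a minimal JDBC sketch could look like the
following; the host, database, table names and credentials are placeholders,
and the MySQL Connector/J jar needs to be on the Spark classpath:

from pyspark.sql import SparkSession

# enableHiveSupport() assumes a Hive metastore is configured for this cluster
spark = (SparkSession.builder
         .appName("mysql-jdbc-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Batch read of a single table over JDBC
orders = (spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://mysql-host:3306/inventory")  # placeholder host/db
          .option("dbtable", "orders")                              # placeholder table
          .option("user", "spark_reader")                           # placeholder credentials
          .option("password", "********")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .load())

# Persist as a Hive table for downstream queries
orders.write.mode("overwrite").saveAsTable("staging.orders")        # placeholder table name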

On Wed, Aug 17, 2022 at 6:21 PM Akash Vellukai <akashvellukai...@gmail.com>
wrote:

> I am a beginner with Spark; may I also know how to connect a MySQL database
> with Spark Streaming?
>
> Thanks and regards
> Akash P
>
> On Wed, 17 Aug, 2022, 8:28 pm Saurabh Gulati, <saurabh.gul...@fedex.com>
> wrote:
>
>> Another take:
>>
>>    - Debezium
>>    <https://debezium.io/documentation/reference/stable/connectors/mysql.html>
>>    to read the Write-Ahead Log (WAL) and send change events to Kafka
>>    - Kafka Connect to write to cloud storage -> Hive
>>       OR
>>
>>
>>    - Spark streaming to parse WAL -> Storage -> Hive
>>
>> Regards
>> ------------------------------
>> *From:* Gibson <gwasuk...@gmail.com>
>> *Sent:* 17 August 2022 16:53
>> *To:* Akash Vellukai <akashvellukai...@gmail.com>
>> *Cc:* user@spark.apache.org <user@spark.apache.org>
>> *Subject:* [EXTERNAL] Re: Spark streaming - Data Ingestion
>>
>> If you have space for a message log like Kafka, then you should try:
>>
>> MySQL -> Kafka (via CDC) -> Spark (Structured Streaming) -> HDFS/S3/ADLS
>> -> Hive
>>
>> On Wed, Aug 17, 2022 at 5:40 PM Akash Vellukai <
>> akashvellukai...@gmail.com> wrote:
>>
>> Dear sir
>>
>> I have tried a lot on this; could you help me with it?
>>
>> Data ingestion from MySQL to Hive with Spark Streaming?
>>
>> Could you give me an overview?
>>
>>
>> Thanks and regards
>> Akash P
>>
>>
