Delta, for example, handles merges, appends, and deletes for you, and it also keeps previous versions of the table's data, so you can query what the table looked like at an earlier point. See delta.io.
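
To make that concrete, here is a rough sketch of how one batch of changes could be applied with Delta's Python merge API. The table format applies the deletes and upserts, but something upstream still has to tell it which rows changed. The sketch assumes the incoming batch carries a hypothetical op_type column ('I', 'U' or 'D') and at most one change per key; the function name, the "id" key, and the column list are illustrative, not something from this thread.

```python
def apply_changes(target, changes, columns, key="id"):
    """Apply one batch of change records to a Delta table.

    target  -- a delta.tables.DeltaTable, e.g. DeltaTable.forPath(spark, path)
    changes -- a Spark DataFrame holding `columns` plus a hypothetical
               op_type column: 'I' (insert), 'U' (update) or 'D' (delete)
    columns -- names of the table columns to write, including the key
    """
    # Map every target column to the same-named column of the source batch.
    assignments = {c: f"s.{c}" for c in columns}
    (
        target.alias("t")
        .merge(changes.alias("s"), f"t.{key} = s.{key}")
        # Row was deleted at the source: remove it from the table.
        .whenMatchedDelete(condition="s.op_type = 'D'")
        # Row still exists at the source: overwrite it with the new values.
        .whenMatchedUpdate(condition="s.op_type <> 'D'", set=assignments)
        # Unknown key: insert it, unless the change is itself a delete.
        .whenNotMatchedInsert(condition="s.op_type <> 'D'", values=assignments)
        .execute()
    )
```

With the delta-spark package installed, this would be called roughly as apply_changes(DeltaTable.forPath(spark, path), changes_df, ["id", "name"]), where path, changes_df and the column names are placeholders. An earlier state of the table can then be read back with spark.read.format("delta").option("versionAsOf", 0).load(path).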

On Thu, Jan 27, 2022, 11:54 AM Sid Kal <flinkbyhe...@gmail.com> wrote:

> Hi Sean,
>
> So you mean if I use those file formats it will do the work of CDC
> automatically or I would have to handle it via code?
>
> Hi Mich,
>
> Not sure if I understood you. Let me try to explain my scenario. Suppose
> there is a Id "1" which is inserted today, so I transformed and ingested
> it. Now suppose if this user id is deleted from the source itself. Then how
> can I delete it in my transformed db?
>
> On Thu, 27 Jan 2022, 22:44 Sean Owen, <sro...@gmail.com> wrote:
>
>> This is what storage engines like Delta, Hudi, Iceberg are for. No need
>> to manage it manually or use a DBMS. These formats allow deletes, upserts,
>> etc of data, using Spark, on cloud storage.
>>
>> On Thu, Jan 27, 2022 at 10:56 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Where ETL data is stored?
>>>
>>> *But now the main problem is when the record at the source is deleted,
>>> it should be deleted in my final transformed record too.*
>>>
>>> If your final sync (storage) is data warehouse, it should be soft
>>> flagged with op_type (Insert/Update/Delete) and op_time (timestamp).
>>>
>>> HTH
>>>
>>> view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Thu, 27 Jan 2022 at 15:48, Sid Kal <flinkbyhe...@gmail.com> wrote:
>>>
>>>> I am using Spark incremental approach for bringing the latest data
>>>> everyday. Everything works fine.
>>>>
>>>> But now the main problem is when the record at the source is deleted,
>>>> it should be deleted in my final transformed record too.
>>>>
>>>> How do I capture such changes and change my table too?
>>>>
>>>> Best regards,
>>>> Sid