Uwe, Is there anything in the V$ARCHIVED_LOG table [1] in your source database? If so you may be able to get some of that information from there. Also is LogMiner [2] enabled on the database? That's another way to be able to query the logs to get things like deletes.
In general, there has to be something in the system that knows when something is deleted. That can be done in a few ways, with varying amounts of success: 1) Cross-join the source table with the target table. This won't pick up records that have been inserted, updated, then deleted since the last time the join was performed. Plus the join is very costly for large tables. 2) Change the schema of the tables to add a soft delete flag (as mentioned in an earlier reply). This often cannot be done because the main app needs a particular schema, or because the CDC user does not have permission to do such things to the source DB 3) Intercept the calls that will change the DB. This is fragile because you may not know if the call succeeds at the DB. Plus you may not be able to change the architecture to put something in between the users and the DB. 4) Interrogate the logs. This is IMO the only real way to do CDC, and is how most CDC solutions operate. The CaptureChangeMySQL processor reads the MySQL binary logs, and if/when the CaptureChangeOracle processor is implemented, it will likely require LogMiner or something to be enabled at the source to make the information available. This pattern can apply to all DBs that support such a thing, as long as they can be interrogated via JDBC (SQL queries) or some client tool (which hopefully for us has a Java port!) Regards, Matt [1] https://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_1016.htm#REFRN30011 [2] https://docs.oracle.com/cd/E11882_01/server.112/e22490/logminer.htm#SUTIL1559 On Mon, Sep 18, 2017 at 1:35 PM, Uwe Geercken <uwe.geerc...@web.de> wrote: > Andrew, > > yes. we are doing the same for the oracle db which is quite old and does not > provide this information. > > Anyway. Was just curious if somebody has a smarter solution. The blogs of > Nifi and Kafka have really good samples of extracting data but none of them > touches the topic of deletes. > > Rgds, > > Uwe > > > > Gesendet: Samstag, 16. September 2017 um 13:52 Uhr > Von: "Andrew Grande" <apere...@gmail.com> > An: users@nifi.apache.org > Betreff: Re: Re: QueryDatabaseTable - Deleted Records > > As an interesting architectural approach we took eons ago, before NiFi, was > to take daily snapshots of a full table. Every row would then be > hashed/digested or in any other way uniquely identified and 2 datasets would > be crossed and compared to find inserts/deletes/updates. It was involved, > but worked. > > Andrew > > > On Sat, Sep 16, 2017, 2:38 AM Uwe Geercken <uwe.geerc...@web.de> wrote: >> >> Bryan, >> >> yes, the change log would be possible. In my use case I have Oracle 11 as >> the source - and I can not change the source easily (takes long - is >> expensive). >> >> I was expecting this answer but wanted to make sure that I have not missed >> anything. I will try to build my use case around something else then. >> >> Thanks for your response(s). >> >> Rgds, >> >> Uwe >> >> Gesendet: Freitag, 15. September 2017 um 16:15 Uhr >> Von: "Bryan Bende" <bbe...@gmail.com> >> An: users@nifi.apache.org >> Betreff: Re: QueryDatabaseTable - Deleted Records >> Uwe, >> >> Typically you need to process the change log of the database in this >> case, which unfortunately usually becomes database specific. >> >> I believe we have a processor CaptureChangeMySQL that can process the >> MySQL change log. >> >> -Bryan >> >> >> On Tue, Sep 12, 2017 at 1:39 PM, Uwe Geercken <uwe.geerc...@web.de> wrote: >> > Hello, >> > >> > apparently the QueryDatabaseTable processor catches changes made to the >> > data >> > of the source database - updates and inserts. >> > >> > Has anybody a good idea or strategy how to handle deletes in the source >> > database? Of course one could flag a record as deleted instead of >> > phisically >> > deleting it. But this means changing the source system in many cases and >> > that is sometimes not possible. And yes, if you process the change log >> > (if >> > available) of the source system that is also a good option. >> > >> > Would be greatful for any tips or a best practive of how you do it. >> > >> > Rgds, >> > >> > Uwe