Re: [D] Pseudo-CDC - polled pipeline runs? (hop)

via GitHub Fri, 04 Apr 2025 18:53:26 -0700


GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline 
runs?


In some circumstances it may not be worth the complexity of installing 
Debezium and Kafka for proper CDC. Pseudo-CDC, i.e. polled pipeline runs 
(e.g. every minute), may be sufficient.

Consider this:

- Many tables with a common `modified_at` datetime field (assuming it has 
sufficient resolution that values do not overlap; it could also be an integer, 
a unique primary key, etc., as long as it is comparable)
- A Hop configuration parameter `synced_to` holding a datetime value
- A fetch size
- A poll interval

Then the following query is run repeatedly:
SELECT * FROM sometable WHERE modified_at > ${synced_to} ORDER BY modified_at ASC LIMIT ${fetch_size}

After each run the `synced_to` parameter is updated with the last `modified_at` 
result retrieved.
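
To make the query and the `synced_to` update concrete, here is a minimal sketch in plain Python with an in-memory SQLite table. It is not Hop configuration; the function name `fetch_batch`, the `id` column, and the sample timestamps are assumptions made up for illustration, and the datetimes are stored as ISO-style strings so that they compare correctly as text.

```python
import sqlite3

def fetch_batch(conn, synced_to, fetch_size):
    """Run one polled query and return (rows, new_synced_to)."""
    rows = conn.execute(
        "SELECT id, modified_at FROM sometable "
        "WHERE modified_at > ? ORDER BY modified_at ASC LIMIT ?",
        (synced_to, fetch_size),
    ).fetchall()
    # Advance the watermark only when at least one row was retrieved.
    new_synced_to = rows[-1][1] if rows else synced_to
    return rows, new_synced_to

# Hypothetical sample data: three rows with distinct modified_at values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, modified_at TEXT)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [
        (1, "2025-04-01 10:00:00"),
        (2, "2025-04-01 10:00:01"),
        (3, "2025-04-01 10:00:02"),
    ],
)

# First run with fetch_size=2 returns the two oldest rows.
rows, synced_to = fetch_batch(conn, "2025-04-01 00:00:00", 2)
```

Calling `fetch_batch` again with the returned `synced_to` picks up where the previous run stopped, which is the behaviour the parameter update is meant to provide.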

If the number of rows returned equals `fetch_size`, the pipeline is repeated 
immediately. Otherwise the next run is scheduled after the poll interval.
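
The repeat-or-wait rule can be sketched as follows, again in plain Python and not as a Hop workflow. The names `run_polling`, `run_pipeline`, and `max_runs` are hypothetical; `run_pipeline` stands in for whatever executes one pipeline run and reports how many rows it fetched, and `max_runs` exists only so the example terminates.

```python
import time

def run_polling(run_pipeline, fetch_size, poll_interval, max_runs, sleep=time.sleep):
    """Call run_pipeline repeatedly; return the delay applied after each run."""
    delays = []
    for _ in range(max_runs):
        row_count = run_pipeline()
        # A full batch suggests more rows are waiting, so repeat immediately.
        delay = 0 if row_count == fetch_size else poll_interval
        delays.append(delay)
        if delay:
            sleep(delay)
    return delays

# Simulated row counts for four runs with fetch_size=100; sleeping is stubbed out.
counts = iter([100, 100, 37, 0])
delays = run_polling(
    lambda: next(counts),
    fetch_size=100,
    poll_interval=60,
    max_runs=4,
    sleep=lambda seconds: None,
)
```

In this simulation the first two runs return full batches and are followed by no delay, while the last two fall short of `fetch_size` and are each followed by the poll interval.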

Can Hop do that, maybe with a workflow?

GitHub link: https://github.com/apache/hop/discussions/5134

