GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline
runs?
In some circumstances it may not be worth the added complexity of installing
Debezium and Kafka to get proper CDC. It may be sufficient to do pseudo-CDC,
i.e. polled pipeline runs, e.g. every minute.
Consider this:
- Many tables with a common `modified_at` datetime field (assuming it has
sufficient resolution that no two changes share the same value; it could also
be an integer, a unique primary key, etc., as long as it is comparable)
- A Hop configuration parameter `synced_to` (a datetime)
- A fetch size
- A poll interval
Then the following query is run repeatedly:
SELECT * FROM sometable WHERE modified_at > ${synced_to} ORDER BY modified_at ASC
LIMIT ${fetch_size}
After each run, the `synced_to` parameter is updated to the last (highest)
`modified_at` value retrieved.
If the number of rows returned equals `fetch_size`, the pipeline is repeated
immediately. Otherwise the next run is scheduled after the poll interval.
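To pin down the control flow described above, here is a minimal sketch in plain Python with `sqlite3`. It is not a Hop workflow and says nothing about how Hop would implement this. The function names `poll_once` and `run_polling`, the `id` column and the `max_polls` cap are hypothetical and exist only for illustration.

```python
import sqlite3
import time


def poll_once(conn, table, synced_to, fetch_size):
    """Fetch one batch of rows changed after synced_to.

    Returns (rows, new_synced_to). Assumes the table has hypothetical
    columns `id` and `modified_at`, and that `table` is a trusted name
    (identifiers cannot be bound as query parameters).
    """
    rows = conn.execute(
        f"SELECT id, modified_at FROM {table} "
        "WHERE modified_at > ? ORDER BY modified_at ASC LIMIT ?",
        (synced_to, fetch_size),
    ).fetchall()
    if rows:
        # Advance the watermark to the last modified_at retrieved.
        synced_to = rows[-1][1]
    return rows, synced_to


def run_polling(conn, table, synced_to, fetch_size, poll_interval,
                max_polls, handle=print, sleep=time.sleep):
    """Poll up to max_polls times and return the final synced_to."""
    for _ in range(max_polls):
        rows, synced_to = poll_once(conn, table, synced_to, fetch_size)
        for row in rows:
            handle(row)
        # A full batch suggests more rows are waiting: repeat immediately.
        # A short batch means we have caught up: wait for the poll interval.
        if len(rows) < fetch_size:
            sleep(poll_interval)
    return synced_to


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sometable (id INTEGER PRIMARY KEY, modified_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sometable VALUES (?, ?)",
        [(i, i * 10) for i in range(1, 6)],
    )
    last = run_polling(conn, "sometable", 0, 2, 0.1, max_polls=3)
    print("synced_to =", last)
```

One caveat the sketch makes visible: with a strict `>` comparison and a `LIMIT`, rows that share a `modified_at` value with the last row of a batch are skipped on the next run, which is why the resolution assumption above matters.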
Can Hop do that, maybe with a workflow?
GitHub link: https://github.com/apache/hop/discussions/5134