GitHub user casesolved-co-uk edited a discussion: Pseudo-CDC - polled pipeline
runs?
In some situations it may not be worth the complexity of installing Debezium,
Kafka and full CDC. Where data volumes are small, pseudo-CDC may be sufficient,
i.e. polled pipeline runs, e.g. every minute.
Consider this:
- Many tables with a common `modified_at` datetime field (assuming it has
sufficient resolution that values do not overlap; it could also be an integer,
a unique primary key, etc., as long as it is comparable)
- A Hop configuration parameter `synced_to` holding the last synced `modified_at` value
- A fetch size (`fetch_size`)
- A poll interval
Then, repeatedly:
```sql
SELECT * FROM sometable
WHERE modified_at > '${synced_to}'
ORDER BY modified_at ASC
LIMIT ${fetch_size}
```
After each run, `synced_to` is updated to the `modified_at` value of the last
row retrieved.
If the number of rows returned equals `fetch_size`, more rows may be pending, so
the pipeline is run again immediately. Otherwise, the next run is scheduled
after the poll interval.
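The loop described above can be sketched outside Hop in Python, with an in-memory SQLite database standing in for the source. This is a minimal sketch under assumptions: the names `sync_batch`, `poll`, `handle`, `max_polls` and `sleep` are hypothetical, `synced_to` is bound as a query parameter rather than substituted into the SQL text, and nothing here reflects how Hop itself would schedule the runs.

```python
import sqlite3
import time
from typing import Callable, Optional


def sync_batch(conn: sqlite3.Connection, table: str, synced_to, fetch_size: int):
    """Run one pseudo-CDC fetch and return (rows, new synced_to).

    `table` is interpolated into the SQL, so it must come from trusted
    configuration; the watermark and the limit are bound as parameters.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE modified_at > ? "
        "ORDER BY modified_at ASC LIMIT ?",
        (synced_to, fetch_size),
    )
    rows = cur.fetchall()
    if rows:
        # Rows are sorted ascending, so the last row carries the new watermark.
        col = [d[0] for d in cur.description].index("modified_at")
        synced_to = rows[-1][col]
    return rows, synced_to


def poll(
    conn: sqlite3.Connection,
    table: str,
    synced_to,
    handle: Callable[[list], None],
    fetch_size: int = 100,
    poll_interval: float = 60.0,
    max_polls: Optional[int] = None,  # hypothetical: stop after N idle polls (None = run forever)
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
):
    """Repeat sync_batch: re-run at once after a full batch, else wait poll_interval."""
    idle_polls = 0
    while max_polls is None or idle_polls < max_polls:
        rows, new_synced_to = sync_batch(conn, table, synced_to, fetch_size)
        if rows:
            handle(rows)
        # Advance the watermark only after the batch was handled; a real job
        # would also persist it here so a restart resumes from the same point.
        synced_to = new_synced_to
        if len(rows) < fetch_size:
            # Partial or empty batch: caught up, so wait before the next run.
            idle_polls += 1
            sleep(poll_interval)
        # Full batch: more rows may be pending, so the loop runs again at once.
    return synced_to


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, modified_at INTEGER)")
    conn.executemany(
        "INSERT INTO sometable (modified_at) VALUES (?)", [(i,) for i in range(1, 251)]
    )
    sizes = []
    last = poll(conn, "sometable", 0, lambda rows: sizes.append(len(rows)),
                fetch_size=100, max_polls=1, sleep=lambda s: None)
    print(sizes, last)  # prints [100, 100, 50] 250
```

Binding `synced_to` as a parameter sidesteps the quoting needed when the value is substituted into the SQL text. Because the filter is a strict `>`, rows that share a `modified_at` value across a batch boundary would be skipped, which is why the non-overlapping assumption above matters.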
Can Hop do that, maybe with a workflow?
GitHub link: https://github.com/apache/hop/discussions/5134