May I suggest that implementing transaction-based ETLs is not the best approach? For the most part (SCDs excluded), you may want to design your pipelines in a write-once, read-many way, with no updates. This implies that good data and bad data coexist in the same table and that you have a way of identifying the good data. For example, with a load id and a register that maps each load id to the window of the load (or of the data in the load, as required), e.g. day or hour. The caveat is that there is one and only one valid load id for each window. This should work for both batching and streaming.
Regards,
Diego

On Sat, 7 Dec 2024, 2:46 am hansva (via GitHub), <[email protected]> wrote:

> GitHub user hansva added a comment to the discussion: Why are
> transaction-based workflows sooooo slow?
>
> I will have to throw the question back at you...
> What is making it slow? which transforms are impacted? I fear your table
> outputs and database operation will be the issue.
> And that is a database problem and not a Hop problem. The only thing we do
> is, we don't commit until everything is done.
>
> GitHub link:
> https://github.com/apache/hop/discussions/4678#discussioncomment-11486516
