Re: [D] Why are transaction-based workflows sooooo slow? (hop)

[email protected] Fri, 06 Dec 2024 11:56:52 -0800

May I suggest that implementing transaction-based ETLs is not the best
approach?
For the most part (SCDs excluded), you may want to design your pipelines
in a write-once, read-many way with no updates.
This implies that good data and bad data coexist in the same table and
that you have a way of identifying the good data.
For example, with a load id and a register that matches the window of the
load, or the data in the load, as required (e.g. day, hour). The caveat is
that there is one and only one valid load id for each window.
This should work for both batching and streaming.


Regards

Diego

On Sat, 7 Dec 2024, 2:46 am hansva (via GitHub), <[email protected]> wrote:

>
> GitHub user hansva added a comment to the discussion: Why are
> transaction-based workflows sooooo slow?
>
> I will have to throw the question back at you...
> What is making it slow? which transforms are impacted? I fear your table
> outputs and database operation will be the issue.
> And that is a database problem and not a Hop problem. The only thing we do
> is, we don't commit until everything is done.
>
> GitHub link:
> https://github.com/apache/hop/discussions/4678#discussioncomment-11486516
>
> ----
> This is an automatically sent email for [email protected].
> To unsubscribe, please send an email to: [email protected]
>
>
