OK, so even I wasn't aware of two of those things.

Hans, perhaps your reply could be put into the docs?

Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/


On Tue, Nov 19, 2024 at 6:32 PM hansva (via GitHub) <[email protected]> wrote:

>
> GitHub user hansva added a comment to the discussion: HOP Sizing
>
> Hi @xProga,
>
> I'm afraid the answer to all this is "it depends". That's why we don't
> have these guidelines.
>
> Some of the things that are known:
> - Each action/transform (or transform copy) will create a processor thread
>   - This means the maximum number of active transforms equals the maximum
> number of threads the CPU supports (minus 1 for the main process). When the
> number of active transforms is higher, thread switching will occur
>   - So our recommendation is to keep your pipelines as compact as possible
> (~30 transforms sounds like a sane rule)
> - Each transform has a configurable buffer
>   - Each transform (copy) has an input buffer. Unlike some other tools, we
> do not load all data into memory and move it from one transform to the
> next. We have a buffer system (default 10K rows): transforms fill those
> buffers and get pushback signals to stop processing/fetching data when
> the buffers are full
>   - This means the amount of active memory = rows in buffers x (columns x
> size of each column's data type)
>   - There are a couple of exceptions, e.g. `Sort Rows` needs to have all
> the data, so it will load all rows, but it has a configurable buffer to
> spool data off to disk
>
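
To make the two rules of thumb above concrete, here is a rough back-of-the-envelope sketch. It is not part of Hop: the function names and the per-type byte sizes are my own assumptions for illustration, and real memory use depends on the JVM and the actual data. It only encodes "one thread per transform copy" and "rows in buffers x (columns x data type)" as stated in the reply.

```python
import os

# Assumed average size in bytes per value; these numbers are illustrative
# guesses, not figures from Hop.
ASSUMED_BYTES_PER_TYPE = {
    "Integer": 8,
    "Number": 8,
    "Date": 8,
    "Boolean": 1,
    "String": 50,
}


def estimate_row_bytes(column_types):
    """Approximate size of one row: sum of the assumed size of each column."""
    return sum(ASSUMED_BYTES_PER_TYPE[t] for t in column_types)


def estimate_buffer_bytes(transform_copies, column_types, buffer_rows=10_000):
    """Worst case where every transform copy's input buffer is full.

    buffer_rows defaults to the 10K rows mentioned in the reply.
    """
    return transform_copies * buffer_rows * estimate_row_bytes(column_types)


def threads_exceed_cpu(transform_copies, cpu_threads=None):
    """True if there are more transform copies than CPU threads minus one
    (one thread is reserved for the main process)."""
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1
    return transform_copies > cpu_threads - 1


if __name__ == "__main__":
    columns = ["Integer", "String", "Date"]
    total = estimate_buffer_bytes(30, columns)
    print(f"~{total / 1e6:.1f} MB in buffers for 30 transform copies")
    print("thread switching expected:", threads_exceed_cpu(30, cpu_threads=8))
```

This ignores the exceptions mentioned above (such as `Sort Rows`, which holds far more than one buffer of rows), so treat the result as a lower-bound style estimate rather than a sizing guarantee.
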
> Hop Web:
> Users can run workflows and pipelines inside Hop Web; it is not a
> client/server application. This implies that these instances need enough
> resources to run the processes locally.
>
> Hop Server:
> This one is mainly used as a remote extension to local development. It is
> a stateless server so workloads do not survive restarts. It does not have
> scheduling.
>
> We recommend using short-lived containers for actual workload scheduling
> using an orchestration tool of your choice.
>
> Hope this helps.
>
>
> GitHub link:
> https://github.com/apache/hop/discussions/4586#discussioncomment-11303562
>
