Re: [D] HOP Sizing (hop)

via GitHub Tue, 19 Nov 2024 02:51:25 -0800


GitHub user hansva added a comment to the discussion: HOP Sizing


Hi @xProga,

I'm afraid the answer to all this is "it depends". That's why we don't have 
these guidelines.

Some of the things that are known:
- Each action/transform (or transform copy) will create a processor thread
  - This means the number of transforms that can be truly active at once equals 
the number of threads the CPU supports (minus 1 for the main process). When 
more transforms are active than that, thread switching will occur
  - So our recommendation is to keep your pipelines as compact as possible (~30 
transforms sounds like a sane rule)
- Each transform has a configurable buffer
  - Each transform (copy) has an input buffer. Unlike some other tools, we do 
not load all data into memory and move it from one transform to the next. We 
have a buffer system (default 10K rows): transforms fill those buffers and get 
pushback signals to stop processing/fetching data
  - This means the amount of active memory = rows in buffers x (columns x size 
of each column's data type)
  - There are a couple of exceptions, e.g. `Sort Rows` needs to have all data, 
so it will load all rows, but it has a configurable buffer to spool data off 
to disk
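
As a rough illustration of the two rules of thumb above, here is a minimal Python sketch. It is not part of Hop; the function names and the `avg_row_bytes` parameter are made up for this example, and it assumes every input buffer is full and ignores JVM object overhead, so treat the result as a ballpark figure only.

```python
import os

# Default buffer size in rows, as mentioned above (10K rows).
DEFAULT_BUFFER_ROWS = 10_000


def estimate_buffer_memory_bytes(transform_copies, avg_row_bytes,
                                 buffer_rows=DEFAULT_BUFFER_ROWS):
    """Worst-case buffer memory: every transform copy's input buffer is full.

    avg_row_bytes is a hypothetical input: the summed size of all columns
    in one row (columns x data type size).
    """
    return transform_copies * buffer_rows * avg_row_bytes


def threads_oversubscribed(transform_copies, cpu_threads=None):
    """True when there are more transform copies than CPU threads minus one."""
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1
    available = max(cpu_threads - 1, 0)  # one thread kept for the main process
    return transform_copies > available


if __name__ == "__main__":
    # 30 transform copies, rows of roughly 200 bytes each
    mem = estimate_buffer_memory_bytes(30, 200)
    print(f"buffer memory upper bound: {mem / 1_000_000:.0f} MB")
    print("thread switching expected:", threads_oversubscribed(30, cpu_threads=16))
```

With these assumed numbers the buffers top out around 60 MB, and 30 transform copies on a 16-thread CPU would already involve thread switching. Transforms such as `Sort Rows` are not covered by this estimate.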

Hop Web:
Users can run workflows and pipelines inside Hop Web itself; it is not a 
client/server application. This implies that Hop Web instances need enough 
resources to run the processes locally.

Hop Server:
This one is mainly used as a remote extension to local development. It is a 
stateless server so workloads do not survive restarts. It does not have 
scheduling.

We recommend using short-lived containers for actual workload scheduling using 
an orchestration tool of your choice.

Hope this helps.
 

GitHub link: 
https://github.com/apache/hop/discussions/4586#discussioncomment-11303562

