GitHub user hansva added a comment to the discussion: HOP Sizing
Hi @xProga,

I'm afraid the answer to all this is "it depends". That's why we don't have these guidelines.

Some of the things that are known:

- Each action/transform (or transform copy) will create a processor thread
  - This means the maximum number of transforms that can be active at the same time equals the maximum number of threads the CPU supports (minus 1 for the main process). When the number of active transforms is higher, thread switching will occur
  - So our recommendation is to keep your pipelines as compact as possible (~30 transforms sounds like a sane rule)
- Each transform has a configurable buffer
  - Each transform (copy) has an input buffer. Unlike some other tools, we do not load all data into memory and move it from one transform to the next. We have a buffer system (default 10K rows): transforms fill those buffers and get pushback signals to stop processing/fetching data
  - This means the amount of active memory = rows in buffers x (columns x data type)
  - There are a couple of exceptions, e.g. `Sort Rows` needs all the data, so it will load all rows, but it has a configurable buffer to spool data to disk

Hop Web:
Users can run workflows and pipelines inside Hop Web; it is not a client/server application. This implies that these instances need enough resources to run the processes locally.

Hop Server:
This one is mainly used as a remote extension to local development. It is a stateless server, so workloads do not survive restarts. It does not have scheduling. We recommend using short-lived containers for actual workload scheduling, driven by an orchestration tool of your choice.

Hope this helps.

GitHub link: https://github.com/apache/hop/discussions/4586#discussioncomment-11303562
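
As a back-of-the-envelope illustration of the buffer memory estimate in the comment above (rows in buffers x columns x data type size), here is a minimal Python sketch. It is not part of Hop: the function name, the per-column byte sizes, and the example pipeline shape are all assumptions, and it ignores JVM object overhead, so treat the result as a rough order of magnitude only.

```python
def estimate_buffer_memory_bytes(buffer_count, rows_per_buffer, column_sizes_bytes):
    """Rough upper bound on memory held in row buffers.

    Assumes every buffer is completely full and that one row costs
    the sum of its (assumed) column sizes in bytes.
    """
    row_bytes = sum(column_sizes_bytes)
    return buffer_count * rows_per_buffer * row_bytes


# Hypothetical pipeline: 30 transform copies with one input buffer each,
# the default 10K rows per buffer, and rows of five columns whose sizes
# (8, 8, 8, 50 and 100 bytes) are made up for illustration.
estimate = estimate_buffer_memory_bytes(30, 10_000, [8, 8, 8, 50, 100])
print(f"Estimated buffer memory: {estimate / 1024**2:.1f} MiB")
```

Transforms such as `Sort Rows` fall outside this estimate, since they need all rows rather than one buffer's worth.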
