Hi Beam Team,

Can somebody help me understand what factors drive SDK Harness memory usage? My first guess is that SDK Harness memory usage depends on:
1. User code (i.e. DoFns)
2. Bundle size

Basically, the maximum memory an SDK Harness needs is however much memory it takes for the user DoFn to process the largest bundle. The bundle size is determined by the Runner, so to limit SDK Harness memory usage, we have to ensure that our Runner selects small bundle sizes.

However, looking through some design docs and the code, it seems like:

- sdk_worker.py <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L385> seems to have multiple active bundle processors at the same time
- The "Fn API: How to send and receive data" <https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.u78ozd9rrlsf> design doc seems to describe multiplexing multiple logical streams over a gRPC connection

Does this mean that SDK Harnesses process multiple bundles at the same time? If so, how is the number of concurrent bundles limited? Or in general, what suggestions do you have to reduce the memory usage of SDK Harnesses?

Thanks,
Arwin
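
To make the question concrete, here is a rough sketch of the mental model I am asking about. It is not taken from the Beam codebase: the function name, its parameters, and the worst-case assumption that a DoFn keeps every element of a bundle in memory until the bundle finishes are all hypothetical and only meant to illustrate why the number of concurrent bundles matters.

```python
def estimate_peak_memory_bytes(
    per_element_bytes: int,
    max_bundle_elements: int,
    concurrent_bundles: int = 1,
    baseline_bytes: int = 0,
) -> int:
    """Back-of-the-envelope upper bound on SDK Harness memory.

    Hypothetical model (an assumption, not Beam behavior):
    - each in-flight bundle may hold all of its elements in memory at once;
    - bundles being processed concurrently do not share that memory;
    - baseline_bytes covers everything that is independent of bundles.
    """
    values = (per_element_bytes, max_bundle_elements,
              concurrent_bundles, baseline_bytes)
    if any(v < 0 for v in values):
        raise ValueError("all inputs must be non-negative")

    # Worst-case footprint of a single bundle under the assumption above.
    per_bundle_bytes = per_element_bytes * max_bundle_elements

    # If several bundles are active at once, their footprints add up.
    return baseline_bytes + concurrent_bundles * per_bundle_bytes
```

Under this model, keeping bundles small only bounds memory if the number of concurrently processed bundles is also bounded, which is why I am asking how that number is limited.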