SDK Harness Memory Usage

Arwin Tio via user Thu, 08 Dec 2022 14:22:39 -0800

Hi Beam Team,

Can somebody help me understand which factors drive SDK Harness memory
usage? My first guess is that it depends on:

1. User code (i.e. DoFns)
2. Bundle size

Basically, the maximum memory an SDK Harness needs is however much memory
it takes for the user DoFn to process the largest bundle. And the bundle
size is determined by the Runner. So to limit SDK Harness memory usage, we
have to ensure that our Runner selects small bundle sizes.

However, looking through some design docs and the code, it seems like:

- sdk_worker.py
  <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/sdk_worker.py#L385>
  seems to have multiple active bundle processors at the same time
- The "Fn API: How to send and receive data" design doc
  <https://docs.google.com/document/d/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA/edit#heading=h.u78ozd9rrlsf>
  seems to describe multiplexing multiple logical streams over a gRPC
  connection

Does this mean that an SDK Harness processes multiple bundles at the same
time? If so, how is the number of concurrent bundles limited?

Or in general, what suggestions do you have to reduce memory usage of SDK
Harnesses?

Thanks,

Arwin

