Hi Oscar,

I assume the "dependency" in your description refers to the custom fields
in the ProcessFunction's implementation. You are right that, as the
ProcessFunction inherits the `Serializable` interface, we should make all
fields either serializable or transient.
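To illustrate this, here is a minimal plain-JDK sketch. It has no Flink dependency, and the class names (DbClient, FunctionWithPlainField, FunctionWithTransientField) are hypothetical. It mimics what happens when a function is serialized: a Serializable function holding a field that is neither serializable nor transient fails with NotSerializableException, while the same field marked transient is simply skipped.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFieldsDemo {

    // Hypothetical dependency that does NOT implement Serializable
    // (think of a DB client or an HTTP client).
    static class DbClient {
        String query(String key) { return "value-for-" + key; }
    }

    // Stand-in for a ProcessFunction: like Flink's functions, it is Serializable.
    static class FunctionWithPlainField implements Serializable {
        private final DbClient client = new DbClient(); // neither serializable nor transient
    }

    static class FunctionWithTransientField implements Serializable {
        private transient DbClient client = new DbClient(); // skipped by Java serialization
    }

    // Returns true if default Java serialization accepts the object.
    static boolean canSerialize(Serializable fn) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(fn);
            return true;
        } catch (NotSerializableException e) {
            return false; // a reachable field's class does not implement Serializable
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new FunctionWithPlainField()));     // false
        System.out.println(canSerialize(new FunctionWithTransientField())); // true
    }
}
```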
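As for performance, I have no data, but theoretically there should not be
much difference in most cases (in fact, what you may be wondering about is
the performance of the JDK's default serialization). For a long-running
streaming job, the constructor and the open() method are usually not on
the performance-critical path: the function is serialized once when the job
is submitted and deserialized once per parallel instance (and again on
restart), and open() is likewise called once per instance, not per record.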
As a best practice for cleaner code, note that in Flink's abstraction the
open() method is designed for one-time setup work. So it is usually better
to mark these fields as transient and initialize them in open(),
especially when some extra work is needed, such as creating a DB connection.
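A minimal sketch of the transient-plus-open() pattern, again in plain JDK code so it runs without Flink. EnrichFunction, DbClient, open() and process() are hypothetical stand-ins for a real ProcessFunction's open(...) and processElement(...), and the serialization round trip only approximates how the function is shipped to the task managers. Only the serializable configuration (the URL) travels with the function; the client is rebuilt in open().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class OpenLifecycleDemo {

    // Hypothetical non-serializable dependency, e.g. a DB connection.
    static class DbClient {
        private final String url;
        DbClient(String url) { this.url = url; }
        String lookup(String key) { return url + "/" + key; }
    }

    // Mimics a Flink function: the constructor takes only serializable config,
    // and the heavy client is transient and built in open().
    static class EnrichFunction implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String dbUrl;        // serializable constructor "dependency"
        private transient DbClient client; // rebuilt after deserialization

        EnrichFunction(String dbUrl) { this.dbUrl = dbUrl; }

        // Plays the role of open(): one-time setup per parallel instance.
        void open() { client = new DbClient(dbUrl); }

        // Plays the role of processElement(): the hot path, once per record.
        String process(String key) { return client.lookup(key); }

        boolean isOpen() { return client != null; }
    }

    // Serializes and deserializes an object, roughly like shipping a function to a worker.
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        EnrichFunction shipped = roundTrip(new EnrichFunction("jdbc:hypothetical://db"));
        System.out.println(shipped.isOpen());           // false: transient field is null
        shipped.open();
        System.out.println(shipped.process("user-42")); // jdbc:hypothetical://db/user-42
    }
}
```

In a real job you would extend ProcessFunction and override open(...) instead (its exact signature depends on the Flink version), and usually release the connection in close().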

Hope it helps!
Best,
Biao Geng

On Thu, Apr 4, 2024 at 17:14, Oscar Perez via user <user@flink.apache.org> wrote:

> Hi,
>
> We would like to adhere to clean code principles and expose all
> dependencies in the constructor of the process functions.
>
> In Flink, however, all dependencies passed to process functions must be
> serializable. Another workaround is to instantiate these dependencies in
> the open method of the process function and declare them transient.
>
> I wonder how, performance-wise, it would impact the job if we declared all
> dependencies in the constructor and made them serializable. Is this the
> wrong pattern? Has anybody run any experiments on the performance
> degradation of dependencies exposed in the constructor vs. declared in
> the open method?
>
> Thanks!
> Oscar
>
