The POJOs that Flink supports follow the Java Bean style, so they are
mutable.

I agree that direct support for immutable types would be desirable, but in
this case, we need to differentiate a bit more.
Any mutable object can be effectively immutable if its state is not changed
after a certain point. Such objects can safely be used as keys in maps.
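
To illustrate the point about map keys, here is a minimal plain-Java sketch
(no Flink involved). SensorId and the helper methods are hypothetical names
made up for this example. It shows a Java Bean style class that works as a
HashMap key as long as nobody calls the setter after insertion, and what
goes wrong when somebody does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EffectivelyImmutableKeyDemo {

    /** Hypothetical Java Bean style POJO: no-arg constructor, getter, setter. */
    public static class SensorId {
        private String name;

        public SensorId() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SensorId)) return false;
            return Objects.equals(name, ((SensorId) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    static SensorId sensor(String name) {
        SensorId id = new SensorId();
        id.setName(name);
        return id;
    }

    /** The key is populated once and never changed, so an equal key finds it. */
    public static boolean lookupWithoutMutation() {
        Map<SensorId, Integer> counts = new HashMap<>();
        counts.put(sensor("a"), 1);
        return counts.containsKey(sensor("a"));
    }

    /** The key is mutated after insertion, so its hash no longer matches the entry. */
    public static boolean lookupAfterMutation() {
        Map<SensorId, Integer> counts = new HashMap<>();
        SensorId key = sensor("a");
        counts.put(key, 1);
        key.setName("b"); // breaks the "effectively immutable" contract
        return counts.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("untouched key found: " + lookupWithoutMutation());
        System.out.println("mutated key found:   " + lookupAfterMutation());
    }
}
```

The class is technically mutable in both cases; only the discipline of not
touching the key after it has been inserted makes the difference.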

In Flink, you can likewise use mutable objects as keys in grouping
operations and the like. In fact, Flink makes defensive copies in some
places so that the returned object is effectively immutable.
Also see ExecutionConfig#enableObjectReuse() / disableObjectReuse():
> By default, objects are not reused in Flink. Enabling the object reuse
> mode will instruct the runtime to reuse user objects for better
> performance. Keep in mind that this can lead to bugs when the user-code
> function of an operation is not aware of this behavior.
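
The kind of bug that warning refers to can be sketched in plain Java. This
is only a simulation under my own assumptions, not Flink's runtime: the loop
stands in for a runtime that overwrites one shared instance per record, and
Event and buffer() are hypothetical names. A function that keeps references
to its input sees every buffered element change, unless it copies first.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseDemo {

    /** Hypothetical mutable record type. */
    public static class Event {
        public int value;

        public Event() {}
        public Event(int value) { this.value = value; }

        public Event copy() { return new Event(value); }
    }

    /**
     * Feeds the input through a single reused Event instance and buffers
     * what the "user function" receives, with or without a defensive copy.
     */
    public static List<Integer> buffer(int[] input, boolean copyBeforeBuffering) {
        Event reused = new Event();
        List<Event> buffered = new ArrayList<>();
        for (int v : input) {
            reused.value = v; // simulated runtime: same object, new content
            buffered.add(copyBeforeBuffering ? reused.copy() : reused);
        }
        List<Integer> result = new ArrayList<>();
        for (Event e : buffered) {
            result.add(e.value);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3};
        System.out.println("without copy: " + buffer(input, false));
        System.out.println("with copy:    " + buffer(input, true));
    }
}
```

Without the copy, all buffered references point at the same object and
report the last value written to it.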

equals() and hashCode() should be implemented correctly and consistently
with each other, ideally generated by your IDE.
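
As a rough sketch of what that could look like for a key POJO (UserKey and
its fields are hypothetical; partition() is a deliberately simplified
stand-in for hash partitioning, not Flink's actual key-group assignment):

```java
import java.util.Objects;

public class PojoKeyDemo {

    /** Hypothetical key POJO in Java Bean style, with field-based equality. */
    public static class UserKey {
        private String tenant;
        private long userId;

        public UserKey() {}

        public String getTenant() { return tenant; }
        public void setTenant(String tenant) { this.tenant = tenant; }

        public long getUserId() { return userId; }
        public void setUserId(long userId) { this.userId = userId; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            UserKey other = (UserKey) o;
            return userId == other.userId && Objects.equals(tenant, other.tenant);
        }

        @Override
        public int hashCode() {
            // Derived from the same fields that equals() compares.
            return Objects.hash(tenant, userId);
        }
    }

    static UserKey key(String tenant, long userId) {
        UserKey k = new UserKey();
        k.setTenant(tenant);
        k.setUserId(userId);
        return k;
    }

    /** Simplified stand-in for hash partitioning: hashCode modulo parallelism. */
    static int partition(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        UserKey a = key("acme", 42L);
        UserKey b = key("acme", 42L); // distinct instance, same field values
        System.out.println("equal:          " + a.equals(b));
        System.out.println("same partition: " + (partition(a, 4) == partition(b, 4)));
    }
}
```

Because hashCode() depends only on the field values, two separately created
instances with the same content land in the same partition. With Object's
default hashCode() there is no such guarantee.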

Best,

Arvid

On Mon, Oct 7, 2019 at 4:55 PM Jan Lukavský <je...@seznam.cz> wrote:

> Having said that - the same logic applies to using POJOs as keys in
> grouping operations, which rely heavily on hashCode() and equals(). That
> might suggest that using mutable objects is not the best option there
> either. But that might be a very subjective claim.
>
> Jan
> On 10/7/19 3:13 PM, Jan Lukavský wrote:
>
> Exactly. And that's why it is a good fit for mutable data, which is not
> suited for keys either.
>
> Jan
> On 10/7/19 2:58 PM, Chesnay Schepler wrote:
>
> The default hashCode implementation is effectively random and not suited
> for keys, as equal keys may not be routed to the same instance.
>
> On 07/10/2019 14:54, Jan Lukavský wrote:
>
> Hi Stephen,
>
> I found a very nice article [1], which might help you solve the issues you
> are concerned about. The elegant solution to this problem might be
> summarized as "do not implement equals() and hashCode() for POJO types, use
> Object's default implementation". I'm not 100% sure that this will not have
> any negative impacts on some other Flink components, but I _suppose_ it
> should not (someone might correct me if I'm wrong).
>
> Jan
>
> [1] http://web.mit.edu/6.031/www/sp17/classes/15-equality/
> On 10/7/19 1:37 PM, Chesnay Schepler wrote:
>
> This question should only be relevant for cases where POJOs are used as
> keys, in which case hashCode() *must not* return a class-constant or
> effectively random value, as this would break the hash partitioning.
>
> This is somewhat alluded to in the keyBy() documentation
> <https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations>,
> but could be clarified.
>
> It is in any case heavily discouraged to modify objects after they have
> been emitted from a function; the mutability of POJOs is hence usually not
> a problem.
> On 02/10/2019 14:17, Stephen Connolly wrote:
>
> I notice
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#rules-for-pojo-types
> says that all non-transient fields need a setter.
>
> That means that the fields cannot be final.
>
> That means that the hashCode() should probably just return a constant
> value (otherwise an object could be mutated and then lost from a hash-based
> collection).
>
> Is it really the case that we have to either register a serializer or
> abandon immutability and consequently force hashCode to be a constant value?
>
> What are the recommended implementation patterns for the POJOs used in a
> topology?
>
> Thanks
>
> -Stephen
>
>
>
>
