dataset aggregators with kryo encoder very slow

Koert Kuipers Thu, 19 Jan 2017 13:18:06 -0800

we just converted a job from RDD to Dataset. the job does a single map-reduce
phase using aggregators. we are seeing very bad performance for the Dataset
version: it is about 10x slower than the RDD version.


in the Dataset version we use kryo encoders for some of the aggregators.
based on some basic profiling of spark in local mode, i believe the bad
performance is due to the kryo encoders: about 70% of the time is spent in
kryo-related classes.
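
for illustration only, the kind of setup described above might look roughly
like the sketch below. the `Stats` buffer class, the `MeanAgg` aggregator and
the sample data are hypothetical and not taken from the actual job; only the
Spark calls (`Aggregator`, `Encoders.kryo`, `groupByKey`, `agg`, `toColumn`)
are real API. with a kryo encoder the whole buffer object is stored as a single
binary column, so it has to be serialized and deserialized as an opaque blob
instead of being mapped to typed columns.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// hypothetical mutable buffer type (an assumption for this sketch).
// the no-arg constructor keeps it friendly to kryo's default instantiation.
final class Stats(var count: Long, var sum: Double) extends Serializable {
  def this() = this(0L, 0.0)
}

// hypothetical aggregator: mean of the value in a (key, value) pair,
// using a kryo encoder for the intermediate buffer.
object MeanAgg extends Aggregator[(String, Double), Stats, Double] {
  def zero: Stats = new Stats()

  // fold one input row into the buffer
  def reduce(b: Stats, a: (String, Double)): Stats = {
    b.count += 1
    b.sum += a._2
    b
  }

  // combine two partial buffers (for example from different partitions)
  def merge(b1: Stats, b2: Stats): Stats = {
    b1.count += b2.count
    b1.sum += b2.sum
    b1
  }

  def finish(r: Stats): Double =
    if (r.count == 0L) Double.NaN else r.sum / r.count

  // kryo encoder: the buffer becomes one opaque binary column
  def bufferEncoder: Encoder[Stats] = Encoders.kryo[Stats]

  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object KryoAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("kryo-agg-sketch")
      .getOrCreate()
    import spark.implicits._

    // made-up sample data
    val ds = Seq(("a", 1.0), ("a", 3.0), ("b", 5.0)).toDS()

    // single grouped aggregation phase, one mean per key
    ds.groupByKey(_._1).agg(MeanAgg.toColumn).show()

    spark.stop()
  }
}
```

a variant for comparison would be to make the buffer a case class and return
`Encoders.product[...]` from `bufferEncoder`, so that the buffer fields are
stored as typed columns. whether that closes the gap for a given job is
something that would need to be measured.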

since we also use kryo for serialization in the RDD version, i am surprised at
how big the performance difference is.

has anyone seen the same thing? any suggestions for how to improve this?
