Hello,

Here is something I am unable to explain. It goes against Kryo's
documentation, numerous suggestions on the web and on this list, as well as
plain intuition.

Our Spark application runs in a single JVM (perhaps this is relevant, hence
mentioning it). We have been using Kryo serialization with Spark (setting
the spark.serializer property to
org.apache.spark.serializer.KryoSerializer) without explicitly registering
classes and everything seems to work well enough. Recently, I have been
looking into making some performance improvements and decided to register
classes.

I turned on the "spark.kryo.registrationRequired" property and started to
register all classes as they were reported by the resulting Exceptions.
Eventually I managed to register them all. BTW, there is a fairly large
number of internal Spark and Scala classes that I also had to register, but
that's beside the point here.

I was hoping for a performance improvement, as the suggestions about
registering classes promise. However, what I saw was, surprisingly, the
exact opposite: performance (throughput) actually dropped by at least 50%.
I turned off the registrationRequired property but kept the explicit
registrations in place, with the same result. Then I reduced the number of
registrations and performance started to get better again. Eventually I got
rid of all the explicit registrations (back to where I started, basically)
and performance returned to where it was.

I am unable to explain why I am observing this behavior, as it is
counter-intuitive. Explicit registration is supposed to write a smaller
amount of data (integer class IDs instead of class names as strings) and
hence improve performance. Is the fact that Spark is running in local
mode (single JVM) a factor here? Any insights will be appreciated.
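
To make the expectation above concrete, here is a minimal sketch in plain
Java of the size difference between tagging a record with a class name and
tagging it with an integer ID. It does not use Kryo and does not reproduce
Kryo's actual wire format. The class name, the ID value and the helper
names are all made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustration only: NOT Kryo's encoding, just the general idea of
// "class name as a string" versus "class ID as an integer".
class RegistrationSizeSketch {

    // Bytes needed to tag a record with its fully qualified class name.
    static int bytesForName(String className) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // writeUTF emits a 2-byte length prefix followed by the encoded characters
            out.writeUTF(className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Bytes needed to tag a record with a fixed-width integer class ID.
    static int bytesForId(int classId) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(classId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // Hypothetical class name and ID, not taken from our application.
        String name = "com.example.events.ClickEvent";
        System.out.println("name tag: " + bytesForName(name) + " bytes");
        System.out.println("id tag:   " + bytesForId(7) + " bytes");
    }
}
```

This only shows the byte-count side of the argument. It says nothing about
the CPU cost of looking up registrations, which is where my measurements
seem to disagree with the expectation.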

Thanks
NB
