Hello,

Here is something I am unable to explain. It goes against Kryo's documentation, numerous suggestions on the web and on this list, and plain intuition.

Our Spark application runs in a single JVM (I mention this in case it is relevant). We have been using Kryo serialization with Spark (setting the spark.serializer property to org.apache.spark.serializer.KryoSerializer) without explicitly registering classes, and everything has worked well enough.

Recently I have been looking into performance improvements and decided to register classes. I turned on the "spark.kryo.registrationRequired" property and registered each class as it was reported by the resulting exceptions, until eventually all of them were registered. BTW, I also had to register a fairly large number of internal Spark and Scala classes, but that is beside the point here.

I was hoping for a performance gain, as the advice on registering classes suggests. What I saw was the exact opposite: performance (throughput) actually deteriorated by at least 50%. I turned off the registrationRequired property but kept the explicit registrations in place, with the same result. Then I reduced the number of registrations, and performance started to get better again. Eventually I removed all the explicit registrations (basically back to where I started), and performance returned to its original level.

I am unable to explain this behavior, as it is counter-intuitive. Explicit registration is supposed to write less data (class IDs as integers rather than class names as Strings) and hence improve performance. Is the fact that Spark is running in local mode (single JVM) a factor here?

Any insights will be appreciated.

Thanks,
NB
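
For anyone trying to reproduce this, the setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual application code: the Event and Aggregate classes, the app name, and the "local[*]" master string are placeholders, and the real list of registered classes (including the internal Spark and Scala ones) is much longer.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// Hypothetical application classes standing in for the real ones.
case class Event(id: Long, payload: String)
case class Aggregate(key: String, total: Double)

object KryoRegistrationSketch {

  // Builds a SparkConf for one of the variants compared in this post.
  def buildConf(registerClasses: Boolean, registrationRequired: Boolean): SparkConf = {
    val conf = new SparkConf()
      .setAppName("kryo-registration-sketch") // placeholder name
      .setMaster("local[*]")                  // assumed master for single-JVM (local) mode
      .set("spark.serializer", classOf[KryoSerializer].getName)
      // When true, serializing an unregistered class throws an exception.
      .set("spark.kryo.registrationRequired", registrationRequired.toString)

    if (registerClasses)
      // Explicit registration; array types need their own entries.
      conf.registerKryoClasses(
        Array(classOf[Event], classOf[Aggregate], classOf[Array[Event]]))
    else
      conf
  }
}
```

With this helper, buildConf(registerClasses = false, registrationRequired = false) corresponds to the original fast configuration, buildConf(true, true) to the fully registered one, and buildConf(true, false) to the variant where registrationRequired was turned off but the registrations were kept.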