Hi Ruijing,

Spark uses SerializationDebugger (https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html) by default to detect serialization issues. You can get more detailed serialization exception information by setting the following while creating a cluster:

spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
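Outside of a cluster-creation UI, the same settings can be passed per job. A sketch, assuming a spark-submit deployment (adjust paths and class names to your job):

```shell
# Either put the options in conf/spark-defaults.conf:
#   spark.driver.extraJavaOptions   -Dsun.io.serialization.extendedDebugInfo=true
#   spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
#
# or pass them on the spark-submit command line:
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  --class com.example.MyJob my-job.jar
```

With the flag enabled, the JVM appends the chain of fields it was traversing to the NotSerializableException trace, which usually points at the exact member that pulled a non-serializable object into the closure.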
Maxim Gekk
Software Engineer
Databricks, Inc.

On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li <liruijin...@gmail.com> wrote:

> Hi all,
>
> When working with Spark jobs, I sometimes have to tackle serialization
> issues, and I have a difficult time trying to fix those. A lot of times,
> the serialization issues happen only in cluster mode across the network
> in a Mesos container, so I can't debug locally. And the exception thrown
> by Spark is not very helpful for finding the cause.
>
> I'd love to hear some tips on how to debug in the right places. Also, I'd
> be interested to know if in future releases it would be possible to point
> out which class or function is causing the serialization issue (right now
> I find it's either Java generic classes or the class Spark is running
> itself). Thanks!
> --
> Cheers,
> Ruijing Li
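To see the failure mode Ruijing describes without a cluster, plain java.io serialization is enough: the write fails as soon as the object graph reaches a non-serializable field, and by default the exception names only the offending class, not the field path that led to it. A minimal sketch (DbClient, Task, and SerDemo are hypothetical names, not Spark APIs):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable dependency, like a DB client held inside a job.
class DbClient { }

// The task object itself is Serializable, but it drags in the DbClient field.
class Task implements Serializable {
    DbClient client = new DbClient();
}

public class SerDemo {
    // Try to serialize obj; return the offending class name, or null on success.
    static String findNonSerializable(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            // By default the message is just the class name of the bad field.
            return e.getMessage();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findNonSerializable(new Task()));
    }
}
```

Running the same code with -Dsun.io.serialization.extendedDebugInfo=true makes the JVM also print the field chain it was walking when it hit DbClient, which is what makes the flag useful for Spark closures that capture an outer object indirectly.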