Please see these logs. The error is thrown on the executor:
23/01/02 15:14:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ExceptionInInitializerError
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:385)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2574)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:934)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:928)
    at TestMain$.<init>(TestMain.scala:12)
    at TestMain$.<clinit>(TestMain.scala)
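
The last two frames show the session being created from TestMain$'s static
initializer (<clinit>), which runs when the TestMain$ object is first touched
while the serialized lambda is being read back on the executor. One
restructuring we could try (an untested sketch, assuming nothing else depends
on the object-level val) is to build the session inside main instead:

    import org.apache.spark.sql.SparkSession

    object TestMain {

      def main(args: Array[String]): Unit = {
        // Building the session inside main keeps it out of TestMain$'s
        // static initializer, so deserializing the map lambda on an
        // executor should no longer trigger getOrCreate() there.
        val session = SparkSession.builder()
          .appName("test")
          .enableHiveSupport()
          .getOrCreate()
        import session.implicits._

        val a = session.sparkContext
          .parallelize(Array(("A", 1), ("B", 2)))
          .toDF("_c1", "_c2")
          .rdd
          .map(x => x(0).toString)
          .collect()

        println(a.mkString("|"))
      }
    }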
On Mon, 2 Jan 2023 at 8:29 PM, Sean Owen <[email protected]> wrote:
> It's not running on the executor; that's not the issue. See your stack
> trace, where it clearly happens in the driver.
>
> On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad <[email protected]>
> wrote:
>
>> Even if I set the master to yarn, it will not have access to the rest of
>> the Spark confs. It will need spark.yarn.app.id.
>>
>> The main issue is: if it works as it is in Spark 2.3, why does it not work
>> in Spark 3, i.e. why is the session getting created on the executor?
>> Another thing we tried, just for debugging, is removing the DataFrame-to-RDD
>> conversion, and with that it works in Spark 3 (sketched below).
>>
>> So it might be something to do with the DataFrame-to-RDD conversion, or with
>> a serialization behavior change from Spark 2.3 to Spark 3.0 if there is any.
>> But we couldn't find the root cause.
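>>
>> For reference, the working debug variant looks roughly like this (a sketch,
>> not our exact code): collect the DataFrame directly and do the mapping on
>> the driver, so no .rdd step and no shipped lambda is involved:
>>
>>   val a = session.sparkContext
>>     .parallelize(Array(("A", 1), ("B", 2)))
>>     .toDF("_c1", "_c2")
>>     .collect()
>>     .map(x => x(0).toString)
>>   println(a.mkString("|"))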
>>
>> Regards,
>> Shrikant
>>
>> On Mon, 2 Jan 2023 at 7:54 PM, Sean Owen <[email protected]> wrote:
>>
>>> So call .setMaster("yarn"), per the error
>>>
>>> On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad <[email protected]>
>>> wrote:
>>>
>>>> We are running it in cluster deploy mode with yarn.
>>>>
>>>> Regards,
>>>> Shrikant
>>>>
>>>> On Mon, 2 Jan 2023 at 6:15 PM, Stelios Philippou <[email protected]>
>>>> wrote:
>>>>
>>>>> Can we see your Spark configuration parameters?
>>>>>
>>>>> The master URL refers to, as per Java,
>>>>> new SparkConf()....setMaster("local[*]")
>>>>> according to where you want to run this.
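>>>>>
>>>>> With the SparkSession builder in Scala that would be something like
>>>>> (a sketch; choose the master for where you run it):
>>>>>
>>>>>   val session = SparkSession.builder()
>>>>>     .appName("test")
>>>>>     .master("local[*]")  // or "yarn" when submitting to a cluster
>>>>>     .enableHiveSupport()
>>>>>     .getOrCreate()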
>>>>>
>>>>> On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to migrate one Spark application from Spark 2.3 to 3.0.1.
>>>>>>
>>>>>> The issue can be reproduced using the sample code below:
>>>>>>
>>>>>> object TestMain {
>>>>>>
>>>>>>   val session =
>>>>>>     SparkSession.builder().appName("test").enableHiveSupport().getOrCreate()
>>>>>>
>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>     import session.implicits._
>>>>>>     val a = session.sparkContext.parallelize(Array(("A",1),("B",2)))
>>>>>>       .toDF("_c1","_c2").rdd.map(x => x(0).toString).collect()
>>>>>>     println(a.mkString("|"))
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> It runs successfully on Spark 2.3 but fails on Spark 3.0.1 with the
>>>>>> exception below:
>>>>>>
>>>>>> Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
>>>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:394)
>>>>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>>>>>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>>>>>>     at scala.Option.getOrElse(Option.scala:189)
>>>>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>>>>>>     at TestMain$.<init>(TestMain.scala:7)
>>>>>>     at TestMain$.<clinit>(TestMain.scala)
>>>>>>
>>>>>>
>>>>>> From the exception, it appears that in Spark 3 it tries to create the
>>>>>> Spark session on the executor as well, whereas in Spark 2.3 it is not
>>>>>> created again on the executor.
>>>>>>
>>>>>> Can anyone help in identifying why there is this change in behavior?
>>>>>>
>>>>>> Thanks and Regards,
>>>>>>
>>>>>> Shrikant
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shrikant Prasad
>>>>>>
>>>>> --
>>>> Regards,
>>>> Shrikant Prasad
>>>>
>>> --
>> Regards,
>> Shrikant Prasad
>>
> --
Regards,
Shrikant Prasad