Hi folks,

I've been trying to debug this issue:
https://gist.github.com/nssalian/203e20432c2ed237717be28642b1871a

*Context:*
*The application (PySpark):*
1. Reads a Hive table from the Metastore (running Hive 1.2.2).
2. Prints the schema of the DataFrame that was read.
3. Does a show() on the captured df. The error stack trace in the gist
above is from the show job.

***********************************
*To Reproduce:*
df = spark_ses.sql('select * from <db>.<table> limit 100000')

print("Printing Schema \n")
df.printSchema()

print("Running Show \n")
df.show(100)

spark_ses.stop()

***********************************
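
For completeness, spark_ses above is a Hive-enabled SparkSession, created
roughly like this (app name and extra conf trimmed, so treat it as a
sketch rather than the exact code):

from pyspark.sql import SparkSession

# Hive support is required so the session can read tables from the Metastore
spark_ses = (SparkSession.builder
             .appName("hive-read-debug")
             .enableHiveSupport()
             .getOrCreate())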

*Build Profile and additional application info:*
1. Spark 3.0.0 binary built using
./dev/make-distribution.sh -Pyarn -Phive -Phive-thriftserver
-Dhadoop.version=2.8.5 -Pspark-ganglia-lgpl -Pscala-2.12 -Dscala-2.12
This build is from the Spark GitHub repo at this commit:
<https://github.com/apache/spark/commit/3fdfce3120f307147244e5eaf46d61419a723d50>
2. Hive 1.2.2 as the metastore. The Spark application can connect to the
metastore (rough client config sketched below this list).
3. I can do a *printSchema()* on the df, and it prints the schema
correctly. But a *show()* or any attempt to write to an S3 data store
fails with the error in the gist.
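
For reference, the metastore connection is configured roughly like this
(paraphrasing my conf from memory; exact values and jar paths may differ):

# Approximate metastore client settings, set on the builder shown earlier
builder = (SparkSession.builder
           .enableHiveSupport()
           # use a Hive 1.2.2-compatible metastore client
           .config("spark.sql.hive.metastore.version", "1.2.2")
           # "maven" fetches matching client jars; a classpath also works
           .config("spark.sql.hive.metastore.jars", "maven"))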

Any advice on how I can go about debugging/solving this?


-- 
Regards,
Neelesh S. Salian
