I have an Excel file which unfortunately cannot be converted to CSV format,
and I am trying to load it using the PySpark shell.

I tried invoking the pyspark session below with the jars listed.

pyspark --jars \
/home/siddhesh/Downloads/spark-excel_2.12-0.14.0.jar,\
/home/siddhesh/Downloads/xmlbeans-5.0.3.jar,\
/home/siddhesh/Downloads/commons-collections4-4.4.jar,\
/home/siddhesh/Downloads/poi-5.2.0.jar,\
/home/siddhesh/Downloads/poi-ooxml-5.2.0.jar,\
/home/siddhesh/Downloads/poi-ooxml-schemas-4.1.2.jar,\
/home/siddhesh/Downloads/slf4j-log4j12-1.7.28.jar,\
/home/siddhesh/Downloads/log4j-1.2-api-2.17.1.jar

and below is the code to read the Excel file:

df = spark.read.format("excel") \
    .option("dataAddress", "'Sheet1'!") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("/home/.../Documents/test_excel.xlsx")

It is giving me the below error message:

java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
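For reference, my understanding is that POI 5.x logs through the Log4j 2 API, so this error suggests the Log4j 2 jars themselves (log4j-api and log4j-core) are missing from the classpath; log4j-1.2-api is only the 1.x compatibility bridge and does not contain org.apache.logging.log4j.LogManager. A sketch of the launch command with those two jars added (assuming they are downloaded to the same directory; the 2.17.1 version is my assumption to match the bridge jar):

```shell
# Sketch: add log4j-api and log4j-core 2.17.1 alongside the other jars (paths assumed)
pyspark --jars \
/home/siddhesh/Downloads/log4j-api-2.17.1.jar,\
/home/siddhesh/Downloads/log4j-core-2.17.1.jar,\
/home/siddhesh/Downloads/spark-excel_2.12-0.14.0.jar,\
/home/siddhesh/Downloads/poi-5.2.0.jar,\
/home/siddhesh/Downloads/poi-ooxml-5.2.0.jar,\
/home/siddhesh/Downloads/xmlbeans-5.0.3.jar,\
/home/siddhesh/Downloads/commons-collections4-4.4.jar
```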

I tried several jars to resolve this error, but with no luck. Also, what
would be an efficient way to load the file?
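On the efficiency question, one alternative I am considering, assuming the machine can reach Maven Central, is letting Spark resolve spark-excel and its transitive dependencies (POI, XMLBeans, Log4j 2, etc.) instead of listing the jars by hand:

```shell
# Sketch: --packages pulls the jar and its dependency tree from Maven Central
# (assumes network access from the machine running pyspark)
pyspark --packages com.crealytics:spark-excel_2.12:0.14.0
```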

Thanks,
Sid
