I have an Excel file that unfortunately cannot be converted to CSV, and I am trying to load it in the pyspark shell.
I launched the pyspark session with the jars provided below:

```shell
pyspark --jars /home/siddhesh/Downloads/spark-excel_2.12-0.14.0.jar,\
/home/siddhesh/Downloads/xmlbeans-5.0.3.jar,\
/home/siddhesh/Downloads/commons-collections4-4.4.jar,\
/home/siddhesh/Downloads/poi-5.2.0.jar,\
/home/siddhesh/Downloads/poi-ooxml-5.2.0.jar,\
/home/siddhesh/Downloads/poi-ooxml-schemas-4.1.2.jar,\
/home/siddhesh/Downloads/slf4j-log4j12-1.7.28.jar,\
/home/siddhesh/Downloads/log4j-1.2-api-2.17.1.jar
```

This is the code to read the Excel file:

```python
df = spark.read.format("excel") \
    .option("dataAddress", "'Sheet1'!") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("/home/.../Documents/test_excel.xlsx")
```

It is giving me the error message below:

```
java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
```

I tried several jars for this error, but no luck. Also, what would be an efficient way to load the file?

Thanks, Sid
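For reference, the only workaround I can think of is to read the workbook with pandas and hand the result to Spark, which sidesteps the spark-excel/POI jar chain entirely. This is just a minimal sketch, assuming pandas and openpyxl are installed; the sample data and file name here are stand-ins for my real workbook:

```python
import os
import tempfile
import pandas as pd

# Work in a temp directory; in reality the .xlsx already exists on disk.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "test_excel.xlsx")

# Hypothetical sample data standing in for the real workbook contents.
sample = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
sample.to_excel(path, sheet_name="Sheet1", index=False, engine="openpyxl")

# Read the sheet back the same way the real file would be read.
pdf = pd.read_excel(path, sheet_name="Sheet1", header=0, engine="openpyxl")
print(pdf)

# Inside the pyspark shell, the pandas DataFrame converts directly:
# df = spark.createDataFrame(pdf)
```

I am not sure how well this scales for large workbooks, though, since pandas loads everything into driver memory first.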