I think the stack trace is quite informative. Assuming line 10 of CsvDataSource is "val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> args(1), "header" -> "true"))", then the "args(1)" call is throwing an ArrayIndexOutOfBoundsException because you aren't passing any command line arguments to your application. When using spark-submit, you should put all of your app's command line arguments at the end, after the jar. In your example, I think you'd want:
spark-submit --master yarn --class org.spark.apache.CsvDataSource --files hdfs:///people_csv /home/cloudera/Desktop/TestMain.jar hdfs:///people_csv

Also, I don't think it is necessary for you to have "--files hdfs:///people_csv". The documentation for that option says "Comma-separated list of files to be placed in the working directory of each executor." Since you are going to read the "people_csv" file from HDFS, rather than the local file system, it seems unnecessary.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-SQL-Error-tp25050p25064.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
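As a side note, you can make this failure mode much easier to diagnose by validating args before using it. Below is a minimal sketch of what the driver might look like with such a guard; the object name, usage string, and app name are assumptions on my part (I haven't seen your actual code), and the index you check must of course match the number of arguments you actually pass after the jar:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical reconstruction of CsvDataSource with an args-length guard.
object CsvDataSource {
  def main(args: Array[String]): Unit = {
    // Fail fast with a clear usage message instead of letting args(1)
    // throw an ArrayIndexOutOfBoundsException deep inside the job.
    if (args.length < 2) {
      System.err.println("Usage: CsvDataSource <arg0> <csv-path>")
      sys.exit(1)
    }

    val conf = new SparkConf().setAppName("CsvDataSource")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Same load call as in your stack trace, reached only when args(1) exists.
    val df = sqlContext.load("com.databricks.spark.csv",
      Map("path" -> args(1), "header" -> "true"))
    df.show()
  }
}
```

That way, running spark-submit with too few arguments after the jar prints the usage line and exits, which points you straight at the argument-ordering problem.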