I think the stack trace is quite informative.

Assuming line 10 of CsvDataSource is

  val df = sqlContext.load("com.databricks.spark.csv",
    Map("path" -> args(1), "header" -> "true"))

then the "args(1)" call is throwing an ArrayIndexOutOfBoundsException. The
reason is that you aren't passing any command line arguments to your
application. When using spark-submit, you should put all of your app's
command line arguments at the end, after the jar. In your example, I think
you'd want:

  spark-submit --master yarn --class org.spark.apache.CsvDataSource \
    --files hdfs:///people_csv \
    /home/cloudera/Desktop/TestMain.jar hdfs:///people_csv
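
For reference, here's a minimal sketch of what a more defensive version of
the driver could look like. The object name, the usage message, and the
df.show() call are my assumptions, not taken from your code. One thing to
keep in mind: Scala arrays are zero-indexed, so args(1) is the *second*
argument after the jar; if the HDFS path is the only argument you pass,
you'd want args(0) instead (or pass two arguments).

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  object CsvDataSource {
    def main(args: Array[String]): Unit = {
      // Guard against missing arguments instead of letting args(1) throw
      // an ArrayIndexOutOfBoundsException; args(1) is the second argument.
      if (args.length < 2) {
        System.err.println("Usage: CsvDataSource <arg0> <csv-path>")  // hypothetical usage
        sys.exit(1)
      }
      val sc = new SparkContext(new SparkConf().setAppName("CsvDataSource"))
      val sqlContext = new SQLContext(sc)
      val df = sqlContext.load("com.databricks.spark.csv",
        Map("path" -> args(1), "header" -> "true"))
      df.show()
    }
  }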

Also, I don't think it is necessary for you to have "--files
hdfs:///people_csv". The documentation for that option says it takes a
"Comma-separated list of files to be placed in the working directory of
each executor." Since you are going to read the "people_csv" file from
HDFS rather than the local file system, it seems unnecessary.
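
With that dropped, the invocation would reduce to something like:

  spark-submit --master yarn --class org.spark.apache.CsvDataSource \
    /home/cloudera/Desktop/TestMain.jar hdfs:///people_csv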


