Hi all, I found an issue in SparkR that may be a bug:
Reading a CSV file works when I submit the script like this:

    ${SPARK_HOME}/bin/spark-submit --packages com.databricks:spark-csv_2.11:1.4.0 example.R

But requesting the package through sparkR.init() instead gives an error:

    sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.4.0")

    16/06/17 09:54:12 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
    Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
      java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)

So it appears that sparkR.init() does not load the package specified in sparkPackages.

-----------------------------------------------------------------------------------------------------------------------------------------

Appendix: The complete code for example.R:

    # Fall back to a local Spark installation if SPARK_HOME is not set.
    if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
      Sys.setenv(SPARK_HOME = "/home/hadoop/spark-1.6.1-bin-hadoop2.6")
    }
    library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

    # Request the spark-csv package through sparkPackages.
    sc <- sparkR.init(master = "local[2]",
                      sparkEnvir = list(spark.driver.memory = "1g"),
                      sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
    sqlContext <- sparkRSQL.init(sc)

    # Load the CSV file; read.df is where the loadDF error above is raised.
    people <- read.df(sqlContext, "file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv", "csv")
    registerTempTable(people, "people")
    teenagers <- sql(sqlContext, "SELECT * FROM people")
    head(teenagers)

Joseph