i just replaced our spark 2.0.0 install on the yarn cluster with spark 2.0.1 and copied over the configs.
to give it a quick test i started spark-shell and created a dataset. i get this:

16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://***:4040
Spark context available as 'sc' (master = yarn, app id = application_1471212701720_1580).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import spark.implicits._
import spark.implicits._

scala> val x = List(1,2,3).toDS
org.apache.spark.SparkException: Unable to create database default as failed to create its directory hdfs://dev/home/koert/spark-warehouse
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
  at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
  at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
  ... 50 elided

this did not happen in spark 2.0.0

the location it is trying to access makes little sense, since it is going to hdfs but then it is looking for my local home directory (/home/koert exists locally but not on hdfs).

i suspect the issue is SPARK-15899, but i am not sure. in the pullreq for that WAREHOUSE_PATH got changed:

   val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
     .doc("The default location for managed databases and tables.")
     .stringConf
-    .createWithDefault("file:${system:user.dir}/spark-warehouse")
+    .createWithDefault("${system:user.dir}/spark-warehouse")

notice how the file: got removed from the url, causing spark to look on hdfs now since it is my default filesystem on the cluster. but system:user.dir is still a local home directory. when combining the two you get something that doesn't exist.
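
to illustrate that last point, here is a minimal sketch of how i understand a path without a scheme ends up on the default filesystem. it uses only java.net.URI, no hadoop or spark classes. the helper name `qualify` is hypothetical and only approximates what hadoop does when it qualifies a path; hdfs://dev is the default filesystem taken from the error message.

```scala
import java.net.URI

// hypothetical helper (not a spark or hadoop api): if the path carries its own
// scheme, keep it as-is; otherwise borrow scheme and authority from the default
// filesystem, which is roughly what happens to a scheme-less warehouse path.
def qualify(path: String, defaultFs: URI): URI = {
  val uri = new URI(path)
  if (uri.getScheme != null) uri
  else new URI(defaultFs.getScheme, defaultFs.getAuthority, uri.getPath, null, null)
}

// assumption: default filesystem of the cluster, as seen in the exception
val defaultFs = new URI("hdfs://dev")

// new default (no scheme): the local user.dir gets looked up on hdfs
println(qualify("/home/koert/spark-warehouse", defaultFs))
// prints hdfs://dev/home/koert/spark-warehouse

// old default (explicit file: scheme): stays on the local filesystem
println(qualify("file:/home/koert/spark-warehouse", defaultFs))
// prints file:/home/koert/spark-warehouse
```

the first result matches the directory in the exception, which is why i think the dropped file: prefix is the cause.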