spark 2.0.1 upgrade breaks on WAREHOUSE_PATH

Koert Kuipers Wed, 05 Oct 2016 21:19:13 -0700

i just replaced our spark 2.0.0 install on our yarn cluster with spark 2.0.1
and copied over the configs.


to give it a quick test i started spark-shell and created a dataset. i get
this:

16/10/05 23:55:13 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://***:4040
Spark context available as 'sc' (master = yarn, app id = application_1471212701720_1580).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java
1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import spark.implicits._
import spark.implicits._

scala> val x = List(1,2,3).toDS
org.apache.spark.SparkException: Unable to create database default as failed to create its directory hdfs://dev/home/koert/spark-warehouse
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:114)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:108)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
  at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
  at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
  at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:161)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
  at org.apache.spark.sql.Dataset$.apply(Dataset.scala:59)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:423)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:380)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:171)
  ... 50 elided

this did not happen in spark 2.0.0.
the location it is trying to access makes little sense: the filesystem is
hdfs, but the path is my local home directory (/home/koert exists locally
but not on hdfs).

i suspect the issue is SPARK-15899, but i am not sure. in the pullreq for
that WAREHOUSE_PATH got changed:
   val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir")
     .doc("The default location for managed databases and tables.")
     .stringConf
 -    .createWithDefault("file:${system:user.dir}/spark-warehouse")
 +    .createWithDefault("${system:user.dir}/spark-warehouse")

notice how the file: got removed from the url, causing spark to look on
hdfs now since it is my default filesystem on the cluster. but
system:user.dir is still a local directory (the working directory that
spark-shell was started from, in my case my local home directory). when
combining the two you get something that doesn't exist.
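
to make the effect concrete, here is a minimal sketch of how the two defaults would resolve. it is not spark's or hadoop's actual code: qualify is a hypothetical helper that uses plain java.net.URI to mimic qualifying a path without a scheme against the default filesystem, and hdfs://dev and /home/koert are simply the values from the error above.

```scala
import java.net.URI

object WarehousePathDemo {
  // hypothetical helper, not a spark or hadoop api: a path that already
  // carries a scheme is kept as-is, a path without a scheme is resolved
  // against the default filesystem uri.
  def qualify(defaultFs: String, path: String): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) path
    else new URI(defaultFs).resolve(path).toString
  }

  def main(args: Array[String]): Unit = {
    val defaultFs = "hdfs://dev"   // default filesystem, taken from the error message
    val userDir = "/home/koert"    // stand-in for the local user.dir

    // 2.0.0 style default: explicit file: scheme, stays on the local filesystem
    println(qualify(defaultFs, s"file:$userDir/spark-warehouse"))
    // prints file:/home/koert/spark-warehouse

    // 2.0.1 style default: no scheme, so it lands on the default filesystem
    println(qualify(defaultFs, s"$userDir/spark-warehouse"))
    // prints hdfs://dev/home/koert/spark-warehouse
  }
}
```

the second result matches the directory in the exception, which is consistent with the missing file: prefix being the cause.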
