Hi, it sounds like your configuration files are not set up correctly. What does
spark.sql("SHOW DATABASES").show(); output? If you only have the default
database, the investigation in this thread should help:
https://stackoverflow.com/questions/47257680/unable-to-get-existing-hive-tables-from-hivecontext-using-spark
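In that case Spark is usually reading its own embedded catalog instead of the
Hive metastore where *spam* was created. A minimal sketch of how you could
check the connection from Java (the thrift URI and class name are assumptions
on my part; take the real values from your hive-site.xml, and if you only use
the embedded Derby metastore, copying hive-site.xml into $SPARK_HOME/conf is
the usual fix rather than setting the URI here):

import org.apache.spark.sql.SparkSession;

public class MetastoreCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("MetastoreCheck")
                .master("yarn")
                .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
                // Assumed host/port -- replace with the hive.metastore.uris
                // value from your own hive-site.xml.
                .config("hive.metastore.uris", "thrift://localhost:9083")
                .enableHiveSupport()
                .getOrCreate();

        // Prints only 'default' when Spark is not talking to your Hive metastore.
        spark.sql("SHOW DATABASES").show();
        spark.stop();
    }
}

If SHOW DATABASES then lists *spam*, your original code should work unchanged.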
spark.sql("SHOW DATABASES").show(); outputs ? If you only have default database, such investigation there should help https://stackoverflow.com/questions/47257680/unable-to-get-existing-hive-tables-from-hivecontext-using-spark 2018-04-15 18:14 GMT+02:00 Rishikesh Gawade <rishikeshg1...@gmail.com>: > Hello there. I am a newbie in the world of Spark. I have been working on a > Spark Project using Java. > I have configured Hive and Spark to run on Hadoop. > As of now i have created a Hive (derby) database on Hadoop HDFS at the > given location(warehouse location): */user/hive/warehouse *and database > name as : *spam *(saved as *spam.db* at the aforementioned location). > I have been trying to read tables in this database in spark to create > RDDs/DataFrames. > Could anybody please guide me in how I can achieve this? > I used the following statements in my Java Code: > > SparkSession spark = SparkSession > .builder() > .appName("Java Spark Hive Example").master("yarn") > .config("spark.sql.warehouse.dir","/user/hive/warehouse") > .enableHiveSupport() > .getOrCreate(); > spark.sql("USE spam"); > spark.sql("SELECT * FROM spamdataset").show(); > > After this i built the project using Maven as follows: mvn clean package > -DskipTests and a JAR was generated. > > After this, I tried running the project via spark-submit CLI using : > > spark-submit --class com.adbms.SpamFilter --master yarn > ~/IdeaProjects/mlproject/target/mlproject-1.0-SNAPSHOT.jar > > and got the following error: > > Exception in thread "main" > org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: > Database 'spam' not found; > at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$ > apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists( > SessionCatalog.scala:174) > at org.apache.spark.sql.catalyst.catalog.SessionCatalog. > setCurrentDatabase(SessionCatalog.scala:256) > at org.apache.spark.sql.execution.command.SetDatabaseCommand.run( > databases.scala:59) > at org.apache.spark.sql.execution.command.ExecutedCommandExec. > sideEffectResult$lzycompute(commands.scala:70) > at org.apache.spark.sql.execution.command.ExecutedCommandExec. > sideEffectResult(commands.scala:68) > at org.apache.spark.sql.execution.command.ExecutedCommandExec. 
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
>     at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
>     at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
>     at com.adbms.SpamFilter.main(SpamFilter.java:54)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Please check this, and if anything is wrong, please suggest an ideal way
> to read Hive tables on Hadoop in Spark using Java. A link to a webpage
> with relevant info would also be appreciated.
> Thank you in anticipation.
> Regards,
> Rishikesh Gawade
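One more thing to check: in yarn mode the driver and executors also need to
see hive-site.xml. Copying it into $SPARK_HOME/conf on the submitting machine
is the usual fix; you can also ship it with the job, roughly like this (the
/etc/hive/conf path is an assumption, use wherever your hive-site.xml
actually lives):

spark-submit --class com.adbms.SpamFilter --master yarn \
  --files /etc/hive/conf/hive-site.xml \
  ~/IdeaProjects/mlproject/target/mlproject-1.0-SNAPSHOT.jar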