Dear Felix and Rishikesh and list,

Thank you very much for your previous help. So far I have tried two ways to run Spark SQL: one is from R, using the sparklyr and SparkR libraries; the other is the SparkR shell that ships with Spark. I am not connecting to a remote Spark cluster, but to a local one. Both ways failed, with or without hive-site.xml in place. I suspect the hive-site.xml content I found online is not appropriate for this case, since the Spark session can no longer be initialized once that file is added. My questions are:
1. Is there an example of hive-site.xml content suitable for this case?
2. I used the sql() function to run Spark SQL; is that the right way to do it?

###################################
## Here is the content of my hive-site.xml: ##
###################################

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.76.100:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
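For comparison, if I understand Rishikesh's advice (quoted below) correctly, the file should instead contain one and only one property, hive.metastore.uris, pointing at a running Hive metastore service. A minimal sketch, where the thrift host and port are placeholders for wherever a metastore is actually listening:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder address; assumes a metastore service is actually
         running and listening there -->
    <value>thrift://localhost:9083</value>
  </property>
</configuration>

For a purely local experiment it may be simpler to have no hive-site.xml at all and let Spark fall back to its embedded Derby metastore, which it creates under the working directory.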
################################
## Here is what happened in R: ##
################################

> library(sparklyr)  # load sparklyr package
> sc=spark_connect(master="local",spark_home="/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")
> # connect sparklyr with spark
> sql('create database learnsql')
Error in sql("create database learnsql") : could not find function "sql"
> library(SparkR)

Attaching package: ‘SparkR’

The following object is masked from ‘package:sparklyr’:

    collect

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
    rank, rbind, sample, startsWith, subset, summary, transform, union

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
Spark not found in SPARK_HOME:
Spark package found in SPARK_HOME: /Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7
Launching java with spark-submit command /Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7/bin/spark-submit sparkr-shell /var/folders/d8/7j6xswf92c3gmhwy_lrk63pm0000gn/T//Rtmpz22kK9/backend_port103d4cfcfd2c
19/06/08 11:14:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Error in handleErrors(returnStatus, conn) : ……
…… hundreds of lines of messages and errors here ……
> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized

####################################
## Here is what happened in the SparkR shell: ##
####################################

Error in handleErrors(returnStatus, conn) :
  java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
    at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:80)
    at org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:79)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.Iterator$class.foreach(Iterator.sca
> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
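For clarity, here is what I now believe the SparkR initialization should look like: an untested sketch based on the documentation, using my local paths. (With sparklyr alone there is no sql() function, which explains the "could not find function" error above; DBI::dbGetQuery(sc, ...) is, as far as I can tell, the usual route there.)

library(SparkR)

# Set SPARK_HOME first, then pass the variable *name* to Sys.getenv().
# Earlier I passed the path itself, so Sys.getenv() returned "".
Sys.setenv(SPARK_HOME = "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")

sparkR.session(
  master            = "local",
  sparkHome         = Sys.getenv("SPARK_HOME"),
  enableHiveSupport = TRUE  # the documented default; needed for CREATE DATABASE
)

# sql() executes a single statement per call, so the statements are split:
sql("CREATE DATABASE IF NOT EXISTS learnsql")
sql("USE learnsql")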
Thank you very much.

YA

> On Jun 8, 2019, at 1:44 AM, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
> 
> Hi.
> 1. Yes, you can connect to Spark via R. If you are connecting to a remote
> Spark cluster, then you'll need EITHER a Spark binary along with hive-site.xml
> in its config directory on the machine running R, OR a Livy server installed
> on the cluster. You can then go on to use sparklyr, which, although it has
> almost the same functions as SparkR, is recommended over the latter.
> For the first method mentioned above, use
> sc <- sparklyr::spark_connect(master = "yarn-client", spark_home = Sys.getenv("SPARK_HOME"), conf = spark_config())
> For the second method, use
> sc <- sparklyr::spark_connect(master = "livyserverIP:port", method = "livy", conf = livy_config(conf = spark_config(), username = "foo", password = "bar"))
> 
> 2. The reason that you're not getting the desired result could be that
> hive-site.xml is missing. To be able to connect to Hive from
> spark-shell/spark-submit/SparkR/sparklyr and perform SQL operations, you need
> to have hive-site.xml in the $SPARK_HOME/conf directory. This hive-site.xml
> should contain one and only one configuration, which would be
> 'hive.metastore.uris'.
> 
> 3. In the case of the spark-sql shell, it should work after putting the
> aforementioned hive-site.xml in the config directory of Spark. If it doesn't
> work, then please check the syntax.
> 
> Regards,
> Rishikesh Gawade
> 
> 
> On Thu, Jun 6, 2019, 12:18 PM ya <xinxi...@126.com> wrote:
> Dear list,
> 
> I am trying to use Spark SQL within my R session. I have the following
> questions; could you give me some advice please? Thank you very much.
> 
> 1. I connect R and Spark using the SparkR library; probably some of the
> members here are also R users? Do I understand correctly that Spark SQL can
> be connected and triggered via SparkR and used in R (not in the SparkR shell
> of Spark)?
> 
> 2. I ran the SparkR library in R, trying to create a new SQL database and a
> table, but I could not get the database and the table I want. The code looks
> like below:
> 
> library(SparkR)
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
> sql("create database learnsql; use learnsql")
> sql("
> create table employee_tbl
> (emp_id varchar(10) not null,
> emp_name char(10) not null,
> emp_st_addr char(10) not null,
> emp_city char(10) not null,
> emp_st char(10) not null,
> emp_zip integer(5) not null,
> emp_phone integer(10) null,
> emp_pager integer(10) null);
> insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
> select*from employee_tbl;
> ")
> 
> I ran the following code in the spark-sql shell. I get the database learnsql;
> however, I still can't get the table.
> 
> spark-sql> create database learnsql;show databases;
> 19/06/06 14:42:36 INFO HiveMetaStore: 0: create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
> 19/06/06 14:42:36 INFO audit: ugi=ya ip=unknown-ip-addr cmd=create_database: Database(name:learnsql, description:, locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
> Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database learnsql already exists;
> 
> spark-sql> create table employee_tbl
>          > (emp_id varchar(10) not null,
>          > emp_name char(10) not null,
>          > emp_st_addr char(10) not null,
>          > emp_city char(10) not null,
>          > emp_st char(10) not null,
>          > emp_zip integer(5) not null,
>          > emp_phone integer(10) null,
>          > emp_pager integer(10) null);
> Error in query:
> no viable alternative at input 'create table employee_tbl\n(emp_id varchar(10) not'(line 2, pos 20)
> 
> == SQL ==
> create table employee_tbl
> (emp_id varchar(10) not null,
> --------------------^^^
> emp_name char(10) not null,
> emp_st_addr char(10) not null,
> emp_city char(10) not null,
> emp_st char(10) not null,
> emp_zip integer(5) not null,
> emp_phone integer(10) null,
> emp_pager integer(10) null)
> 
> spark-sql> insert into employee_tbl values ('0001','john','yanlanjie 1','gz','jiaoqiaojun','510006','1353');
> 19/06/06 14:43:43 INFO HiveMetaStore: 0: get_table : db=default tbl=employee_tbl
> 19/06/06 14:43:43 INFO audit: ugi=ya ip=unknown-ip-addr cmd=get_table : db=default tbl=employee_tbl
> Error in query: Table or view not found: employee_tbl; line 1 pos 0
> 
> 
> Does Spark SQL have a different SQL grammar? What did I miss?
> 
> Thank you very much.
> 
> Best regards,
> 
> YA
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
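P.S. Rereading the quoted transcript, my understanding is that Spark 2.4's DDL grammar does not accept NOT NULL column constraints or the integer(n) type, and that sql() and the spark-sql shell execute one statement at a time, which would explain both errors above. An untested sketch of what I expect to parse (NULL is supplied for emp_pager, since the original INSERT listed seven values for eight columns):

sql("CREATE TABLE employee_tbl (
       emp_id      VARCHAR(10),
       emp_name    CHAR(10),
       emp_st_addr CHAR(10),
       emp_city    CHAR(10),
       emp_st      CHAR(10),
       emp_zip     INT,
       emp_phone   INT,
       emp_pager   INT)")
sql("INSERT INTO employee_tbl VALUES
       ('0001', 'john', 'yanlanjie 1', 'gz', 'jiaoqiaojun', 510006, 1353, NULL)")
head(sql("SELECT * FROM employee_tbl"))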