Thank you very much for your previous help. So far I have tried two ways to 
run Spark SQL: one is from R, using the sparklyr and SparkR libraries; the 
other is from the SparkR shell that ships with Spark. I am not connecting to a 
remote Spark cluster, but to a local one. Both attempts failed, with or without 
hive-site.xml. I suspect the hive-site.xml content I found online is not 
appropriate for this case, because the Spark session can no longer be 
initialized once I add the file. My questions are:

1. Is there an example of what hive-site.xml should contain for this case?

2. I used the sql() function to run Spark SQL. Is this the right way to do it?

##Here is the content of hive-site.xml:##

<description>JDBC connect string for a JDBC metastore</description>
<description>Driver class name for a JDBC metastore</description>
<description>username to use against metastore database</description>
<description>password to use against metastore database</description>

##Here is what happened in R:##

> library(sparklyr) # load sparklyr package
> sc=spark_connect(master="local",spark_home="/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")
>  # connect sparklyr with spark
> sql('create database learnsql')
Error in sql("create database learnsql") : could not find function "sql"
> library(SparkR)

Attaching package: ‘SparkR’

The following object is masked from ‘package:sparklyr’:


The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:, colnames, colnames<-, drop, endsWith, intersect, rank, rbind,
    sample, startsWith, subset, summary, transform, union

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7') 
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
Spark not found in SPARK_HOME: 
Spark package found in SPARK_HOME: 
Launching java with spark-submit command 
19/06/08 11:14:57 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
Error in handleErrors(returnStatus, conn) : 

(hundreds of lines of log output and errors omitted here)

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
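
One detail in the transcript above: Sys.getenv() is given the install path 
rather than a variable name, which may explain the blank "Spark not found in 
SPARK_HOME:" line. It is not necessarily the cause of the later failure. A 
minimal sketch of the lookup follows. The guarded block is an untested 
assumption about how a local SparkR session would then be started, and 
master = "local[*]" is an illustrative choice, not something from this thread.

```r
spark_home <- "/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7"
Sys.setenv(SPARK_HOME = spark_home)

# Sys.getenv() expects the NAME of a variable. Given the path itself, it
# looks for a variable with that name, finds none, and returns "".
wrong <- Sys.getenv(spark_home)
right <- Sys.getenv("SPARK_HOME")

# Assumption: only try to start a session when SparkR is installed and the
# directory exists on this machine.
if (requireNamespace("SparkR", quietly = TRUE) && dir.exists(right)) {
  SparkR::sparkR.session(master = "local[*]", sparkHome = right)
  # sql() takes one statement per call.
  SparkR::sql("CREATE DATABASE IF NOT EXISTS learnsql")
  SparkR::sql("USE learnsql")
}
```

sql() is a SparkR function, which is why it was not found while only sparklyr 
was loaded. With sparklyr alone, the connection can be queried through DBI 
instead, for example DBI::dbGetQuery(sc, "show databases").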

##Here is what happened in SparkR shell:##

Error in handleErrors(returnStatus, conn) : 
  java.lang.IllegalArgumentException: Error while instantiating 
        at scala.Option.getOrElse(Option.scala:121)
        at scala.collection.Iterator$class.foreach(
> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized

On June 8, 2019, at 1:44 AM, Rishikesh Gawade wrote:
> Hi.
> 1. Yes, you can connect to Spark via R. If you are connecting to a remote 
> Spark cluster, then you'll need EITHER a Spark binary along with hive-site.xml 
> in its config directory on the machine running R OR a Livy server installed on 
> the cluster. You can then go on to use sparklyr, which, although it has almost 
> the same functions as SparkR, is recommended over the latter.
> For the first method mentioned above, use
> sc <- sparklyr::spark_connect(master = "yarn-client", spark_home = 
> Sys.getenv("SPARK_HOME"), config = spark_config())
> For the second method, use
> sc <- sparklyr::spark_connect(master = "livyserverIP:port", method = "livy", 
> config = livy_config(config = spark_config(), username = "foo", password = "bar"))
> 2. The reason that you're not getting the desired result could be that 
> hive-site.xml is missing. To be able to connect to Hive from 
> spark-shell/spark-submit/SparkR/sparklyr and perform SQL operations, you need 
> to have hive-site.xml in the $SPARK_HOME/conf directory. This 
> hive-site.xml should contain one and only one configuration property, 
> 'hive.metastore.uris'. 
> 3. In the case of the spark-sql shell, it should work once the 
> aforementioned hive-site.xml is in Spark's config directory. If it doesn't 
> work, then please check the syntax.
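
A file of the kind described in point 2 might look like the sketch below. The 
host name is a placeholder, not a value from this thread, and 9083 is only the 
port a Hive metastore service conventionally listens on. For a purely local 
session with no separate metastore service there is nothing for this URI to 
point at; without any hive-site.xml, Spark creates a local metastore_db in the 
working directory.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Placeholder host and port: replace with the real metastore address -->
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```
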
On Thu, Jun 6, 2019, 12:18 PM ya wrote: 
> Dear list,
> I am trying to use Spark SQL from within R and have the following questions. 
> Could you give me some advice, please? Thank you very much.
> 1. I connect R and Spark using the SparkR library; probably some of the 
> members here are also R users? Do I understand correctly that Spark SQL can be 
> connected to and triggered via SparkR and used in R (not in the sparkR shell 
> of Spark)?
> 2. I loaded the SparkR library in R and tried to create a new SQL database 
> and a table, but I could not get the database and the table I want. The code 
> looks like this:
> library(SparkR)
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7') 
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
> sql("create database learnsql; use learnsql")
> sql("
> create table employee_tbl
> (emp_id varchar(10) not null,
> emp_name char(10) not null,
> emp_st_addr char(10) not null,
> emp_city char(10) not null,
> emp_st char(10) not null,
> emp_zip integer(5) not null,
> emp_phone integer(10) null,
> emp_pager integer(10) null);
> insert into employee_tbl values ('0001','john','yanlanjie 
> 1','gz','jiaoqiaojun','510006','1353');
> select*from employee_tbl;
> ")
> I ran the following code in the spark-sql shell. I get the database learnsql; 
> however, I still can't get the table. 
> spark-sql> create database learnsql;show databases;
> 19/06/06 14:42:36 INFO HiveMetaStore: 0: create_database: 
> Database(name:learnsql, description:, 
> locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
> 19/06/06 14:42:36 INFO audit: ugi=ya    ip=unknown-ip-addr      
> cmd=create_database: Database(name:learnsql, description:, 
> locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})       
> Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: 
> Database learnsql already exists;
> spark-sql> create table employee_tbl
>          > (emp_id varchar(10) not null,
>          > emp_name char(10) not null,
>          > emp_st_addr char(10) not null,
>          > emp_city char(10) not null,
>          > emp_st char(10) not null,
>          > emp_zip integer(5) not null,
>          > emp_phone integer(10) null,
>          > emp_pager integer(10) null);
> Error in query: 
> no viable alternative at input 'create table employee_tbl\n(emp_id 
> varchar(10) not'(line 2, pos 20)
> == SQL ==
> create table employee_tbl
> (emp_id varchar(10) not null,
> --------------------^^^
> emp_name char(10) not null,
> emp_st_addr char(10) not null,
> emp_city char(10) not null,
> emp_st char(10) not null,
> emp_zip integer(5) not null,
> emp_phone integer(10) null,
> emp_pager integer(10) null)
> spark-sql> insert into employee_tbl values ('0001','john','yanlanjie 
> 1','gz','jiaoqiaojun','510006','1353');
> 19/06/06 14:43:43 INFO HiveMetaStore: 0: get_table : db=default 
> tbl=employee_tbl
> 19/06/06 14:43:43 INFO audit: ugi=ya    ip=unknown-ip-addr      cmd=get_table 
> : db=default tbl=employee_tbl     
> Error in query: Table or view not found: employee_tbl; line 1 pos 0
> Does Spark SQL have a different syntax? What did I miss?
> Thank you very much.
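
The parse error above points at the "not" in "not null", and Spark 2.4's 
CREATE TABLE grammar does not appear to accept column constraints or a length 
on integer. In addition, sql() in SparkR takes a single statement per call, 
whereas the spark-sql shell splits on semicolons. A sketch of the same table 
under those assumptions follows. split_sql and run_sql_script are hypothetical 
helpers; the STRING/INT types and "USING parquet" are substitutions, not the 
original definitions; and the CAST(NULL AS INT) fills the eighth column that 
the original INSERT left out.

```r
# Hypothetical helper: naive split of a script into single statements.
# It breaks on every ';', so it is wrong for literals that contain one.
split_sql <- function(script) {
  parts <- trimws(strsplit(script, ";", fixed = TRUE)[[1]])
  parts[nzchar(parts)]
}

# Hypothetical helper: run each statement with its own call.
# The default runner, SparkR::sql, needs an active SparkR session.
run_sql_script <- function(script, run = SparkR::sql) {
  lapply(split_sql(script), run)
}

# No NOT NULL / NULL constraints and no integer(n); eight values for
# eight columns.
script <- "
CREATE DATABASE IF NOT EXISTS learnsql;
USE learnsql;
CREATE TABLE IF NOT EXISTS employee_tbl (
  emp_id      STRING,
  emp_name    STRING,
  emp_st_addr STRING,
  emp_city    STRING,
  emp_st      STRING,
  emp_zip     INT,
  emp_phone   INT,
  emp_pager   INT
) USING parquet;
INSERT INTO employee_tbl VALUES
  ('0001', 'john', 'yanlanjie 1', 'gz', 'jiaoqiaojun',
   510006, 1353, CAST(NULL AS INT));
SELECT * FROM employee_tbl
"

statements <- split_sql(script)
```

With a session running, results <- run_sql_script(script) should return one 
SparkDataFrame per statement, and SparkR::head(results[[5]]) would then show 
the inserted row.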
