I have trying to create table in hive from spark itself,
And using local mode it will work what I am trying here is from spark
standalone I want to create the manage table in hive (another spark
cluster basically CDH) using jdbc mode.
When I try that below are the error I am facing.
On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh,
<mich.talebza...@gmail.com <mailto:mich.talebza...@gmail.com>> wrote:
Have you created that table in Hive or are you trying to create it
from Spark itself.
You Hive is local. In this case you don't need a JDBC connection.
Have you tried:
df2.write.mode("overwrite").saveAsTable(mydb.mytable)
HTH
**view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
*Disclaimer:* Use it at your own risk.Any and all responsibility
for any loss, damage or destruction of data or any other property
which may arise from relying on this email's technical content is
explicitly disclaimed. The author will in no case be liable for
any monetary damages arising from such loss, damage or destruction.
On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla
<pbadrinath1...@gmail.com <mailto:pbadrinath1...@gmail.com>> wrote:
Hi,
Trying to write data in spark to the hive as JDBC mode below
is the sample code:
spark standalone 2.4.7 version
21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java
classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For
SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
<http://localhost:4040>
Spark context available as 'sc' (master =
spark://localhost:7077, app id = app-20210715080414-0817).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.7
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java
1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df = Seq(
("John", "Smith", "London"),
("David", "Jones", "India"),
("Michael", "Johnson", "Indonesia"),
("Chris", "Lee", "Brazil"),
("Mike", "Brown", "Russia")
).toDF("first_name", "last_name", "country")
df.write
.format("jdbc")
.option("url",
"jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
.option("dbtable", "test.test")
.option("user", "admin")
.option("password", "admin")
.option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
.mode("overwrite")
.save
// Exiting paste mode, now interpreting.
java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051)
ERROR processing query/statement. Error Code: 40000, SQL
state: TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:39
cannot recognize input near '"first_name"' 'TEXT' ',' in
column name or primary key or foreign key:28:27,
org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
sun.reflect.GeneratedMethodAccessor13:invoke::-1,
sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
java.lang.reflect.Method:invoke:Method.java:498,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
java.security.AccessController:doPrivileged:AccessController.java:-2,
javax.security.auth.Subject:doAs:Subject.java:422,
org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
com.sun.proxy.$Proxy35:executeStatementAsync::-1,
org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
java.lang.Thread:run:Thread.java:748,
*org.apache.hadoop.hive.ql.parse.ParseException:line 1:39
cannot recognize input near '"first_name"' 'TEXT' ',' in
column name or primary key or foreign key:33:6,
org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
sqlState:42000, errorCode:40000, errorMessage:Error while
compiling statement: FAILED: ParseException line 1:39 cannot
recognize input near '"first_name"' 'TEXT' ',' in column name
or primary key or foreign key), Query: CREATE TABLE
test.test("first_name" TEXT , "last_name" TEXT , "country" TEXT ).
at
com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatementInternal(Unknown
Source)
at
com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatement(Unknown
Source)
at
com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown
Source)
at
com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown
Source)
at
com.cloudera.hiveserver2.jdbc.common.SStatement.executeNoParams(Unknown
Source)
at
com.cloudera.hiveserver2.jdbc.common.SStatement.executeAnyUpdate(Unknown
Source)
at
com.cloudera.hiveserver2.jdbc.common.SStatement.executeUpdate(Unknown
Source)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:863)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
at
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
at
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
at
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
at
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
... 48 elided
Caused by:
com.cloudera.hiveserver2.support.exceptions.GeneralException:
[Cloudera][HiveJDBCDriver](500051) ERROR processing
query/statement. Error Code: 40000, SQL state:
TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:39
cannot recognize input near '"first_name"' 'TEXT' ',' in
column name or primary key or foreign key:28:27,
org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
sun.reflect.GeneratedMethodAccessor13:invoke::-1,
sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
java.lang.reflect.Method:invoke:Method.java:498,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
java.security.AccessController:doPrivileged:AccessController.java:-2,
javax.security.auth.Subject:doAs:Subject.java:422,
org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
com.sun.proxy.$Proxy35:executeStatementAsync::-1,
org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
java.lang.Thread:run:Thread.java:748,
*org.apache.hadoop.hive.ql.parse.ParseException:line 1:39
cannot recognize input near '"first_name"' 'TEXT' ',' in
column name or primary key or foreign key:33:6,
org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
sqlState:42000, errorCode:40000, errorMessage:Error while
compiling statement: FAILED: ParseException line 1:39 cannot
recognize input near '"first_name"' 'TEXT' ',' in column name
or primary key or foreign key), Query: CREATE TABLE
profile_test.person_test ("first_name" TEXT , "last_name" TEXT
, "country" TEXT ).
... 77 more
Found similar issue ion Jira :
https://issues.apache.org/jira/browse/SPARK-31614
<https://issues.apache.org/jira/browse/SPARK-31614>
There no comments in that and the resolution is Incomplete, is
there any way we can do in spark to write data into the hive
as JDBC mode.
Thanks for any help.
Thanks,
Badrinath.