Insert mode is "overwrite", it shouldn't doesn't matter if the table
already exists or not. The JDBC driver should be based on the Cloudera Hive
version, we can't know the CDH version he's using.
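
If he has to stay on plain JDBC, one possible workaround (a sketch only,
untested against his cluster) is to register a custom JdbcDialect before
the write, so Spark emits backtick-quoted identifiers and STRING instead
of the ANSI double quotes and TEXT that Hive's parser is rejecting:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Sketch of a dialect for HiveServer2 URLs; the type mapping below is
// an assumption about what his Hive version accepts.
object HiveDialect extends JdbcDialect {
  // Claim any hive2 JDBC URL.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

  // Hive expects backticks, not double quotes, around identifiers.
  override def quoteIdentifier(colName: String): String = s"`$colName`"

  // Map Spark's StringType to Hive's STRING instead of TEXT.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("STRING", java.sql.Types.VARCHAR))
    case _          => None
  }
}

JdbcDialects.registerDialect(HiveDialect)

With the dialect registered, the same df.write ... .mode("overwrite") call
should generate a CREATE TABLE statement that Hive can parse.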

On Tue, Jul 20, 2021 at 1:21 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> The driver is fine and up to date, and it should work.
>
> I have asked the thread owner to send the DDL of the table and how the
> table was created. In this case, JDBC from Spark expects the table to
> already be there.
>
> The error below:
>
> java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR
> processing query/statement. Error Code: 40000, SQL state:
> TStatus(statusCode:ERROR_STATUS,
> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
> compiling statement: FAILED: ParseException line 1:39 cannot recognize
> input near '"first_name"' 'TEXT' ',' in column name or primary key or
> foreign key:28:27,
> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329
>
> Sounds like a mismatch between the columns in the Spark DataFrame and the
> underlying table.
>
> HTH
>
>
> On Tue, 20 Jul 2021 at 17:05, Daniel de Oliveira Mantovani <
> daniel.oliveira.mantov...@gmail.com> wrote:
>
>> Badrinath is trying to write to Hive in a cluster where he doesn't have
>> permission to submit Spark jobs and doesn't have Hive/Spark metadata
>> access. The only way to communicate with this third-party Hive cluster is
>> through the JDBC protocol.
>>
>> [ Cloudera Data Hub - Hive Server] <-> [Spark Standalone]
>>
>> It is Spark that is creating this table, because he's using "overwrite"
>> mode in order to test it.
>>
>>  df.write
>>   .format("jdbc")
>>   .option("url",
>> "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>>   .option("dbtable", "test.test")
>>   .option("user", "admin")
>>   .option("password", "admin")
>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>> *  .mode("overwrite")*
>>   .save
>>
>> This error is weird; it looks like the third-party Hive server isn't able
>> to recognize the SQL dialect coming from the [Spark Standalone] JDBC
>> driver.
>>
>> 1) I would try to execute the CREATE statement manually on this server.
>> 2) If that works, try to run again with the "append" option — a sketch of
>> both steps is below.
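>>
>> Something along these lines (a sketch only; the DDL is my assumption of
>> what his Hive will accept, reusing the names and df from his example):
>>
>> import java.sql.DriverManager
>>
>> // 1) Create the table manually over the same JDBC connection.
>> //    Hive wants `backticks` and STRING, not "quotes" and TEXT.
>> Class.forName("com.cloudera.hive.jdbc41.HS2Driver")
>> val url =
>>   "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0"
>> val conn = DriverManager.getConnection(url, "admin", "admin")
>> try {
>>   conn.createStatement().execute(
>>     """CREATE TABLE IF NOT EXISTS test.test (
>>       |  `first_name` STRING,
>>       |  `last_name`  STRING,
>>       |  `country`    STRING)""".stripMargin)
>> } finally conn.close()
>>
>> // 2) The table now exists, so append instead of overwrite.
>> df.write
>>   .format("jdbc")
>>   .option("url", url)
>>   .option("dbtable", "test.test")
>>   .option("user", "admin")
>>   .option("password", "admin")
>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>>   .mode("append")
>>   .save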
>>
>> I would open a case with Cloudera and ask which driver you should use.
>>
>> Thanks
>>
>>
>>
>> On Mon, Jul 19, 2021 at 10:33 AM Artemis User <arte...@dtechspace.com>
>> wrote:
>>
>>> As Mich mentioned, there is no need to use the JDBC API; using the
>>> DataFrameWriter's saveAsTable method is the way to go. The JDBC driver is
>>> for a JDBC client (a Java client, for instance) to access the Hive tables
>>> in Spark via the Thrift server interface.
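>>>
>>> A minimal sketch of that path, assuming the session is built with Hive
>>> support and the metastore is reachable from the Spark cluster:
>>>
>>> import org.apache.spark.sql.SparkSession
>>>
>>> // Requires direct metastore access, unlike the JDBC route.
>>> val spark = SparkSession.builder()
>>>   .appName("write-to-hive")
>>>   .enableHiveSupport()
>>>   .getOrCreate()
>>>
>>> import spark.implicits._
>>> val df = Seq(("John", "Smith", "London"))
>>>   .toDF("first_name", "last_name", "country")
>>>
>>> df.write.mode("overwrite").saveAsTable("test.test")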
>>>
>>> -- ND
>>>
>>> On 7/19/21 2:42 AM, Badrinath Patchikolla wrote:
>>>
>>> I have been trying to create a table in Hive from Spark itself.
>>>
>>> In local mode it works; what I am trying here is to create a managed
>>> table in Hive (another Spark cluster, basically CDH) from Spark
>>> standalone, using JDBC mode.
>>>
>>> When I try that, below is the error I am facing.
>>>
>>> On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh, <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Have you created that table in Hive, or are you trying to create it from
>>>> Spark itself?
>>>>
>>>> Your Hive is local. In this case you don't need a JDBC connection. Have
>>>> you tried:
>>>>
>>>> df2.write.mode("overwrite").saveAsTable("mydb.mytable")
>>>>
>>>> HTH
>>>>
>>>>
>>>> On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla <
>>>> pbadrinath1...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Trying to write data from Spark to Hive in JDBC mode; below is the
>>>>> sample code:
>>>>>
>>>>> Spark standalone, version 2.4.7:
>>>>>
>>>>> 21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
>>>>> native-hadoop library for your platform... using builtin-java classes
>>>>> where applicable
>>>>> Setting default log level to "WARN".
>>>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>>>> setLogLevel(newLevel).
>>>>> Spark context Web UI available at http://localhost:4040
>>>>> Spark context available as 'sc' (master = spark://localhost:7077, app
>>>>> id = app-20210715080414-0817).
>>>>> Spark session available as 'spark'.
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>>>>>       /_/
>>>>>
>>>>> Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
>>>>> Type in expressions to have them evaluated.
>>>>> Type :help for more information.
>>>>>
>>>>> scala> :paste
>>>>> // Entering paste mode (ctrl-D to finish)
>>>>>
>>>>> val df = Seq(
>>>>>     ("John", "Smith", "London"),
>>>>>     ("David", "Jones", "India"),
>>>>>     ("Michael", "Johnson", "Indonesia"),
>>>>>     ("Chris", "Lee", "Brazil"),
>>>>>     ("Mike", "Brown", "Russia")
>>>>>   ).toDF("first_name", "last_name", "country")
>>>>>
>>>>>
>>>>>  df.write
>>>>>   .format("jdbc")
>>>>>   .option("url",
>>>>> "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>>>>>   .option("dbtable", "test.test")
>>>>>   .option("user", "admin")
>>>>>   .option("password", "admin")
>>>>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>>>>>   .mode("overwrite")
>>>>>   .save
>>>>>
>>>>>
>>>>> // Exiting paste mode, now interpreting.
>>>>>
>>>>> java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR
>>>>> processing query/statement. Error Code: 40000, SQL state:
>>>>> TStatus(statusCode:ERROR_STATUS,
>>>>> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
>>>>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>> foreign key:28:27,
>>>>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>>>>> org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>>>>> sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>>>>> sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>>>>> java.lang.reflect.Method:invoke:Method.java:498,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>>>>> java.security.AccessController:doPrivileged:AccessController.java:-2,
>>>>> javax.security.auth.Subject:doAs:Subject.java:422,
>>>>> org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>>>>> com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>>>>> org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>>>>> org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>>>>> org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>>>>> org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>>>>> org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>>>>> java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>>>>> java.lang.Thread:run:Thread.java:748,
>>>>> *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39 cannot recognize
>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>> foreign key:33:6,
>>>>> org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>>>>> org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>>>>> org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>>>>> org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>>>>> sqlState:42000, errorCode:40000, errorMessage:Error while compiling
>>>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key),
>>>>> Query: CREATE TABLE test.test("first_name" TEXT , "last_name" TEXT ,
>>>>> "country" TEXT ).
>>>>>   at
>>>>> com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatementInternal(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatement(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.jdbc.common.SStatement.executeNoParams(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.jdbc.common.SStatement.executeAnyUpdate(Unknown
>>>>> Source)
>>>>>   at
>>>>> com.cloudera.hiveserver2.jdbc.common.SStatement.executeUpdate(Unknown
>>>>> Source)
>>>>>   at
>>>>> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:863)
>>>>>   at
>>>>> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
>>>>>   at
>>>>> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>>>>>   at
>>>>> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>>>>>   at
>>>>> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>>>>>   at
>>>>> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>>>>   at
>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>>>>   at
>>>>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>>>>>   at
>>>>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>>>>>   at
>>>>> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>>>>   at
>>>>> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>>>>>   at
>>>>> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>>>>>   at
>>>>> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
>>>>>   at
>>>>> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>>>>>   at
>>>>> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
>>>>>   ... 48 elided
>>>>> Caused by:
>>>>> com.cloudera.hiveserver2.support.exceptions.GeneralException:
>>>>> [Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error
>>>>> Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS,
>>>>> infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while
>>>>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>> foreign key:28:27,
>>>>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>>>>> org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>>>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>>>>> sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>>>>> sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>>>>> java.lang.reflect.Method:invoke:Method.java:498,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>>>>> java.security.AccessController:doPrivileged:AccessController.java:-2,
>>>>> javax.security.auth.Subject:doAs:Subject.java:422,
>>>>> org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>>>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>>>>> com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>>>>> org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>>>>> org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>>>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>>>>> org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>>>>> org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>>>>> org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>>>>> java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>>>>> java.lang.Thread:run:Thread.java:748,
>>>>> *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39 cannot recognize
>>>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>>>> foreign key:33:6,
>>>>> org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>>>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>>>>> org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>>>>> org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>>>>> org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>>>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>>>>> sqlState:42000, errorCode:40000, errorMessage:Error while compiling
>>>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key),
>>>>> Query: CREATE TABLE profile_test.person_test ("first_name" TEXT ,
>>>>> "last_name" TEXT , "country" TEXT ).
>>>>>   ... 77 more
>>>>>
>>>>>
>>>>> Found a similar issue on Jira:
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-31614
>>>>>
>>>>> There are no comments on it and the resolution is Incomplete. Is there
>>>>> any way in Spark to write data into Hive in JDBC mode?
>>>>>
>>>>> Thanks for any help.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Badrinath.
>>>>>
>>>>
>>>
>>
>> --
>> Daniel Mantovani
>>
>>

--
Daniel Mantovani
