This is a generic JDBC write to a database from PySpark:

def writeTableWithJDBC(dataFrame, url, tableName, user, password, driver, mode):
    """Write a Spark DataFrame to a database table over JDBC.

    Parameters
    ----------
    dataFrame : pyspark.sql.DataFrame
        The data to write.
    url : str
        JDBC connection URL, e.g. "jdbc:hive2://host:port/default".
    tableName : str
        Target table name (value of the "dbtable" option).
    user, password : str
        Connection credentials.
    driver : str
        Fully-qualified JDBC driver class name.
    mode : str
        Spark save mode ("append", "overwrite", "ignore", "error").

    On any write failure this prints the exception and exits the process
    with status 1 (original behaviour preserved).
    """
    try:
        # Parenthesized chain instead of backslash continuations: a single
        # trailing space after "\" is a SyntaxError; parentheses are robust.
        (
            dataFrame.write
            .format("jdbc")
            .option("url", url)
            .option("dbtable", tableName)
            .option("user", user)
            .option("password", password)
            .option("driver", driver)
            .mode(mode)
            .save()
        )
    except Exception as e:
        # NOTE(review): broad catch + hard exit kept as-is — appears to be a
        # top-level ETL entry point; confirm before narrowing the except or
        # replacing sys.exit with a re-raise.
        print(f"{e}, quitting")
        sys.exit(1)

Example


 url = "jdbc:hive2://" + config['hiveVariables']['hiveHost'] + ':' + config['hiveVariables']['hivePort'] + '/default'


driver = "com.cloudera.hive.jdbc41.HS2Driver"



Note the correct JDBC driver class: this Cloudera HiveServer2 driver, rather than org.apache.hive.jdbc.HiveDriver used in your write attempt.


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 30 Jul 2021 at 10:22, igyu <i...@21cn.com> wrote:

> val DF = sparkSession.read.format("jdbc")
>   .option("url", 
> "jdbc:hive2://tidb4ser:11000/hivetest;hive.server2.proxy.user=jztwk")
>   .option("dbtable", "(SELECT * FROM tb_user where created=1602864000) as 
> tmp")
>   .option("user", "admin")
>   .option("password", "000000")
>   .option("fetchsize", "2000") //每处理批次的条数
>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>   .load()
>
> that I can read data from hive
>
> but I want write data to hive
>
> DF.write.format("jdbc")
>   .option("url", 
> "jdbc:hive2://tidb4ser:11000/hivetest;hive.server2.proxy.user=jztwk")
>   .option("dbtable", "hivetest.tb_user1")
>   .option("user", "admin")
>   .option("password", "000000")
>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>   .partitionBy("created")
>   .mode(SaveMode.Overwrite)
>   .save()
>
> I get error
>
> 21/07/30 17:16:38 INFO Utils: Resolved authority: tidb4ser:11000
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException:
> Error while compiling statement: FAILED: ParseException line 1:53 cannot
> recognize input near 'TEXT' ',' 'password' in column type
> at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:266)
> at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:252)
> at
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:318)
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:259)
> at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:487)
> at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:859)
> at
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
> at
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> at
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
> at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
> at
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
> at
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
> at
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
> at
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
> at
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
> at
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
> at
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> at
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
> at
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
> at com.join.hive.writer.HiveWriter.saveTo(HiveWriter.scala:29)
> at com.join.synctool$.main(synctool.scala:42)
> at com.join.synctool.main(synctool.scala)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while
> compiling statement: FAILED: ParseException line 1:53 cannot recognize
> input near 'TEXT' ',' 'password' in column type
> at
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329)
> at
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:207)
> at
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:260)
> at
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:505)
> at
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:491)
> at
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
> at
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:507)
> at
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
> at
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:567)
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:53 cannot recognize
> input near 'TEXT' ',' 'password' in column type
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:221)
> at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:75)
> at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:68)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:564)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1425)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1398)
> at
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:205)
> ... 15 more
> 21/07/30 17:16:39 INFO SparkContext: Invoking stop() from shutdown hook
> 21/07/30 17:16:39 INFO SparkUI: Stopped Spark web UI at
> http://WIN-20201231YGA:4040
>
> ------------------------------
> igyu
>

Reply via email to