Re: How to generate unique incrementing identifier in a structured streaming dataframe

2021-07-15 Thread Mich Talebzadeh
Yes, that is true. UUID only introduces uniqueness to the record. Some NoSQL databases require a primary key, where a UUID can be used.

import java.util.UUID

scala> var pk = UUID.randomUUID
pk: java.util.UUID = 0d91e11a-f5f6-4b4b-a120-8c46a31dad0b

scala> pk = UUID.randomUUID
pk: java.util.UUID =
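For attaching such a key inside Spark itself, the built-in uuid() SQL function generates one value per row and works on streaming as well as batch DataFrames. A minimal sketch, where the names (events, payload, event_id) are hypothetical and only for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("uuid-demo").getOrCreate()
import spark.implicits._

// Hypothetical batch DataFrame; the same expression applies to a streaming one.
val events = Seq("a", "b", "c").toDF("payload")

// uuid() is Spark SQL's built-in non-deterministic UUID generator (new value per row).
val withId = events.withColumn("event_id", expr("uuid()"))
withId.show(false)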

Re: How to generate unique incrementing identifier in a structured streaming dataframe

2021-07-15 Thread Felix Kizhakkel Jose
Thank you so much for the insights. @Mich Talebzadeh, really appreciate your detailed examples. @Jungtaek Lim, I see your point. I am thinking of having a mapping table from UUID to an incremental ID and leveraging range pruning etc. on a large dataset. @Sebastian, I have to check how to do something like
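One possible shape for such a mapping table, built with zipWithIndex over the distinct UUIDs so the incremental IDs come out gap-free, is sketched below. The table and column names (mydb.records, uuid, inc_id) are hypothetical, and this is only one way to do it, not what the thread settled on:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

val spark = SparkSession.builder.appName("uuid-mapping").getOrCreate()

// Hypothetical source table keyed by UUID strings.
val records = spark.table("mydb.records")

// zipWithIndex assigns gap-free 0..N-1 indices, unlike monotonically_increasing_id.
val keys = records.select("uuid").distinct()
val indexed = keys.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) }
val mapping = spark.createDataFrame(indexed, keys.schema.add(StructField("inc_id", LongType, false)))

// Persist once; join back to the large dataset and range-prune on inc_id.
mapping.write.mode("overwrite").saveAsTable("mydb.uuid_to_id")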

Blog post introducing (Apache) DataFu-Spark

2021-07-15 Thread Eyal Allweil
Hi all, Apache DataFu is an Apache project of general-purpose Hadoop utilities, and *datafu-spark* is a new module in this project with general utilities and UDFs that can be useful to Spark developers. This is a blog post I wrote introducing some of the APIs in

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-15 Thread Mich Talebzadeh
Have you created that table in Hive, or are you trying to create it from Spark itself? Your Hive is local, so in this case you don't need a JDBC connection. Have you tried: df2.write.mode("overwrite").saveAsTable("mydb.mytable") HTH view my LinkedIn profile
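A minimal sketch of that suggestion, assuming a SparkSession built with Hive support and an existing database mydb (the contents of df2 here are illustrative):

import org.apache.spark.sql.SparkSession

// Hive support makes saveAsTable write through the local Hive metastore,
// so no JDBC connection is involved.
val spark = SparkSession.builder
  .appName("hive-write")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._
val df2 = Seq((1, "a"), (2, "b")).toDF("id", "value")

df2.write.mode("overwrite").saveAsTable("mydb.mytable")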

Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-15 Thread Badrinath Patchikolla
Hi, I am trying to write data from Spark to Hive in JDBC mode; below is the sample code (Spark standalone, version 2.4.7): 21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to
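Since the original sample is truncated, the sketch below is only a guess at the general shape of a JDBC-mode write to Hive; the URL, driver, and table name are placeholders. Such writes often fail at statement compilation because Spark's generic JDBC dialect emits SQL that HiveServer2 rejects:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("hive-jdbc-write").getOrCreate()
import spark.implicits._

val df = Seq((1, "a")).toDF("id", "value")

// Writing through the Hive JDBC driver (HiveServer2). The overwrite mode makes
// Spark issue DROP/CREATE TABLE statements in its own dialect, which Hive may
// reject with "Error while compiling statement".
df.write
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")   // placeholder URL
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "mytable")                            // placeholder table
  .mode("overwrite")
  .save()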

[Spark SQL] : at org.apache.spark.sql.execution.datasources.orc.OrcColumnVector.getDecimal(OrcColumnVector.java:158)

2021-07-15 Thread Ragini Manjaiah
Hi Team, I am trying to read from an HDFS path that is partitioned on a sales date and then select only one particular column, of type decimal(32,20) (nullable = true); the Spark job fails. When I exclude this column and select the others, it
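Given that the stack trace points at the vectorized ORC reader (OrcColumnVector.getDecimal), one diagnostic step, not a confirmed fix, is to disable vectorized reading so the decimal column is decoded row by row; the path and column name below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("orc-decimal-read").getOrCreate()

// Fall back from the vectorized ORC reader, where OrcColumnVector.getDecimal
// is invoked, to the row-based reader.
spark.conf.set("spark.sql.orc.enableVectorizedReader", "false")

// Placeholder path; substitute the real partitioned location.
val df = spark.read.orc("hdfs:///data/sales/sale_date=2021-07-15")
df.select("amount").show()   // "amount" stands in for the decimal(32,20) column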

Re: compile spark 3.1.1 error

2021-07-15 Thread jiahong li
Currently, no solution found! Dereck Li Apache Spark Contributor Continuing Learner @Hangzhou, China jason_xu wrote on Tue, May 11, 2021 at 8:01 AM: > Hi Jiahong, I got the same failure on building Spark 3.1.1 with Hadoop > 2.8.5. Any chance you found a solution? > > > > -- > Sent from: