Re: Best ID Generator for ID field in parquet ?

Mich Talebzadeh Sun, 04 Sep 2016 05:31:00 -0700

You can add a monotonically increasing ID column to your table:

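scala> import org.apache.spark.sql.functions.monotonically_increasing_id
scala> val ll_18740868 = spark.table("accounts.ll_18740868")
scala> val startval = 1
scala> // keep the DataFrame in df; chaining .show() onto the val would make df a Unit
scala> val df = ll_18740868.withColumn("id", monotonically_increasing_id() + startval)
scala> df.show(2)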
+---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
|transactiondate|transactiontype| sortcode|accountnumber|transactiondescription|debitamount|creditamount|balance| id|
+---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
|     2011-12-30|            DEB|'30-64-72|     18740868|  WWW.GFT.COM CD 4628 |       50.0|        null| 304.89|  1|
|     2011-12-30|            DEB|'30-64-72|     18740868|  TDA.CONFECC.D.FRE...|      19.01|        null| 354.89|  2|
+---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
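
The ids 1 and 2 above are what you would expect for the first two rows of the first partition with startval = 1. As a rough illustration of why ids jump between partitions, here is a plain-Scala sketch that assumes the layout described in Spark's API documentation for monotonically_increasing_id() (partition index in the upper 31 bits, row position within the partition in the lower 33 bits). The object and the sketchId helper are hypothetical names used only for illustration; they are not Spark APIs.

```scala
object IdLayoutSketch {
  // Hypothetical helper, not a Spark API: mimics the documented layout of
  // monotonically_increasing_id() and then adds the start offset.
  def sketchId(partitionIndex: Int, rowInPartition: Long, startval: Long = 0L): Long =
    (partitionIndex.toLong << 33) + rowInPartition + startval

  def main(args: Array[String]): Unit = {
    val startval = 1L
    // First two rows of partition 0, matching the ids in the output above.
    println(sketchId(0, 0L, startval)) // 1
    println(sketchId(0, 1L, startval)) // 2
    // First row of partition 1: the id jumps by 2^33.
    println(sketchId(1, 0L, startval)) // 8589934593
  }
}
```

So the ids are unique and increasing, but a table spread over several partitions will show large gaps between them.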

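Now you have a new ID column. The values are unique and increasing, but they
are not guaranteed to be consecutive once the table spans more than one
partition.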
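
If you need ids without gaps (1, 2, 3, ...), monotonically_increasing_id() on its own will not give you that across partitions. One possible alternative, shown here only as a minimal sketch, is RDD.zipWithIndex. The withConsecutiveId helper, the object name and the sample rows are made up for illustration; the local master setting is an assumption so that the example can run outside a cluster.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object ConsecutiveIdSketch {
  // Hypothetical helper (not a Spark API): appends a gap-free "id" column
  // starting at startval, based on RDD.zipWithIndex.
  def withConsecutiveId(spark: SparkSession, df: DataFrame, startval: Long): DataFrame = {
    // zipWithIndex numbers rows 0, 1, 2, ... in partition order.
    val rowsWithId = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ (idx + startval))
    }
    val schema = StructType(df.schema.fields :+ StructField("id", LongType, nullable = false))
    spark.createDataFrame(rowsWithId, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local run for the sketch
      .appName("consecutive-id-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, spread over two partitions.
    val sample = Seq(("DEB", 50.0), ("DEB", 19.01), ("CR", 10.0))
      .toDF("transactiontype", "amount")
      .repartition(2)

    withConsecutiveId(spark, sample, startval = 1L).show()
    spark.stop()
  }
}
```

In the spark-shell session above this would be called as withConsecutiveId(spark, ll_18740868, startval). Note that zipWithIndex has to run an extra Spark job to count rows when the RDD has more than one partition, so it costs more than monotonically_increasing_id().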

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 4 September 2016 at 12:43, Kevin Tran <kevin...@gmail.com> wrote:

> Hi everyone,
> Please give me your opinions on what is the best ID Generator for ID field
> in parquet ?
>
> UUID.randomUUID();
> AtomicReference<Long> currentTime =
>     new AtomicReference<>(System.currentTimeMillis());
> AtomicLong counter = new AtomicLong(0);
> ....
>
> Thanks,
> Kevin.
>
>
> ----
> https://issues.apache.org/jira/browse/SPARK-8406 (Race condition when
> writing Parquet files)
> https://github.com/apache/spark/pull/6864/files
>
