Hi Mich,

Thank you for your input. Does monotonically_increasing_id() protect against race conditions, or can it still produce duplicate IDs at some point with multiple threads, multiple instances, ...?
Can even System.currentTimeMillis() still produce duplicates?

Cheers,
Kevin.

On Mon, Sep 5, 2016 at 12:30 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> You can create a monotonically incrementing ID column on your table
>
> scala> val ll_18740868 = spark.table("accounts.ll_18740868")
> scala> val startval = 1
> scala> val df = ll_18740868.withColumn("id", monotonically_increasing_id() + startval).show(2)
> +---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
> |transactiondate|transactiontype| sortcode|accountnumber|transactiondescription|debitamount|creditamount|balance| id|
> +---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
> |     2011-12-30|            DEB|'30-64-72|     18740868|  WWW.GFT.COM CD 4628 |       50.0|        null| 304.89|  1|
> |     2011-12-30|            DEB|'30-64-72|     18740868|  TDA.CONFECC.D.FRE...|      19.01|        null| 354.89|  2|
> +---------------+---------------+---------+-------------+----------------------+-----------+------------+-------+---+
>
> Now you have a new ID column
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On 4 September 2016 at 12:43, Kevin Tran <kevin...@gmail.com> wrote:
>
>> Hi everyone,
>> Please give me your opinions on what is the best ID generator for the ID
>> field in Parquet?
>>
>> UUID.randomUUID();
>> AtomicReference<Long> currentTime = new AtomicReference<>(System.currentTimeMillis());
>> AtomicLong counter = new AtomicLong(0);
>> ....
>>
>> Thanks,
>> Kevin.
>>
>> ----
>> https://issues.apache.org/jira/browse/SPARK-8406 (Race condition when
>> writing Parquet files)
>> https://github.com/apache/spark/pull/6864/files
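
On the race-condition question, a rough sketch may help. As far as I understand the Spark documentation, monotonically_increasing_id() puts the partition ID in the upper 31 bits of a 64-bit value and the record number within the partition in the lower 33 bits, so each partition can number its own rows without coordinating with other tasks. The IDs should therefore be unique but not consecutive across partitions. The class below only imitates that bit layout in plain Java; PartitionedIdSketch and makeId are hypothetical names, not Spark APIs, and the limit on the partition ID follows the documented assumption of fewer than 1 billion partitions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a partition-prefixed ID layout; NOT Spark's code.
final class PartitionedIdSketch {
    // Assumption taken from the Spark docs: the lower 33 bits hold the
    // record number within a partition.
    static final int RECORD_BITS = 33;

    // Combine a partition ID and a per-partition record number into one long.
    static long makeId(int partitionId, long recordNumber) {
        if (partitionId < 0 || partitionId >= (1 << 30)) {
            throw new IllegalArgumentException("partitionId out of range");
        }
        if (recordNumber < 0 || recordNumber >= (1L << RECORD_BITS)) {
            throw new IllegalArgumentException("recordNumber out of range");
        }
        // Distinct (partitionId, recordNumber) pairs map to distinct longs,
        // because the two fields occupy disjoint bit ranges.
        return ((long) partitionId << RECORD_BITS) | recordNumber;
    }

    public static void main(String[] args) {
        Set<Long> seen = new HashSet<>();
        // Four pretend partitions, each numbering its rows independently.
        for (int p = 0; p < 4; p++) {
            for (long r = 0; r < 1000; r++) {
                if (!seen.add(makeId(p, r))) {
                    throw new IllegalStateException("duplicate id");
                }
            }
        }
        System.out.println(makeId(0, 0)); // first row of partition 0
        System.out.println(makeId(1, 0)); // first row of partition 1
    }
}
```

Under this layout the first row of the second partition gets 8589934592 rather than the next consecutive number, so adding a start value as in the quoted example would give 1, 2, ... only for rows in the first partition. By contrast, System.currentTimeMillis() has millisecond resolution, so two threads or instances calling it within the same millisecond can get the same value, and an AtomicLong counter is only unique within a single JVM.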