Re: Compute the Hash of each row in new column

2020-03-02 Thread Chetan Khatri
Thanks Enrico. I meant one hash of each entire row, stored in an extra column, something like this:

val newDs = typedRows.withColumn("hash", hash(typedRows.columns.map(col): _*))

On Mon, Mar 2, 2020 at 3:51 PM Enrico Minack wrote:
> Well, then apply md5 on all columns:
>
> ds.select(ds.columns.map(col)
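A minimal sketch of Chetan's one-liner. It assumes Spark 3.0 or later (for `xxhash64`), runs as a spark-shell or scala-cli script, and uses a small hypothetical DataFrame in place of `typedRows`. `hash` is Spark's built-in Murmur3 hash and returns a 32-bit `Int`, so collisions become likely once a table holds around 10^5 rows. `xxhash64` takes the same arguments and returns a 64-bit `Long`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hash, xxhash64}

val spark = SparkSession.builder().master("local[*]").appName("row-hash").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for typedRows; the third row duplicates the first.
val typedRows = Seq((1, "a", 2.0), (2, "b", 3.5), (1, "a", 2.0)).toDF("id", "name", "score")

// Capture the input columns once, so "hash64" is not computed over the new "hash" column.
val cols = typedRows.columns.map(col)

val newDs = typedRows
  .withColumn("hash", hash(cols: _*))       // 32-bit Murmur3, IntegerType
  .withColumn("hash64", xxhash64(cols: _*)) // 64-bit xxHash, LongType (Spark 3.0+)

newDs.show(false)
```

Identical rows always get identical hashes, which is what makes such a column usable for change detection or deduplication. The reverse does not hold: equal hashes do not prove equal rows.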

Re: Compute the Hash of each row in new column

2020-03-02 Thread Enrico Minack
Well, then apply md5 on all columns:

ds.select(ds.columns.map(col) ++ ds.columns.map(column => md5(col(column)).as(s"$column hash")): _*).show(false)

Enrico

On 02.03.20 at 11:10, Chetan Khatri wrote:
> Thanks Enrico I want to compute hash of all the columns value in the row.
> On Fri, Feb 28,
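Enrico's snippet produces one md5 column per input column, not one digest per row. The sketch below contrasts that with a single per-row md5. Serializing the row with `to_json(struct(...))` before hashing is an assumption of this sketch and is not taken from the thread. It is chosen because field names and string quoting keep adjacent values from running together, which naive `concat_ws("|", ...)` does not guarantee (and `concat_ws` also silently skips nulls). The data is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, md5, struct, to_json}

val spark = SparkSession.builder().master("local[*]").appName("row-md5").getOrCreate()
import spark.implicits._

// Hypothetical rows: joining with "|" would turn both into the same string "a|b|c".
val ds = Seq(("a|b", "c"), ("a", "b|c")).toDF("x", "y")

// As in the message above: one md5 column per input column.
val perColumn = ds.select(ds.columns.map(col) ++ ds.columns.map(c => md5(col(c)).as(s"$c hash")): _*)

// One digest per row: serialize the whole row to a JSON string, then hash it.
// sha2(..., 256) can be swapped in for md5 the same way.
val perRow = ds.withColumn("row hash", md5(to_json(struct(ds.columns.map(col): _*))))

perColumn.show(false)
perRow.show(false)
```

Column order is part of the serialized string, so select the columns in a fixed order before hashing if the digest must match across jobs.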

Re: Compute the Hash of each row in new column

2020-03-02 Thread Chetan Khatri
Thanks Enrico. I want to compute a hash of all the column values in the row.

On Fri, Feb 28, 2020 at 7:28 PM Enrico Minack wrote:
> This computes the md5 hash of a given column id of Dataset ds:
>
> ds.withColumn("id hash", md5($"id")).show(false)
>
> Test with this Dataset ds:
>
> import

Re: Compute the Hash of each row in new column

2020-02-28 Thread Enrico Minack
This computes the md5 hash of the column id of Dataset ds:

ds.withColumn("id hash", md5($"id")).show(false)

Test with this Dataset ds:

import org.apache.spark.sql.types._
val ds = spark.range(10).select($"id".cast(StringType))

The available functions are md5, sha, sha1, sha2 and hash:
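A sketch applying each function Enrico lists to his test Dataset, assuming a local SparkSession. `sha` is reached here through `expr` because it is a SQL function name (an alias of `sha1`) rather than a method on the Scala `functions` object. The comments note the output width of each function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{expr, hash, md5, sha1, sha2}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("hash-functions").getOrCreate()
import spark.implicits._

// Test Dataset from the message above: ids 0..9 as strings.
val ds = spark.range(10).select($"id".cast(StringType))

val hashed = ds.select(
  $"id",
  md5($"id").as("md5"),            // 128-bit digest as 32 hex characters
  expr("sha(id)").as("sha"),       // SQL alias of sha1
  sha1($"id").as("sha1"),          // 160-bit digest as 40 hex characters
  sha2($"id", 256).as("sha2_256"), // SHA-256 as 64 hex characters
  hash($"id").as("murmur3")        // 32-bit Int, not a cryptographic digest
)

hashed.show(false)
```

The md5/sha family returns hex strings and suits fingerprints that must be stable and collision-resistant. `hash` is much cheaper but only 32 bits wide.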

Re: Compute the Hash of each row in new column

2020-02-28 Thread Riccardo Ferrari
Hi Chetan,

Would the SQL function `hash` do the trick for your use case?

Best,

On Fri, Feb 28, 2020 at 1:56 PM Chetan Khatri wrote:
> Hi Spark Users,
> How can I compute Hash of each row and store in new column at Dataframe,
> could someone help me.
>
> Thanks
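A sketch of Riccardo's suggestion, assuming a hypothetical temp view `t`. It calls `hash` from SQL with every column listed explicitly and compares the result with the DataFrame API call. Both go through the same built-in Murmur3 hash, so the values agree.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hash}

val spark = SparkSession.builder().master("local[*]").appName("sql-hash").getOrCreate()
import spark.implicits._

// Hypothetical input table registered as a temp view.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.createOrReplaceTempView("t")

// SQL form: list every column; backticks guard names containing spaces or dots.
val colList = df.columns.map(c => s"`$c`").mkString(", ")
val viaSql = spark.sql(s"SELECT *, hash($colList) AS row_hash FROM t")

// DataFrame form of the same function.
val viaApi = df.withColumn("row_hash", hash(df.columns.map(col): _*))

viaSql.show(false)
```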

Compute the Hash of each row in new column

2020-02-28 Thread Chetan Khatri
Hi Spark users,

How can I compute a hash of each row and store it in a new column of a DataFrame? Could someone help me?

Thanks