Re: Masking username in Spark with regexp_replace and reverse functions

2019-03-17 Mich Talebzadeh
Thanks guys. All the analysis with windowing functions is done on the authentic names. I only randomize names for reporting purposes, so the figures remain correct. I agree with you, Jörn, that masking one name is not enough and one can identify the row through transaction dates and the a

Re: Masking username in Spark with regexp_replace and reverse functions

2019-03-17 Jörn Franke
For the approach below you have to check for collisions, i.e. different names leading to the same masked value. You could hash it. But to stop someone from simply hashing candidate names and comparing the results, you need to include a different random factor in each name. However, the anonymization problem is bigger, be
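A minimal sketch of the salted-hash idea with a collision check, in plain Scala rather than Spark. The object and method names are hypothetical. It assumes one random salt per distinct name, with the resulting pseudonyms kept in a lookup map so the same name gets the same token within a run:

```scala
import java.nio.charset.StandardCharsets
import java.security.{MessageDigest, SecureRandom}

// Hypothetical helper object: salted SHA-256 pseudonyms plus a collision check.
object SaltedMask {
  private val rng = new SecureRandom()

  // Hex-encode a byte array (two lowercase hex digits per byte).
  private def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // Draw a fresh random salt: 16 random bytes, hex-encoded.
  def newSalt(): String = {
    val bytes = new Array[Byte](16)
    rng.nextBytes(bytes)
    hex(bytes)
  }

  // SHA-256 of salt + name, hex-encoded (64 characters).
  def mask(name: String, salt: String): String = {
    val md = MessageDigest.getInstance("SHA-256")
    hex(md.digest((salt + name).getBytes(StandardCharsets.UTF_8)))
  }

  // One salt per distinct name; fail if two distinct names share a masked value.
  def pseudonymise(names: Seq[String]): Map[String, String] = {
    val mapping = names.distinct.map(n => n -> mask(n, newSalt())).toMap
    val collisions = mapping.groupBy(_._2).filter(_._2.size > 1)
    require(collisions.isEmpty,
      s"collision between: ${collisions.values.flatMap(_.keys).mkString(", ")}")
    mapping
  }
}
```

Because the salts are random and not kept, the mapping itself has to be stored if the same pseudonyms are needed in a later run; a single secret key (for example an HMAC) would instead give repeatable pseudonyms without a lookup table.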

Re: Masking username in Spark with regexp_replace and reverse functions

2019-03-17 JB Data31
Hi, why not add a random regexp in the regexp substitution, e.g. https://onlinerandomtools.com/generate-random-data-from-regexp

JB

On Sat, 16 Mar 2019 at 18:39, Mich Talebzadeh wrote:
> Hi,
>
> I am looking at Description column of a bank stateme
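A rough sketch of random substitution inside a regex replacement, in plain Scala. It does not generate strings from an arbitrary regexp as the linked tool does; it just swaps each letter of a matched name for a random letter of the same case. The name pattern (two capitalised words) and the object name are hypothetical:

```scala
import scala.util.Random
import scala.util.matching.Regex

// Hypothetical helper: replace each name match with random letters of the same shape.
object RandomSub {
  // Assumed pattern for a name: two capitalised words, e.g. "John Smith".
  val namePattern: Regex = "[A-Z][a-z]+ [A-Z][a-z]+".r

  // Keep length, case and non-letters; randomise every letter.
  private def randomLike(s: String, rnd: Random): String =
    s.map {
      case c if c.isUpper => ('A' + rnd.nextInt(26)).toChar
      case c if c.isLower => ('a' + rnd.nextInt(26)).toChar
      case c              => c
    }

  def mask(description: String, rnd: Random = new Random()): String =
    namePattern.replaceAllIn(description,
      m => Regex.quoteReplacement(randomLike(m.matched, rnd)))
}
```

Unlike hashing, a fresh random value per row means the same person gets different masked names in different rows, which is fine for display but breaks any grouping on the masked column.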

Masking username in Spark with regexp_replace and reverse functions

2019-03-16 Mich Talebzadeh
Hi, I am looking at the Description column of a bank statement (CSV download) that has the following format:

scala> account_table.printSchema
root
 |-- TransactionDate: date (nullable = true)
 |-- TransactionType: string (nullable = true)
 |-- Description: string (nullable = true)
 |-- Value: double (
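A plain-Scala sketch of the regex-replace-then-reverse idea applied to a single Description value. The actual masking code is cut off above, so the layout assumed here ("... TO <name> REF ...") and the object name are hypothetical. In Spark the same steps would roughly map onto column functions such as `regexp_extract`, `reverse` and `regexp_replace` in `org.apache.spark.sql.functions`:

```scala
import scala.util.matching.Regex

// Hypothetical helper: reverse the name found in a Description string.
object ReverseMask {
  // Assumed layout: the name sits between "TO " and " REF".
  val nameAfterTo: Regex = """(?<=TO )(.+?)(?= REF)""".r

  // Replace the matched name with its reversed characters.
  def mask(description: String): String =
    nameAfterTo.replaceAllIn(description,
      m => Regex.quoteReplacement(m.matched.reverse))
}
```

For example, `ReverseMask.mask("CARD PAYMENT TO John Smith REF 123")` gives `"CARD PAYMENT TO htimS nhoJ REF 123"`. Because reversing a string twice gives back the original, this hides the name from a casual reader but is trivially undone, which is one reason the replies suggest hashing or random substitution instead.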