Re: De-identification_in Hive

2016-03-20 Thread Ajay Chander
Thanks for your time Mich! I will try this one out. On Thursday, March 17, 2016, Mich Talebzadeh wrote: > Then probably the easiest option would be in INSERT/SELECT from external > table to target table and make that column NULL > > Check the VAT column here that I

De-identification_in Hive

2016-03-20 Thread Ajay Chander
Hi Everyone, I have a CSV file which has some sensitive data in a particular column. Now I have to create a table in Hive and load the data into it, but when loading the data I have to make sure that the data is masked. Is there any built-in function which supports this or do I have

Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Tustin, is there any way I can de-identify it in Hive? On Thursday, March 17, 2016, Marcin Tustin wrote: > This is a classic transform-load problem. You'll want to anonymise it once > before making it available for analysis. > > On Thursday, March 17, 2016, Ajay Chander

Re: De-identification_in Hive

2016-03-19 Thread Mich Talebzadeh
Then probably the easiest option would be an INSERT/SELECT from the external table to the target table, making that column NULL. Check the VAT column here, which I made NULL:

DROP TABLE IF EXISTS stg_t2;
CREATE EXTERNAL TABLE stg_t2 (
 INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
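
A minimal sketch of the INSERT/SELECT approach described above, not the code from the original message. The table names `stg_invoices` and `invoices_masked`, the delimiter, and the LOCATION path are assumptions chosen for illustration; only the four column names follow the snippet.

```sql
-- Hypothetical staging table over CSV files already on HDFS.
-- Table names, delimiter, and the LOCATION path are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS stg_invoices (
  invoicenumber STRING,
  paymentdate   STRING,
  net           STRING,
  vat           STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/stg/invoices';

-- Target table that analysts query; it never receives the real VAT values.
CREATE TABLE IF NOT EXISTS invoices_masked (
  invoicenumber STRING,
  paymentdate   STRING,
  net           STRING,
  vat           STRING
);

-- Copy every row, replacing the sensitive column with a typed NULL.
INSERT INTO TABLE invoices_masked
SELECT
  invoicenumber,
  paymentdate,
  net,
  CAST(NULL AS STRING) AS vat
FROM stg_invoices;
```

Keeping the column in the target schema (as NULL) rather than dropping it means downstream queries that name the column still run, but access to the staging table and the underlying files still has to be restricted separately.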

Re: De-identification_in Hive

2016-03-19 Thread Marcin Tustin
This is a classic transform-load problem. You'll want to anonymise it once before making it available for analysis. On Thursday, March 17, 2016, Ajay Chander wrote: > Hi Everyone, > > I have a CSV file which has some sensitive data in a particular column > in it. Now I

Re: De-identification_in Hive

2016-03-19 Thread Mich Talebzadeh
Are you loading your CSV file from an external table into a Hive table? Basically, you want to scramble that column before putting it into the Hive table? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Mich, I am okay with replacing the column's data with some characters like asterisks. Thanks On Thursday, March 17, 2016, Mich Talebzadeh wrote: > Hi Ajay, > > Do you want to be able to unmask it (at any time) or just have it totally > scrambled (for example replace the
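
One way to get the asterisk replacement described above is Hive's built-in `regexp_replace`. This is only a sketch: `xyz` and `ssn` are the names mentioned elsewhere in the thread, while the `id` and `name` columns and the `xyz_masked` table are hypothetical.

```sql
-- xyz and ssn come from the thread; id, name, and xyz_masked are hypothetical.
-- The masking is irreversible: every digit is overwritten, separators are kept.
CREATE TABLE IF NOT EXISTS xyz_masked AS
SELECT
  id,
  name,
  regexp_replace(ssn, '[0-9]', '*') AS ssn   -- '123-45-6789' becomes '***-**-****'
FROM xyz;
```

Because the masked table is built once from the external table, analysts can be pointed at `xyz_masked` only, which matches the "anonymise it once" advice given in the thread.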

Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Mich, thanks for looking into this. I have a 'csvfile.txt' on HDFS. I have created an external table 'xyz' to load that data into it. The data in one of the columns, 'ssn', needs to be masked. Is there any built-in function in Hive that I could use? On Thursday, March 17, 2016, Mich Talebzadeh

Re: De-identification_in Hive

2016-03-19 Thread Jörn Franke
What are your requirements? Do you need to omit a column? Transform it? Make the anonymized version joinable, etc.? There is not simply one function. > On 17 Mar 2016, at 14:58, Ajay Chander wrote: > > Hi Everyone, > > I have a CSV file which has some sensitive data in a
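
If the requirement is the "joinable" case, one possible sketch is to replace the value with a salted hash, so equal inputs still produce equal tokens. This assumes a Hive version that ships `sha2()` (1.3.0 or later, as far as I know); `xyz` and `ssn` are names from the thread, while `id`, `xyz_pseudonymized`, and the salt literal are hypothetical.

```sql
-- Assumption: sha2() is available (Hive 1.3.0 or later).
-- xyz and ssn come from the thread; id, xyz_pseudonymized, and the salt are hypothetical.
CREATE TABLE IF NOT EXISTS xyz_pseudonymized AS
SELECT
  id,
  -- Same ssn + same salt gives the same token, so tables masked this way can still be joined.
  sha2(concat('replace-with-secret-salt', ssn), 256) AS ssn_token
FROM xyz;
```

The salt has to stay secret and be identical across all tables that need to join. Values with a small input space, such as SSNs, can otherwise be recovered by hashing every candidate, so this is pseudonymization rather than full anonymization.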

Re: De-identification_in Hive

2016-03-19 Thread Ajay Chander
Jörn, I have around a hundred big CSV files on my local machine. Each file has some number of columns which have sensitive information in them. I don't want to drop the columns manually. Now I have to bring those files into Hive external tables, but I want to make sure that the columns which have

Re: De-identification_in Hive

2016-03-18 Thread Mich Talebzadeh
Hi Ajay, Do you want to be able to unmask it (at any time) or just have it totally scrambled (for example replace the column with random characters) in Hive? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: De-identification_in Hive

2016-03-18 Thread Damien Carol
For the record, see this ticket: https://issues.apache.org/jira/browse/HIVE-13125 2016-03-17 17:02 GMT+01:00 Ajay Chander : > Thanks for your time Mich! I will try this one out. > > > On Thursday, March 17, 2016, Mich Talebzadeh > wrote: > >>