Then probably the easiest option would be in INSERT/SELECT from external
table to target table and make that column NULL
Check the VAT column here that I made it NULL
DROP TABLE IF EXISTS stg_t2;
CREATE EXTERNAL TABLE stg_t2 (
INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
,TOTAL string
)
COMMENT 'from csv file from excel sheet xxxx'
ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/data/stg/table2'
TBLPROPERTIES ("skip.header.line.count"="1")
;
--3)
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (
INVOICENUMBER INT
,PAYMENTDATE timestamp
,NET DECIMAL(20,2)
,VAT DECIMAL(20,2)
,TOTAL DECIMAL(20,2)
)
COMMENT 'from csv file from excel sheet xxxx'
CLUSTERED BY (INVOICENUMBER) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="ZLIB",
"transactional"="true")
;
--4) Put data in target table. do the conversion and ignore empty rows
INSERT INTO TABLE t2
SELECT
INVOICENUMBER
, CAST(UNIX_TIMESTAMP(paymentdate,'DD/MM/YYYY')*1000 as timestamp)
, CAST(REGEXP_REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2))
, NULL
, CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
FROM
stg_t2
WHERE
-- INVOICENUMBER > 0 AND
CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0
-- Exclude empty rows
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 17 March 2016 at 15:32, Ajay Chander <[email protected]> wrote:
> Mich, I am okay with replacing the columns data with some characters like
> asterisk. Thanks
>
>
> On Thursday, March 17, 2016, Mich Talebzadeh <[email protected]>
> wrote:
>
>> Hi Ajay,
>>
>> Do you want to be able to unmask it (at any time) or just have it totally
>> scrambled (for example replace the column with random characters) in Hive?
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn *
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 March 2016 at 15:14, Ajay Chander <[email protected]> wrote:
>>
>>> Mich thbaks for looking into this. I have a 'csvfile.txt ' on hdfs. I
>>> have created an external table 'xyz' to load that data into it. One of the
>>> columns data 'ssn' needs to be masked. Is there any built in function is
>>> give that I could use?
>>>
>>>
>>> On Thursday, March 17, 2016, Mich Talebzadeh <[email protected]>
>>> wrote:
>>>
>>>> Are you loading your CSV file from an External table into Hive table.?
>>>>
>>>> Basically you want to scramble that column before putting into Hive
>>>> table?
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn *
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 17 March 2016 at 14:37, Ajay Chander <[email protected]> wrote:
>>>>
>>>>> Tustin, Is there anyway I can deidentify it in hive ?
>>>>>
>>>>>
>>>>> On Thursday, March 17, 2016, Marcin Tustin <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> This is a classic transform-load problem. You'll want to anonymise it
>>>>>> once before making it available for analysis.
>>>>>>
>>>>>> On Thursday, March 17, 2016, Ajay Chander <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I have a csv.file which has some sensitive data in a particular
>>>>>>> column in it. Now I have to create a table in hive and load the data
>>>>>>> into
>>>>>>> it. But when loading the data I have to make sure that the data is
>>>>>>> masked.
>>>>>>> Is there any built in function is used ch supports this or do I have to
>>>>>>> write UDF ? Any suggestions are appreciated. Thanks
>>>>>>
>>>>>>
>>>>>> Want to work at Handy? Check out our culture deck and open roles
>>>>>> <http://www.handy.com/careers>
>>>>>> Latest news <http://www.handy.com/press> at Handy
>>>>>> Handy just raised $50m
>>>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>>>>> led
>>>>>> by Fidelity
>>>>>>
>>>>>>
>>>>
>>