Then probably the easiest option would be in INSERT/SELECT from external
table to target table and make that column NULL

Check the VAT column here that I made it NULL

DROP TABLE IF EXISTS stg_t2;
CREATE EXTERNAL TABLE stg_t2 (
 INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
,TOTAL string
)
COMMENT 'from csv file from excel sheet xxxx'
ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/data/stg/table2'
TBLPROPERTIES ("skip.header.line.count"="1")
;
--3)
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (
 INVOICENUMBER          INT
,PAYMENTDATE            timestamp
,NET                    DECIMAL(20,2)
,VAT                    DECIMAL(20,2)
,TOTAL                  DECIMAL(20,2)
)
COMMENT 'from csv file from excel sheet xxxx'
CLUSTERED BY (INVOICENUMBER) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="ZLIB",
"transactional"="true")
;
--4) Put data in target table. do the conversion and ignore empty rows
INSERT INTO TABLE t2
SELECT
          INVOICENUMBER
        , CAST(UNIX_TIMESTAMP(paymentdate,'DD/MM/YYYY')*1000 as timestamp)
        , CAST(REGEXP_REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2))
        , NULL
        , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
FROM
stg_t2
WHERE
--        INVOICENUMBER > 0 AND
        CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0
-- Exclude empty rows

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 March 2016 at 15:32, Ajay Chander <hadoopde...@gmail.com> wrote:

> Mich, I am okay with replacing the columns data with some characters like
> asterisk. Thanks
>
>
> On Thursday, March 17, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi Ajay,
>>
>> Do you want to be able to unmask it (at any time) or just have it totally
>> scrambled (for example replace the column with random characters) in Hive?
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 March 2016 at 15:14, Ajay Chander <hadoopde...@gmail.com> wrote:
>>
>>> Mich thbaks for looking into this. I have a 'csvfile.txt ' on hdfs. I
>>> have created an external table 'xyz' to load that data into it. One of the
>>> columns data 'ssn' needs to be masked. Is there any built in function is
>>> give that I could use?
>>>
>>>
>>> On Thursday, March 17, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Are you loading your CSV file from an External table into Hive table.?
>>>>
>>>> Basically you want to scramble that column before putting into Hive
>>>> table?
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 17 March 2016 at 14:37, Ajay Chander <hadoopde...@gmail.com> wrote:
>>>>
>>>>> Tustin, Is there anyway I can deidentify it in hive ?
>>>>>
>>>>>
>>>>> On Thursday, March 17, 2016, Marcin Tustin <mtus...@handybook.com>
>>>>> wrote:
>>>>>
>>>>>> This is a classic transform-load problem. You'll want to anonymise it
>>>>>> once before making it available for analysis.
>>>>>>
>>>>>> On Thursday, March 17, 2016, Ajay Chander <hadoopde...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I have a csv.file which has some sensitive data in a particular
>>>>>>> column in it.  Now I have to create a table in hive and load the data 
>>>>>>> into
>>>>>>> it. But when loading the data I have to make sure that the data is 
>>>>>>> masked.
>>>>>>> Is there any built in function is used ch supports this or do I have to
>>>>>>> write UDF ? Any suggestions are appreciated. Thanks
>>>>>>
>>>>>>
>>>>>> Want to work at Handy? Check out our culture deck and open roles
>>>>>> <http://www.handy.com/careers>
>>>>>> Latest news <http://www.handy.com/press> at Handy
>>>>>> Handy just raised $50m
>>>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>>>>>  led
>>>>>> by Fidelity
>>>>>>
>>>>>>
>>>>
>>

Reply via email to