Hi,

What are the best mechanisms for hiding data destined for Hive tables?

Let us assume that we are loading tons of CSV files into Hive.

The way I do it is:

--1 Move the CSV files into an HDFS staging area
--2 Create an external table over the staging area
--3 Create the ORC table if needed
--4 Insert or append the data from the external table into the Hive ORC table
--5 Remove the CSV files from the staging area
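
To make the steps concrete, here is a minimal HiveQL sketch of the flow. The table names (staging_sales_csv, sales_orc), the columns and the staging path /data/staging/sales_csv are hypothetical and chosen purely for illustration. Steps 1 and 5 are shown as comments because they would typically be run from a shell rather than from Hive.

```sql
-- 1. From a shell (hypothetical paths):
--    hdfs dfs -put /local/path/*.csv /data/staging/sales_csv/

-- 2. External table over the staged CSV files (hypothetical schema)
CREATE EXTERNAL TABLE IF NOT EXISTS staging_sales_csv (
  id            INT,
  customer_name STRING,
  amount        DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/sales_csv';

-- 3. ORC table, created only if it does not already exist
CREATE TABLE IF NOT EXISTS sales_orc (
  id            INT,
  customer_name STRING,
  amount        DECIMAL(10,2)
)
STORED AS ORC;

-- 4. Append the staged rows to the ORC table
INSERT INTO TABLE sales_orc
SELECT id, customer_name, amount
FROM staging_sales_csv;

-- 5. From a shell, once the load has been verified:
--    hdfs dfs -rm -skipTrash /data/staging/sales_csv/*.csv
```

Between step 1 and step 5 the CSV files sit in the staging directory as plain text, which is the window of exposure I am asking about.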

During steps 1 to 5, which may take a good while, sensitive data residing
in the HDFS staging area can be exposed. I would be interested to hear about
possible solutions to this potential security exposure.

Thanks,

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
