Hi,

I need some advice on how to implement the following use case.

I read a dataset that is 1+ TB in size and has 1000+ columns.

Only 3 of these 1000+ columns contain PII, and I need to call the Google DLP 
API on them.

I want to select just these 3 columns and submit only them to the DLP API. 
Once I get the results back from DLP, I want to replace the values of these 
3 columns in my original dataset.

I don't have a UUID (or any other unique key) for each row, so I cannot join 
the original data (1000+ columns) with the DLP output (3 columns).
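
A minimal sketch of the flow described above, in plain Python and under 
stated assumptions: rows are dicts, the column names in PII_COLUMNS are 
hypothetical, and fake_deidentify is only a stand-in for the DLP call (it is 
not the Google DLP API). It assumes the call returns results in the same order 
as its input, so each row stays paired with its own 3 values by position 
within a batch and no UUID or join is involved.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]

# Hypothetical names for the 3 PII columns; the real dataset has 1000+ columns.
PII_COLUMNS = ["name", "email", "phone"]


def fake_deidentify(values: List[List[str]]) -> List[List[str]]:
    """Stand-in for the DLP call (NOT the real API): masks every value.

    Assumed contract: the output has the same shape and order as the input.
    """
    return [["*" * len(v) for v in row_values] for row_values in values]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def scrub(
    rows: Iterable[Row],
    deidentify: Callable[[List[List[str]]], List[List[str]]],
    batch_size: int = 2,
) -> Iterator[Row]:
    """Send only the PII columns out, then write results back row by row."""
    for batch in batches(rows, batch_size):
        # Project just the 3 PII columns; the other columns never leave.
        projected = [[row[c] for c in PII_COLUMNS] for row in batch]
        cleaned = deidentify(projected)
        # Entry i of `cleaned` belongs to entry i of `batch` (order assumed kept).
        for row, new_values in zip(batch, cleaned):
            updated = dict(row)  # copy so the input row is left untouched
            updated.update(zip(PII_COLUMNS, new_values))
            yield updated
```

The batch size here is arbitrary; in practice it would be bounded by whatever 
request size limit the DLP API imposes.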

Any suggestions on how to implement this?

Thanks
Aniruddh
