Re: Right to be forgotten and HDFS

2019-04-15 Thread Ivan Panico
All right, but does that mean migrating everything to HBase / Kudu? That also kind of means that GDPR is killing HDFS? Is that what you are suggesting? On Mon, Apr 15, 2019 at 22:43, Wei-Chiu Chuang wrote: > Wow, Chao, didn't realize you guys are making Hudi into Apache :) > HDFS is generally not a

Re: Right to be forgotten and HDFS

2019-04-15 Thread Wei-Chiu Chuang
Wow, Chao, didn't realize you guys are making Hudi into Apache :) HDFS is generally not a good fit for this use case. I've seen people using Kudu for GDPR compliance. On Mon, Apr 15, 2019 at 11:11 AM Chao Sun wrote: > Check out Hudi (https://github.com/apache/incubator-hudi) which adds > upsert

Re: Right to be forgotten and HDFS

2019-04-15 Thread Chao Sun
Check out Hudi (https://github.com/apache/incubator-hudi), which adds upsert functionality on top of columnar data such as Parquet. Chao On Mon, Apr 15, 2019 at 10:49 AM Vinod Kumar Vavilapalli wrote: > If one uses HDFS as raw file storage where a single file intermingles data > from all users,

Re: Right to be forgotten and HDFS

2019-04-15 Thread Vinod Kumar Vavilapalli
If one uses HDFS as raw file storage where a single file intermingles data from all users, it's not easy to achieve what you are trying to do. Instead, using systems (e.g. HBase, Hive) that support updates and deletes to individual records is the only way to go. +Vinod > On Apr 15, 2019, at
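
To illustrate the point above about files that intermingle data from all users: with no record-level delete, erasing one user generally means reading the whole file and writing a new copy without that user's records. The sketch below is only an illustration under stated assumptions. It uses a local JSON-lines file as a stand-in for a file on HDFS, and the function name `erase_user` and the field name `user_id` are hypothetical, not part of any Hadoop API.

```python
import json
import os
import tempfile


def erase_user(path, user_id, key="user_id"):
    """Rewrite a JSON-lines file without the records belonging to user_id.

    Hypothetical helper: a local file stands in for a file on HDFS.
    Returns the number of records dropped.
    """
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write surviving records to a temporary file in the same directory,
    # so the final swap is a rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out, \
                open(path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                if record.get(key) == user_id:
                    dropped += 1  # this record is left out of the new copy
                    continue
                out.write(json.dumps(record) + "\n")
        # Replace the original file with the filtered copy.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, leave the original untouched and clean up the copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dropped
```

The cost of each erasure request is proportional to the size of the whole file, not to the amount of data belonging to the user, which is why stores with per-record updates and deletes come up in this thread.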

Right to be forgotten and HDFS

2019-04-15 Thread Ivan Panico
Hi, The recent GDPR introduced a new right for people: the right to be forgotten. This right means that if an organization is asked by a customer to delete all his data, the organization has to comply most of the time (there are conditions which can suspend this right, but that's beside my point).