Check out Hudi (https://github.com/apache/incubator-hudi), which adds upsert
functionality on top of columnar data such as Parquet.
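
For example, a right-to-be-forgotten delete through Hudi's Spark
datasource looks roughly like this (the table path, table name, and
record key field below are placeholders, not from your setup):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gdpr-delete").getOrCreate()

    # Load only the records of the user to be forgotten.
    to_delete = (spark.read.format("hudi")
                 .load("/data/hudi/user_events")
                 .filter("user_id = 'john'"))

    # Issue a delete: Hudi rewrites just the Parquet files that
    # contain those records, instead of the whole dataset.
    (to_delete.write.format("hudi")
     .option("hoodie.table.name", "user_events")
     .option("hoodie.datasource.write.recordkey.field", "user_id")
     .option("hoodie.datasource.write.operation", "delete")
     .mode("append")
     .save("/data/hudi/user_events"))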

Chao

On Mon, Apr 15, 2019 at 10:49 AM Vinod Kumar Vavilapalli <vino...@apache.org>
wrote:

> If one uses HDFS as raw file storage where a single file intermingles data
> from all users, it's not easy to achieve what you are trying to do.
>
> Instead, using systems (e.g. HBase, Hive) that support updates and deletes
> of individual records is the only way to go.
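>
> As a rough sketch, a record-level delete in HBase could look like this
> (using the Python happybase client; the table name and row-key layout
> here are assumptions):
>
>     import happybase
>
>     # Connect through the HBase Thrift gateway (host is a placeholder).
>     connection = happybase.Connection('hbase-thrift-host')
>     table = connection.table('user_events')
>
>     # Assuming row keys are prefixed with the user id, delete every
>     # row belonging to "john".
>     for key, _ in table.scan(row_prefix=b'john'):
>         table.delete(key)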
>
> +Vinod
>
> On Apr 15, 2019, at 1:32 AM, Ivan Panico <iv.pan...@gmail.com> wrote:
>
> Hi,
>
> The recent GDPR introduced a new right for people: the right to be
> forgotten. This right means that if a customer asks an organization to
> delete all of their data, the organization has to comply most of the time
> (there are conditions that can suspend this right, but that's beside my
> point).
>
> Now HDFS being WORM (Write Once Read Multiple Times), I guess you see
> where I'm going. What would be the best way to implement this line-deletion
> feature (supposing that when a customer asks for all of their data to be
> deleted, the organization has to delete some lines in some HDFS files)?
>
> Right now I'm going for the following:
>
>    - Create a key-value store mapping each user to their files: (user,
>    [files]).
>    - On file writing, feed this store with the users and file locations
>    (by appending to or updating a key).
>    - When deletion is requested by the user "john", look up the "john" key
>    in that store and rewrite all of its files (read each file into memory,
>    remove the lines of "john", rewrite the file); see the sketch after
>    this list.
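>
> Concretely, here is a rough sketch of that index-and-rewrite idea
> (plain Python with a pyarrow HDFS handle; the namenode address, paths,
> in-memory index, and line format are all placeholder assumptions, and
> the index would really live in a key-value store):
>
>     import pyarrow.fs as pafs
>
>     # HDFS handle (namenode host and port are placeholders).
>     hdfs = pafs.HadoopFileSystem("namenode", port=8020)
>
>     # user -> set of file paths, fed at write time.
>     user_index = {"john": {"/data/events/part-0001.txt"}}
>
>     def forget(user):
>         for path in user_index.pop(user, set()):
>             # Read the whole file into memory and drop the user's lines.
>             with hdfs.open_input_stream(path) as f:
>                 lines = f.read().decode("utf-8").splitlines()
>             kept = [l for l in lines if not l.startswith(user + ",")]
>             # Overwrite the file with the remaining lines.
>             with hdfs.open_output_stream(path) as f:
>                 f.write(("\n".join(kept) + "\n").encode("utf-8"))
>
>     forget("john")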
>
>
> Would this be the most Hadoop-like way to do that?
> I discarded crypto-shredding-like solutions because the HDFS data has to
> be readable by multiple proprietary software packages and by users at some
> point, and I'm not sure how to incorporate a deciphering step for all
> those use cases.
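>
> (For reference, by crypto-shredding I mean something like the following,
> with one key per user so that destroying the key makes the data
> unrecoverable; the in-memory key store and record format here are just
> illustrative.)
>
>     from cryptography.fernet import Fernet
>
>     # One encryption key per user; "forgetting" a user means
>     # destroying their key.
>     keys = {"john": Fernet.generate_key()}
>
>     def encrypt_record(user, record):
>         return Fernet(keys[user]).encrypt(record.encode("utf-8"))
>
>     token = encrypt_record("john", "john,2019-04-15,login")
>
>     # Right to be forgotten: drop the key and the ciphertext stored
>     # on HDFS can no longer be decrypted.
>     del keys["john"]
>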
> Also, I came up with this table solution because a brute-force grep for
> some key over the whole HDFS tree seemed unlikely to scale, but maybe I'm
> mistaken?
>
> Thanks for your help,
> Best regards
>
>
>
