Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-16 Thread Derek Chen-Becker
My vote is B, but I think you should go ahead with the actual vote thread. Cheers, Derek On Fri, Sep 16, 2022 at 4:05 AM Andrés de la Peña wrote: > It's been 9 days since we started the poll, and we haven't had any new > vote since Monday. So we are still on 5 votes for A and 2 votes for B. >

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-16 Thread Andrés de la Peña
It's been 9 days since we started the poll, and we haven't had any new votes since Monday. So we are still on 5 votes for A and 2 votes for B. The poll results don't seem to oppose the CEP. If no one has anything else to add, I'll start the actual vote thread. On Tue, 13 Sept 2022 at 15:05,

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-13 Thread Andrés de la Peña
That's 5 votes for A and 2 votes for B so far. Neither of these options opposes the CEP, so I think we can probably start the vote, unless we want to wait longer for the poll. On Mon, 12 Sept 2022 at 13:51, Benjamin Lerer wrote: > A > > On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan > wrote

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-12 Thread Benjamin Lerer
A On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan wrote: > A > > On Sep 7, 2022, at 8:58 AM, Benedict wrote: > > Well, I am not convinced these changes will materially impact the outcome, > but at least we’ll have some extra fun collating the votes. > > > On 7 Sep 2022, at 14:05, Andrés de

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Jeremiah D Jordan
A > On Sep 7, 2022, at 8:58 AM, Benedict wrote: > > Well, I am not convinced these changes will materially impact the outcome, > but at least we’ll have some extra fun collating the votes. > > >> On 7 Sep 2022, at 14:05, Andrés de la Peña wrote: >> >> The poll makes sense to me. I

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Benedict
Well, I am not convinced these changes will materially impact the outcome, but at least we’ll have some extra fun collating the votes. > On 7 Sep 2022, at 14:05, Andrés de la Peña wrote: > > The poll makes sense to me. I would slightly change it to: > > A) We shouldn't prefer either

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Berenguer Blasi
A. I agree the implementor's preference is an important aspect to take into account. On 7/9/22 15:23, Ekaterina Dimitrova wrote: A On Wed, 7 Sep 2022 at 9:05, Andrés de la Peña wrote: The poll makes sense to me. I would slightly change it to: A) We shouldn't prefer either

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Ekaterina Dimitrova
A On Wed, 7 Sep 2022 at 9:05, Andrés de la Peña wrote: > The poll makes sense to me. I would slightly change it to: > > A) We shouldn't prefer either approach, and I agree to the implementor > selecting the table schema approach for this CEP > B) We should prefer the view approach, but I am

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Andrés de la Peña
The poll makes sense to me. I would slightly change it to:
A) We shouldn't prefer either approach, and I agree to the implementor selecting the table schema approach for this CEP
B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Claude Warren via dev
My vote is B On 07/09/2022 13:12, Benedict wrote: I’m not convinced there’s been adequate resolution over which approach is adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach - even if it is

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Benedict
I’m not convinced there’s been adequate resolution over which approach is adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach - even if it is broadly adopted by other databases. I will note

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Andrés de la Peña
If nobody has more concerns regarding the CEP I will start the vote tomorrow. On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña wrote: > Is there enough support here for VIEWS to be the implementation strategy >> for displaying masking functions? > > > I'm not sure that views should be "the"

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-31 Thread Andrés de la Peña
> > Is there enough support here for VIEWS to be the implementation strategy > for displaying masking functions? I'm not sure that views should be "the" strategy for masking functions. We have multiple approaches here: 1) CQL functions only. Users can decide to use the masking functions on

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-31 Thread Claude Warren via dev
Is there enough support here for VIEWS to be the implementation strategy for displaying masking functions? It seems to me the view would have to store the query and apply a where clause to it, so the same PK would be in play. It has data leaking properties. It has more use cases as it can

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Benedict
Not to push the point too strongly (I don’t have a very firm view of my own), but if we provide this via a view feature we’re just implementing one new feature and we get masking for free. I don’t think it is materially more complicated than redefining columns for users - it might even be less

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Andrés de la Peña
> GRANT SELECT ON foo.unmasked_name TO top_secret;
Note that Cassandra doesn't have support for column-level permissions. There was an initiative to add them in 2016, CASSANDRA-12859. However, the ticket has been inactive since 2017. The

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-30 Thread Avi Kivity via dev
Agree with views, or alternatively, column permissions together with computed columns:

    CREATE TABLE foo (
      id int PRIMARY KEY,
      unmasked_name text,
      name text GENERATED ALWAYS AS some_mask_function(text, 'xxx', 7)
    )

(syntax from PostgreSQL)

    GRANT SELECT ON foo.name TO general_use;

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-26 Thread Andrés de la Peña
> > Yes, I was thinking that simple projection views (essentially a SELECT > statement with application of transform functions) would complement masking > functions, and from the discussion it sounds like this is basically what > some of the other databases do. I don't see that the mentioned

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-26 Thread Benjamin Lerer
Views (even projection-only views) are a completely new feature with their own set of complexities and limitations. My first feeling is that it might not be as simple as it sounds. There is a significant number of use cases to cover. It will definitely require its own CEP. :-) I like Andrés'

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Derek Chen-Becker
Yes, I was thinking that simple projection views (essentially a SELECT statement with application of transform functions) would complement masking functions, and from the discussion it sounds like this is basically what some of the other databases do. Projection views seem like they would be

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
> > Perhaps we could deliver simple views backed by virtual tables, and model > our approach on that of Postgres, MySQL et al? The approach of PostgreSQL allows attaching masking functions to columns and users with

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Benedict
I’m inclined to agree that this seems a more straightforward approach that makes fewer implied promises. Perhaps we could deliver simple views backed by virtual tables, and model our approach on that of Postgres, MySQL et al? Views in C* would be very simple, just offering a subset of fields

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
Note that conditional updates return true or false to notify whether the update has happened or not. That can also be exploited to infer the masked data. Indeed, at the moment they also require SELECT permissions. The masking functions can always be used on their own, as any other CQL function

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Derek Chen-Becker
To make sure I understand, if I wanted to use a masked column for a conditional update, you're saying we would need SELECT_MASKED to use it in the IF clause? I worry that this proposal is increasing in complexity; I would actually be OK starting with something smaller in scope. Perhaps just

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
I have modified the proposal adding a new SELECT_MASKED permission. Using masked columns on WHERE/IF clauses would require having SELECT and either UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the query results would always require both SELECT and UNMASK. This way we can
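
The permission rules stated in the message above can be sketched as a small truth table. This is only an illustrative model, not Cassandra code: the function names and the use of plain strings for permissions are assumptions made for the example, and only the SELECT, UNMASK and SELECT_MASKED names come from the thread.

```python
# Illustrative model of the rules described above; not Cassandra's implementation.
# Permissions are modelled as plain strings; the helper names are hypothetical.

def can_filter_on_masked_column(granted):
    """WHERE/IF on a masked column: SELECT plus either UNMASK or SELECT_MASKED."""
    granted = set(granted)
    return "SELECT" in granted and bool(granted & {"UNMASK", "SELECT_MASKED"})


def can_see_unmasked_values(granted):
    """Unmasked values in query results: always both SELECT and UNMASK."""
    granted = set(granted)
    return "SELECT" in granted and "UNMASK" in granted


if __name__ == "__main__":
    for perms in [{"SELECT"}, {"SELECT", "SELECT_MASKED"}, {"SELECT", "UNMASK"}]:
        print(sorted(perms),
              "filter:", can_filter_on_masked_column(perms),
              "unmasked:", can_see_unmasked_values(perms))
```

Under this reading, a role holding SELECT and SELECT_MASKED could restrict on a masked column but would still only get masked values back.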

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Henrik Ingo
This is the difference between security and compliance I guess :-D The way I see this, the attacker or threat in this concept is not the developer with access to the database. Rather a feature like this is just a convenient way to apply some masking rule in a centralized way. The protection is

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benedict
The MySQL feature is not equivalent to this proposal, it simply offers new transformation functions that implement this functionality, so it is up to the application to apply these functions to its own selects or, as most examples seem to use, to create a view on the data that applies the

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Andrés de la Peña
Where does MySQL suggest that? As far as I can tell MySQL only offers a set of functions for masking. I can't see a way to force users or tables to use those functions, and it is up to the users to use those functions or not. I'm reading this documentation

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benedict
I can’t tell for sure, but the documentation on Postgres’ feature suggests to me that it does apply the masking to all possible uses of the data, including joining and querying. Snowflake’s documentation explicitly says that it does. MySQL’s documentation suggests that it does this. Oracle,

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Andrés de la Peña
Here are the names of the feature in some databases out there, errors and omissions excepted:
- Microsoft SQL Server / Azure SQL: Dynamic data masking
- MySQL: Enterprise data masking and de-identification
- PostgreSQL: Dynamic masking
- MongoDB: Data masking
- IBM Db2: Masks
-

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Andrés de la Peña
> > Is it typical for a masking feature to make no effort to prevent > unmasking? I’m just struggling to see the value of this without such > mechanisms. Otherwise it’s just a default formatter, and we should consider > renaming the feature IMO I'd say it's a pretty standard feature. There are

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benedict
Right, but we get to decide how we offer such features and what we call them. I can’t imagine a good reason to call this a masking feature, especially one that applies differentially to certain users, when it is trivial to unmask. I’m ok offering a feature called “default formatter” or

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benjamin Lerer
> The PCI DSS Standard v4_0 requires that credit card numbers stored on the system must be "rendered unreadable", thus this proposal is _NOT_ a good way to protect credit card numbers.
My point was

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Claude Warren, Jr via dev
This change appears to be looking at two aspects: 1. Add metadata to columns 2. Add functionality based on the metadata. If the system had a generic user defined metadata and the ability to define filter functions at the point where data are being returned to the client it would be

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Claude Warren, Jr via dev
The PCI DSS Standard v4_0 requires that credit card numbers stored on the system must be "rendered unreadable", thus this proposal is _NOT_ a good way to protect credit card numbers. In fact, for any critically

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benjamin Lerer
> > Is it typical for a masking feature to make no effort to prevent > unmasking? I’m just struggling to see the value of this without such > mechanisms. Otherwise it’s just a default formatter, and we should consider > renaming the feature IMO The security that Dynamic Data Masking is bringing

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Claude Warren, Jr via dev
This seems to me to be a client display filter, applied at the last moment as data are streaming back to the client. It has no impact on any keys, queries or secondary internal index or materialized view. It simply prevents the display from showing the complete value. It does not preclude

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Benedict
Is it typical for a masking feature to make no effort to prevent unmasking? I’m just struggling to see the value of this without such mechanisms. Otherwise it’s just a default formatter, and we should consider renaming the feature IMO > On 23 Aug 2022, at 21:27, Andrés de la Peña wrote:

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Andrés de la Peña
As mentioned in the CEP document, dynamic data masking doesn't try to prevent malicious users with SELECT permissions from indirectly guessing the real value of a masked column. This can easily be done by just trying values on the WHERE clause of SELECT queries. DDM would not be a replacement for
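
The guessing approach mentioned in the message above can be illustrated with a toy, in-memory simulation. Everything here is hypothetical (the `ROWS` data, the `mask` function and the helper names); it only assumes, as the message describes, that filtering happens on the real value while results come back masked.

```python
# Toy simulation of inferring a masked value through WHERE restrictions.
# All names and data are made up for illustration; this is not Cassandra code.

ROWS = [{"id": 1, "pin": 4821}, {"id": 2, "pin": 17}]


def mask(_value):
    """Stand-in masking function: hide the value entirely."""
    return "****"


def select_where_pin_equals(candidate):
    """Filter on the real stored value, but return only masked values."""
    return [{"id": row["id"], "pin": mask(row["pin"])}
            for row in ROWS if row["pin"] == candidate]


def guess_pin(row_id, candidates):
    """Try candidate values until the target row shows up in the results."""
    for candidate in candidates:
        if any(row["id"] == row_id for row in select_where_pin_equals(candidate)):
            return candidate
    return None


if __name__ == "__main__":
    print(select_where_pin_equals(17))   # the result itself only shows the mask
    print(guess_pin(1, range(10000)))    # yet the real value can be recovered
```

The results never contain the real value, but whether a row matches at all is enough to recover it when the value space is small.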

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Aaron Ploetz
> Applying this should prevent querying on a field, else you could leak its contents, surely?
In theory, yes. Although I could see folks doing something like this:

    SELECT COUNT(*) FROM patients WHERE year_of_birth = 2002 AND date_of_birth >= '2002-04-01' AND date_of_birth < '2002-11-01';

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Derek Chen-Becker
Agreed on not being a queryable field. That would also preclude secondary indexing, right? On Tue, Aug 23, 2022 at 11:20 AM Benedict wrote: > Applying this should prevent querying on a field, else you could leak its > contents, surely? This pretty much prohibits using it in a clustering key, >

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Benedict
Applying this should prevent querying on a field, else you could leak its contents, surely? This pretty much prohibits using it in a clustering key, and a partition key with the ordered partitioner - but probably also a hashed partitioner since we do not use a cryptographic hash and the hash

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Aaron Ploetz
Some thoughts on this one: In a prior job, we'd give app teams access to a single keyspace, and two roles: a read-write role and a read-only role. In some cases, a "privileged" application role was also requested. Depending on the requirements, I could see the UNMASK permission being applied to

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Henrik Ingo
On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña wrote: > One thought: The way the CEP is currently written, it is only possible to >> mask a column one way. You can only define one masking function for a >> column, and since you use the original column name, you could only return >> one

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Andrés de la Peña
> > One thought: The way the CEP is currently written, it is only possible to > mask a column one way. You can only define one masking function for a > column, and since you use the original column name, you could only return > one version of it in the result set, even if you had a way to define >

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-22 Thread Henrik Ingo
One thought: The way the CEP is currently written, it is only possible to mask a column one way. You can only define one masking function for a column, and since you use the original column name, you could only return one version of it in the result set, even if you had a way to define several

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-22 Thread Andrés de la Peña
> > Isn't there an assumption here that encryption can not be used? Would we > not be better served to build in an encryption strategy that keeps the data > encrypted until the user shows permissions to decrypt, like the unmask > property? An encryption strategy that can work within the

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-22 Thread Andrés de la Peña
> > Maybe a small improvement is the redacted value could be of the form > `XXX1...1000` meaning XXX followed by a rand number from 1 to 1000: XXX54, > XXX998, XXX456,... Some randomness would prevent some apps flattening all > rows to a single XXX'ed one, giving a more realistic redacted data >

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-22 Thread Claude Warren, Jr via dev
I am more interested in the motivation where it is stated: Many users have the need of masking sensitive data, such as contact info, > age, gender, credit card numbers, etc. Dynamic data masking (DDM) allows to > obscure sensitive information while still allowing access to the masked > columns,

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-21 Thread Berenguer Blasi
Maybe a small improvement is the redacted value could be of the form `XXX1...1000` meaning XXX followed by a rand number from 1 to 1000: XXX54, XXX998, XXX456,... Some randomness would prevent some apps flattening all rows to a single XXX'ed one, giving a more realistic redacted data
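
The suggestion above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `redact_with_suffix` helper; it is not one of the masking functions proposed in the CEP.

```python
# Minimal sketch of the "XXX plus random number" redaction idea.
# The function name and signature are hypothetical.
import random


def redact_with_suffix(_value, rng=random):
    """Ignore the real value and return 'XXX' plus a random integer in 1..1000."""
    return f"XXX{rng.randint(1, 1000)}"


if __name__ == "__main__":
    for name in ["alice", "bob", "carol"]:
        print(redact_with_suffix(name))
```

Note that the suffix is random rather than unique, so two rows can still receive the same redacted value, and the same row can be redacted differently on each read.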

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-21 Thread Andrés de la Peña
> > > If the column names are the same for masked and unmasked data, it would > impact existing applications. I am curious what the transition plan look > like for applications that expect unmasked data? For example, let’s say you store SSNs and Birth dates. Upon enabling this > feature, let’s

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Andrés de la Peña
> > > This type of feature is very useful, but it may be easier to analyze > this proposal if it’s compared with other DDM implementations from other > databases? Would it be reasonable to add a table to the proposal comparing > syntax and output from eg Azure SQL vs Cassandra vs whatever ? Good

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Jeff Jirsa
This type of feature is very useful, but it may be easier to analyze this proposal if it’s compared with other DDM implementations from other databases? Would it be reasonable to add a table to the proposal comparing syntax and output from e.g. Azure SQL vs Cassandra vs whatever? > On Aug 19,

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Dinesh Joshi
Sounds interesting. I would like to understand a couple of things here. If the column names are the same for masked and unmasked data, it would impact existing applications. I am curious what the transition plan looks like for applications that expect unmasked data? For example, let’s say you

[DISCUSS] CEP-20: Dynamic Data Masking

2022-08-19 Thread Andrés de la Peña
Hi everyone, I'd like to start a discussion about this proposal for dynamic data masking: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking Dynamic data masking makes it possible to obscure sensitive information without changing the stored data. It would be based on a set