Chris, thank you for expressing the problem in such succinct terms. My
problem does appear to be one of RBAC versus ABAC.

Josh, thanks for your observations.

I will try to summarize my updated understanding of the issue based on
your replies:
At its core, Accumulo appears to encourage ABAC by mandating that the
data be *classified* with visibility labels, and then building RBAC
(if needed) on top of this by authorizing users to access certain
classifications. I guess this model fits well for data that is
amenable to mining classifications (e.g., SSNs, email addresses,
phone numbers). However, in my case, the data I am dealing with is
homogeneous in nature, and the actual classification for visibility
will be performed by examining the value rather than the nature of the
data.  Therefore, I can go with RBAC directly as ABAC will be of
little use to this type of system.
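
To make sure I have the mechanics right, here is a toy sketch in
Python of how I picture a visibility label being checked against a
user's authorizations, using the A1 and A2 labels from my original
mail. This is not Accumulo's evaluator: the names tokenize and
can_read are made up for illustration, it handles only "|", "&" and
parentheses over simple terms, and treating an empty label as
readable by everyone is an assumption of the sketch.

```python
import re

# Hypothetical toy evaluator, NOT Accumulo's implementation.
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[|&()])")


def tokenize(expr):
    """Split a label such as 'a|(b&c)' into terms and operators."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad label at offset {pos}: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def can_read(expr, auths):
    """Return True if the set of authorizations satisfies the label."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: an empty label is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("|", "&", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare term is true if the user holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("|", "&"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value or rhs) if op == "|" else (value and rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label")
    return result


a1 = "ingestor|(foo_end_user_group|bar_end_user_group)"
a2 = "foo_end_user_group|bar_end_user_group"

print(can_read(a1, {"ingestor"}))            # ingestor holds one auth
print(can_read(a2, {"ingestor"}))            # ingestor lacks group auths
print(can_read(a2, {"foo_end_user_group"}))  # an end user in group foo
```

With A1 the ingestor needs only its own authorization, while with A2
it must hold every group authorization that appears in any label.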

Do let me know if my observation above has any inaccuracies.

As a side note, it did help me a lot to think about visibilities as
*data classifications* rather than visibilities, considering that
there are so many similar-sounding terms in the Accumulo security
model (authentication, permissions, authorization, ...).

Once again, thank you for your help.

Srikanth Viswanathan

On Mon, Feb 16, 2015 at 7:06 PM, Josh Elser <josh.el...@gmail.com> wrote:
> I think A1 is ultimately the right thing, as well.
>
> The problem is not that you don't know how to accurately label your data
> (which is the biggest problem in Accumulo as updating the visibility is very
> costly), it's that it's hard to be able to add your enrichment data after
> the fact.
>
> The reason that's hard, though, is because your enrichment client needs to act
> like a client -- have authorizations to read the original data. It seems
> reasonable to me to try to tackle the problem of ensuring the process that
> needs to enrich some data has the appropriate authorizations to read that
> data.
>
> Christopher wrote:
>>
>> I think part of your question pertains to the differences between ABAC
>> (attribute-based access controls) and RBAC (role-based access controls).
>>
>> In both A1 and A2, you're thinking in terms of RBAC. The only real
>> difference is whether you want to have one additional role, or
>> repurpose the existing ones. However, Accumulo's data visibilities are
>> more like ABAC. Of course, you can use whatever method works for you,
>> but the intent is more ABAC than RBAC.
>>
>> The main pitfall with RBAC is that roles and users change, and data is
>> complex and large and you don't want to re-write it when things change.
>> However, attributes are properties of the data itself, upon which you
>> can make access decisions. These attributes should be things that don't
>> change... they are inherent to the data (ideal).
>>
>> To think in terms of ABAC, the main question to ask is "What properties
>> of this data element will determine who can access it?". For example,
>> does it contain personal information or medical history? Does it contain
>> usernames and email addresses? What is it about this data that makes it
>> worth protecting? Does it need to be protected? I think that's mainly
>> what John Vines' talk was about (the differences between RBAC and ABAC).
>>
>> If RBAC is more appropriate for your data, I'd probably go with A1,
>> because it's easier to implement and maintain. The biggest drawback is
>> that you require additional storage space to store the additional role
>> in each visibility. Because of some internal optimizations, if you go
>> this route, I'd recommend making this role a prefix, rather than a
>> suffix: "SUPERUSER|(restOfVisibility)" vs. "(restOfVisibility)|SUPERUSER".
>>
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>
>> On Mon, Feb 16, 2015 at 5:39 PM, Srikanth Viswanathan
>> <srikant...@gmail.com> wrote:
>>
>>     Hello,
>>
>>     I'm using Accumulo to store raw and value-added data and expose this
>>     data to a small number of end users. During ingestion, the system will
>>     connect to Accumulo as a single Accumulo user called, say, "ingestor".
>>     This user will first store data, and then later in the ingestion
>>     pipeline read the same data back to add value and write the
>>     value-added data back. End-users will connect as themselves (i.e.,
>>     individual Accumulo accounts) to read the data.
>>
>>     The questions I am facing are:
>>     Q1. How to manage the read authorizations for the ingestor?
>>     Q2. How to ensure data in Accumulo is never orphaned due to current
>>     users lacking authorizations to read certain columns?
>>
>>     It seems to me that I have two options, both of which will solve both
>>     my problems above:
>>     A1. Grant the ingestor a single authorization and store the data with
>>     labels that allow the ingestor access via this label. e.g.,
>>     "ingestor|(foo_end_user_group|bar_end_user_group)". By doing this, I
>>     don't have to maintain special authorization logic for the ingestor,
>>     and I can also fall back on it to read data that might otherwise be
>>     orphaned.
>>     A2.  Store only the end user groups in the visibility labels
>>     ("foo_end_user_group|bar_end_user_group"), and
>>     force the ingestion user to obtain all group authorizations needed in
>>     order to read the data. This will require special logic to update the
>>     ingestor's authorizations when a new authorization is added to the
>>     system.
>>
>>     A1 seems simpler to me, but I heard John Vines discourage this in his
>>     talk at the 2014 Accumulo Summit.  Doesn't the user in either case see
>>     the same set of data (i.e., "everything")? What, then, are the potential
>>     pitfalls of A1 compared to A2?
>>
>>     Thank you!
>>
>>     Srikanth Viswanathan
>>
>>
>
