Hello everyone,


I am having trouble with metadata authorization.



Is there any way we can enforce metadata authorization using
`StorageBasedAuthorizationProvider` and S3-compatible object storage
(MinIO in my case)? Referring to the class documentation
<https://github.com/apache/hive/blob/9a8c0f8b7ae7289d1e5eeddf35360806a9faa38a/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java#L57>,
it says: “StorageBasedAuthorizationProvider is an implementation of
HiveMetastoreAuthorizationProvider that tries to look at the *hdfs
permissions*...”.
Which sounds like we can’t?
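
As far as I can tell, the provider only sees whatever the FileSystem
implementation reports, and S3A synthesizes owner/group/permission bits
rather than reading any real ACLs. For reference, this is roughly how I've
been inspecting what S3A reports for the warehouse path from a PySpark shell
(just a sketch; the path is my warehouse dir):

```python
# Sketch: inspect the owner/group/permission that the S3A connector reports
# for the warehouse path, via the py4j gateway of a running PySpark session.
jvm = spark._jvm
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

path = jvm.org.apache.hadoop.fs.Path("s3a://hive/warehouse")
fs = path.getFileSystem(hadoop_conf)

status = fs.getFileStatus(path)
print(status.getOwner(), status.getGroup(), status.getPermission().toString())
```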



My end goal is to have the standalone metastore as a stateless service deployed
on k8s and to work with the metadata using Spark and Trino. I am using
apache/hive:4.0.0-alpha-2
<https://hub.docker.com/layers/apache/hive/4.0.0-alpha-2/images/sha256-69e482fdcebb9e07610943b610baea996c941bb36814cf233769b8a4db41f9c1?context=explore>
with this configuration:



```yaml
hive-site.xml:
  hive.metastore.uris: thrift://0.0.0.0:9083
  hive.metastore.warehouse.dir: s3a://hive/warehouse
  hive.metastore.schema.verification: false
  hive.metastore.event.db.notification.api.auth: false
  metastore.expression.proxy: org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy
  hive.create.as.acid: true
  hive.metastore.try.direct.sql: false
  hive.metastore.try.direct.sql.ddl: false
  hive.metastore.execute.setugi: false
  javax.jdo.option.ConnectionDriverName: org.postgresql.Driver
  javax.jdo.option.ConnectionURL: ...
  javax.jdo.option.ConnectionUserName: ...
  javax.jdo.option.ConnectionPassword: ...
  hive.metastore.pre.event.listeners: org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
  hive.security.metastore.authenticator.manager: org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
  hive.security.metastore.authorization.manager: org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
  hive.metastore.filter.hook: org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
  hive.security.metastore.authorization.auth.reads: true
  hive.metastore.authorization.storage.checks: true

core-site.xml:
  fs.s3a.endpoint: ...
  fs.s3a.access.key: ...
  fs.s3a.secret.key: ...
  fs.s3a.path.style.access: true
  fs.s3a.connection.ssl.enabled: true
  fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  fs.AbstractFileSystem.file.impl: org.apache.hadoop.fs.s3a.S3A
  fs.defaultFS: s3a://<default bucket>
  # MinIO policies are assigned to AD groups' distinguishedNames, so I
  # thought this part could help, but it hasn't :/
  # The username is supplied from the Spark container.
  hadoop.security.group.mapping: org.apache.hadoop.security.LdapGroupsMapping
  hadoop.security.group.mapping.ldap.bind.user: ...
  hadoop.security.group.mapping.ldap.bind.password: ...
  hadoop.security.group.mapping.ldap.url: ...
  hadoop.security.group.mapping.ldap.base: ...
  hadoop.security.group.mapping.ldap.search.filter.user: (&(|(objectclass=person)(objectclass=applicationProcess))(sAMAccountName={0}))
  hadoop.security.group.mapping.ldap.search.filter.group: (objectclass=group)
  hadoop.security.group.mapping.ldap.search.attr.member: member
  hadoop.security.group.mapping.ldap.search.attr.group.name: distinguishedName
```
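
For context, the Spark client is pointed at this metastore roughly as below;
this is only a sketch of my session setup with endpoint and credential values
elided:

```python
from pyspark.sql import SparkSession

# Sketch: how the Spark client attaches to the standalone metastore and MinIO.
# Endpoint/credential values are placeholders.
spark = (
    SparkSession.builder
    .appName("hms-client")
    .config("spark.hadoop.hive.metastore.uris", "thrift://<metastore-service>:9083")
    .config("spark.sql.warehouse.dir", "s3a://hive/warehouse")
    .config("spark.hadoop.fs.s3a.endpoint", "<minio-endpoint>")
    .config("spark.hadoop.fs.s3a.access.key", "<access-key>")
    .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .enableHiveSupport()
    .getOrCreate()
)
```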



IAM policy assigned to hive:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::hive",
                "arn:aws:s3:::hive/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "admin:ListServiceAccounts",
                "admin:ListUserPolicies",
                "admin:ListUsers",
                "admin:GetGroup",
                "admin:GetPolicy",
                "admin:GetUser",
                "admin:ListGroups"
            ]
        }
    ]
}
```



So far, I can access table metadata and drop tables without having any
privileges on the `hive.metastore.warehouse.dir` bucket. For example:



```python
# these work
spark.sql('show tables')
spark.sql('describe table <table>')
spark.sql('show tblproperties <table>')
# this also deletes the data in s3
spark.sql('drop table <table> purge')
# -----------------------------------#
# these fail
spark.sql('create table mytable(id bigint)')
spark.sql('select * from <table>')
```
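
Since the MinIO policies are attached to AD group distinguishedNames (the
LdapGroupsMapping block above), I also wanted to see which user and groups
actually get resolved. This is only a sketch through the py4j gateway on the
Spark side; I can't say whether the metastore's own JVM resolves the same
thing:

```python
# Sketch: check which Hadoop user and groups the Spark session resolves,
# since LdapGroupsMapping is what should tie the user to the AD group DNs.
ugi_cls = spark._jvm.org.apache.hadoop.security.UserGroupInformation
current = ugi_cls.getCurrentUser()
print("user:", current.getUserName())
print("groups:", list(current.getGroupNames()))
```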


Thanks,

Doğukan
