Hi,

I'm building an on-prem data warehouse with a custom S3 gateway as the
storage backend. I was able to deploy a standalone Hive Metastore Server
(HMS) secured by Kerberos; however, I'm now having a hard time figuring out
how to manage authorization.

It seems to me that the storage-based authorization layer is not compatible
with s3a, since Hadoop reports only stub permissions for that filesystem. On
the other hand, SQL Standards Based Authorization would force me to funnel
all data access through HiveServer2, which is not viable for my use case. At
a minimum, I need two ways of accessing the data/metadata:

1. using PySpark (mainly to develop ETL/ELT pipelines);
2. using a JDBC/ODBC connector (mainly to feed BI dashboards); for this I was
considering the Spark Thrift Server, but I'm open to HiveServer2 as well (a
minimal kerberized JDBC sketch follows the list).
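
For point 2, my understanding is that the client side looks the same either
way, since the Spark Thrift Server speaks the HiveServer2 protocol. This is
the kind of access I mean (host, port, and realm are placeholders, and it
assumes the hive-jdbc driver on the classpath and a valid ticket from kinit;
the principal= parameter names the server's service principal, not mine):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftSmokeTest {
  public static void main(String[] args) throws Exception {
    // Placeholder endpoint and realm -- substitute your own. With Kerberos
    // the driver authenticates from the client's ticket cache.
    String url = "jdbc:hive2://warehouse.example.com:10000/default;"
        + "principal=hive/_HOST@EXAMPLE.COM";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
      while (rs.next()) {
        System.out.println(rs.getString(1)); // table name
      }
    }
  }
}

For the Spark Thrift Server the principal= value would instead name whatever
service principal Spark runs under.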

Am I missing something? Right now the only option I see is to write a custom
MetastoreAuthorizationProvider that checks s3a permissions, either by
querying the bucket ACLs or by performing test read/write/delete operations
against the bucket (rough sketch below). Has anyone tried to implement
something similar?
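
To make the second option concrete, here is the kind of provider I have in
mind, written against Hive's HiveMetastoreAuthorizationProvider API as I
understand it (signatures vary a bit between releases, e.g. IHMSHandler vs.
HMSHandler in setMetaStoreHandler); the probe logic itself is just my guess,
not an established pattern:

package example.metastore.auth; // hypothetical package name

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.metastore.IHMSHandler;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.ql.metadata.AuthorizationException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Partition;
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProviderBase;
import org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider;
import org.apache.hadoop.hive.ql.security.authorization.Privilege;

public class S3ProbeAuthorizationProvider extends HiveAuthorizationProviderBase
    implements HiveMetastoreAuthorizationProvider {

  @Override
  public void init(Configuration conf) throws HiveException {
    // nothing to set up in this sketch
  }

  @Override
  public void setMetaStoreHandler(IHMSHandler handler) {
    // unused: we only ever look at storage locations
  }

  // present in newer releases of the interface; no @Override on purpose
  public void authorizeAuthorizationApiInvocation() {
  }

  @Override
  public void authorize(Privilege[] read, Privilege[] write) {
    // user-level privileges carry no storage location, nothing to probe
  }

  @Override
  public void authorize(Database db, Privilege[] read, Privilege[] write)
      throws HiveException, AuthorizationException {
    probe(new Path(db.getLocationUri()), read, write);
  }

  @Override
  public void authorize(Table table, Privilege[] read, Privilege[] write)
      throws HiveException, AuthorizationException {
    probe(table.getDataLocation(), read, write);
  }

  @Override
  public void authorize(Partition part, Privilege[] read, Privilege[] write)
      throws HiveException, AuthorizationException {
    probe(part.getDataLocation(), read, write);
  }

  @Override
  public void authorize(Table table, Partition part, List<String> columns,
      Privilege[] read, Privilege[] write)
      throws HiveException, AuthorizationException {
    probe(part != null ? part.getDataLocation() : table.getDataLocation(),
        read, write);
  }

  // Check access with real S3 calls instead of trusting the rwxrwxrwx stub
  // permissions that s3a synthesizes in FileStatus.
  private void probe(Path location, Privilege[] read, Privilege[] write)
      throws HiveException, AuthorizationException {
    try {
      FileSystem fs = location.getFileSystem(getConf());
      if (read != null && read.length > 0) {
        fs.listStatus(location); // LIST: the gateway can deny this with a 403
      }
      if (write != null && write.length > 0) {
        Path marker = new Path(location, "_auth_probe_" + System.nanoTime());
        fs.create(marker, false).close(); // PUT: fails if the key is read-only
        fs.delete(marker, false);         // DELETE: exercises delete rights too
      }
    } catch (IOException e) {
      // s3a surfaces a 403 as an IOException subclass (AccessDeniedException)
      throw new AuthorizationException(
          "access probe failed for " + location + ": " + e.getMessage());
    }
  }
}

If that is viable, I assume I would wire it in by setting
hive.metastore.pre.event.listeners to
org.apache.hadoop.hive.metastore.AuthorizationPreEventListener and pointing
hive.security.metastore.authorization.manager at the class above. One thing
I'm unsure about: inside the metastore the probes would run as the HMS
service principal rather than as the calling user, unless I wrap them in a
UserGroupInformation proxy-user doAs.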

Thanks,
Marco
