Thanks for the further info. I'm not sure our Product Management is OK, at this point, with us patching the Impala server to get our solution working. Our product is supposed to work with already-installed servers.
Any plans to address the gap (making requesting_user visible inside the catalog server) in a future release?

On Wed, Jan 2, 2019 at 11:50 AM Bharath Vissapragada <[email protected]> wrote:

> I was poking around in the code and it looks like we have most of the code
> in place
> <https://github.com/apache/impala/blob/27577dd652554dda5a03016e2d1e3ab66fe6b1f5/common/thrift/CatalogService.thrift#L47>
>
> // Common header included in all CatalogService requests.
> // TODO: The CatalogServiceVersion/protocol version should be part of the header.
> // This would require changes in BDR and break their compatibility story. We should
> // coordinate a joint change somewhere down the line.
> struct TCatalogServiceRequestHeader {
>   // The effective user who submitted this request.
>   1: optional string requesting_user
> }
>
> That header is included in all the RPCs. However, that is an optional
> field and may not be set in a few places (since we don't actually rely on
> that currently). So you could start with making it a "required" field and
> see what all breaks. HTH.
>
> On Wed, Jan 2, 2019 at 11:35 AM Bharath Vissapragada <
> [email protected]> wrote:
>
>> I think we expose it via UDF effective_user() (effective user could be
>> different from the connected user if delegation/doas is enabled). You can
>> run a query like "select effective_user()" in a session.
>>
>> You can also look it up in the /sessions page on the coordinator web UI
>> (<coordinator>:25000/sessions?json) and you can get a JSON-formatted string
>> containing the connected and delegate user for each session.
>>
>> If you want it on the Catalog side, you probably have to plumb it through
>> the RPC calls (change the thrift spec and pass it along from the
>> coordinator session handling code to the Catalog RPC code).
>>
>> On Wed, Jan 2, 2019 at 11:19 AM mhd wrk <[email protected]> wrote:
>>
>>> Is there any Impala/Sentry specific API we can use inside our code to
>>> figure out who the current user is?
>>>
>>> On Wed, Jan 2, 2019 at 11:12 AM Bharath Vissapragada <
>>> [email protected]> wrote:
>>>
>>>> Yes. I think Jeszy is right. Per my understanding too, we don't
>>>> impersonate the client user on the Catalog server. Instead, we enforce the
>>>> authorization via Sentry during query planning.
>>>>
>>>> On Wed, Jan 2, 2019 at 7:06 AM mhd wrk <[email protected]> wrote:
>>>>
>>>>> IMPALA-2177 sounds like the correct issue.
>>>>> Here are log messages from authentication.cc for impalad and catalogd
>>>>> respectively:
>>>>>
>>>>> I0102 14:15:06.722666 28195 authentication.cc:478] Successfully authenticated client user *"[email protected] <[email protected]>"*
>>>>> I0102 03:40:07.972348 27948 authentication.cc:445] Successfully authenticated principal *"impala/[email protected] <[email protected]>"* on an internal connection
>>>>>
>>>>> As you can see from the messages above, impalad is able to identify
>>>>> the currently connected user correctly. However, catalogd always
>>>>> authenticates as impala, which causes the problem.
>>>>>
>>>>> On Wed, Jan 2, 2019 at 4:19 AM Jeszy <[email protected]> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> IIUC your question correctly, this is a limitation. IMPALA-2177 looks
>>>>>> to be the appropriate jira.
>>>>>> Most users use Impala together with Sentry, where the recommended
>>>>>> approach is to disable impersonation (even in services that allow it,
>>>>>> like Hive).
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> On Wed, 2 Jan 2019 at 05:55, Bharath Vissapragada <
>>>>>> [email protected]> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> >
>>>>>> > Can you add the stack trace here if possible? It is not super clear
>>>>>> > where exactly the problem is.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Bharath
>>>>>> >
>>>>>> > On Tue, Jan 1, 2019 at 6:34 PM mhd wrk <[email protected]> wrote:
>>>>>> >>
>>>>>> >> We have our own implementation of Hadoop FileSystem which relies
>>>>>> >> on the current user in a kerberized environment to locate
>>>>>> >> user-specific files in HDFS. This custom file system works fine
>>>>>> >> inside Hive to create external tables and query them. However,
>>>>>> >> trying to access the same tables via Impala (JDBC driver) fails.
>>>>>> >> Watching the log messages, it seems that when impalad sends
>>>>>> >> requests to catalogd to get metadata of a given table, the current
>>>>>> >> user returned by UserGroupInformation is the service account
>>>>>> >> running the server (impala/[email protected]) instead of the
>>>>>> >> currently connected user.
>>>>>> >>
>>>>>> >> Is this a known issue or limitation of Impala?
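
For the /sessions lookup suggested earlier in the thread, a minimal Python sketch of picking the effective user per session could look like the one below. It only parses a JSON string; fetching it from <coordinator>:25000/sessions?json is left out. The key names ("sessions", "session_id", "user", "delegated_user") and the sample payload are assumptions made for illustration and are not confirmed anywhere in this thread, so check them against the actual output of your coordinator.

```python
import json

# Hypothetical payload shaped like a /sessions?json response.
# All key names here are assumptions, not taken from the thread.
SAMPLE = """
{
  "sessions": [
    {"session_id": "a1", "user": "alice", "delegated_user": ""},
    {"session_id": "b2", "user": "hue", "delegated_user": "bob"}
  ]
}
"""


def effective_users(payload: str) -> dict:
    """Map each session id to its effective user.

    The delegated user wins when it is non-empty (delegation/doas enabled);
    otherwise the connected user is used.
    """
    result = {}
    for session in json.loads(payload).get("sessions", []):
        connected = session.get("user", "")
        delegated = session.get("delegated_user", "")
        result[session["session_id"]] = delegated or connected
    return result


if __name__ == "__main__":
    for session_id, user in effective_users(SAMPLE).items():
        print(session_id, user)
```

This only covers the coordinator side. As noted in the thread, catalogd does not see the connected user unless requesting_user is plumbed through the Catalog RPCs.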
