I was poking around in the code, and it looks like we already have most of the
plumbing in place:
<https://github.com/apache/impala/blob/27577dd652554dda5a03016e2d1e3ab66fe6b1f5/common/thrift/CatalogService.thrift#L47>
// Common header included in all CatalogService requests.
// TODO: The CatalogServiceVersion/protocol version should be part of the header.
// This would require changes in BDR and break their compatibility story. We should
// coordinate a joint change somewhere down the line.
struct TCatalogServiceRequestHeader {
  // The effective user who submitted this request.
  1: optional string requesting_user
}
That header is included in all the RPCs. However, requesting_user is an
optional field and may not be populated in a few places (since we don't
currently rely on it). So you could start by making it a "required" field and
seeing what breaks. HTH.
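Before touching the IDL, the "make it required and see what breaks" idea can be modeled in a few lines. This is a minimal Python sketch, not Impala code: `CatalogRequestHeader`, `MissingRequiredField`, and the `path_*` call sites are hypothetical, and `validate()` only roughly mirrors the check Thrift-generated code performs for `required` fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class MissingRequiredField(Exception):
    """Hypothetical stand-in for the error Thrift-generated code raises
    when a `required` field is unset."""


@dataclass
class CatalogRequestHeader:
    """Hypothetical Python mirror of TCatalogServiceRequestHeader."""
    requesting_user: Optional[str] = None

    def validate(self) -> None:
        # Behave as if requesting_user were declared `required`.
        if self.requesting_user is None:
            raise MissingRequiredField("required field requesting_user is unset")


def find_unset_headers(call_sites: Dict[str, CatalogRequestHeader]) -> List[str]:
    """Return, sorted, the call sites whose header would fail validation."""
    broken = []
    for name, header in call_sites.items():
        try:
            header.validate()
        except MissingRequiredField:
            broken.append(name)
    return sorted(broken)


if __name__ == "__main__":
    # Hypothetical call sites standing in for coordinator code paths
    # that build catalog RPC requests.
    sites = {
        "path_a": CatalogRequestHeader(requesting_user="alice"),
        "path_b": CatalogRequestHeader(),
        "path_c": CatalogRequestHeader(),
    }
    print(find_unset_headers(sites))  # ['path_b', 'path_c']
```

In the real change, the equivalent "breakage report" would come from the compiler and from requests failing deserialization once the field is required.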
On Wed, Jan 2, 2019 at 11:35 AM Bharath Vissapragada wrote:
> I think we expose it via the built-in function effective_user() (the
> effective user can differ from the connected user if delegation/doAs is
> enabled). You can run a query like "select effective_user()" in a session.
>
> You can also look it up on the /sessions page of the coordinator web UI
> (<coordinator>:25000/sessions?json), which returns a JSON-formatted string
> containing the connected and delegated user for each session.
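To consume that page from a script, something like the following Python sketch could work. It assumes the JSON has a top-level `sessions` list whose entries carry `user` and `delegated_user` keys. Those key names are an assumption, so check them against your Impala version's actual output. Parsing is kept separate from the HTTP fetch so it can be tried on a saved response.

```python
import json
import urllib.request
from typing import List, Tuple


def session_users(payload: str) -> List[Tuple[str, str]]:
    """Extract (connected_user, delegated_user) pairs from /sessions?json.

    Assumes a top-level "sessions" list with "user" and "delegated_user"
    keys; adjust the key names if your Impala version differs.
    """
    doc = json.loads(payload)
    return [(s.get("user", ""), s.get("delegated_user", ""))
            for s in doc.get("sessions", [])]


def fetch_sessions(coordinator: str, port: int = 25000) -> str:
    """Fetch raw JSON from the coordinator web UI (no auth handling)."""
    url = f"http://{coordinator}:{port}/sessions?json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hand-written sample payload, not real Impala output.
    sample = '{"sessions": [{"user": "hue", "delegated_user": "alice"}]}'
    print(session_users(sample))  # [('hue', 'alice')]
```

If the web UI is protected (for example with Kerberos/SPNEGO), plain `urllib` will not authenticate, and `fetch_sessions` would need an authenticating HTTP client instead.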
>
> If you want it on the Catalog side, you probably have to plumb it through
> the RPC calls (change the thrift spec and pass it along from the
> coordinator session handling code to the Catalog RPC code).
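A rough Python model of that plumbing may help. It is purely illustrative: `Session`, `CatalogRequest`, `RequestHeader`, and `catalog_handle` are hypothetical names, not Impala classes. The sketch shows the coordinator copying the session's effective user into the request header. The catalog side then reads the user from the header instead of using its own process identity.

```python
from dataclasses import dataclass, field
from typing import Optional

SERVICE_USER = "impala"  # identity the catalog process itself runs as


@dataclass
class RequestHeader:
    """Hypothetical mirror of TCatalogServiceRequestHeader."""
    requesting_user: Optional[str] = None


@dataclass
class CatalogRequest:
    """Hypothetical catalog RPC request carrying the common header."""
    table: str
    header: RequestHeader = field(default_factory=RequestHeader)


@dataclass
class Session:
    """Hypothetical coordinator-side session state."""
    connected_user: str
    delegated_user: Optional[str] = None

    def effective_user(self) -> str:
        # The delegated (doAs) user wins when delegation is enabled.
        return self.delegated_user or self.connected_user

    def make_catalog_request(self, table: str) -> CatalogRequest:
        # The plumbing step: copy the effective user into the header.
        return CatalogRequest(table, RequestHeader(self.effective_user()))


def catalog_handle(req: CatalogRequest) -> str:
    """Hypothetical catalog-side handler: use the user from the header and
    fall back to the service identity only when the header is unset."""
    user = req.header.requesting_user or SERVICE_USER
    return f"loading {req.table} on behalf of {user}"


if __name__ == "__main__":
    session = Session(connected_user="hue", delegated_user="alice")
    print(catalog_handle(session.make_catalog_request("db.tbl")))
    # loading db.tbl on behalf of alice
    print(catalog_handle(CatalogRequest("db.tbl")))
    # loading db.tbl on behalf of impala
```

Knowing the user is only half of it. Actually performing FileSystem calls as that user would presumably also require some form of impersonation, such as Hadoop's proxy-user mechanism (`UserGroupInformation.createProxyUser` plus `doAs`). Per this thread, catalogd does not currently do that.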
>
> On Wed, Jan 2, 2019 at 11:19 AM mhd wrk wrote:
>
>> Is there any Impala/Sentry-specific API we can use inside our code to
>> figure out who the current user is?
>>
>> On Wed, Jan 2, 2019 at 11:12 AM Bharath Vissapragada wrote:
>>
>>> Yes. I think Jeszy is right. Per my understanding too, we don't
>>> impersonate the client user on the Catalog server. Instead, we enforce the
>>> authorization via Sentry during query planning.
>>>
>>> On Wed, Jan 2, 2019 at 7:06 AM mhd wrk wrote:
>>>
>>>> IMPALA-2177 sounds like the correct issue.
>>>> Here are log messages from authentication.cc for impalad and catalogd
>>>> respectively:
>>>>
>>>>> I0102 14:15:06.722666 28195 authentication.cc:478] Successfully
>>>>> authenticated client user *"[email protected]"*
>>>>> I0102 03:40:07.972348 27948 authentication.cc:445] Successfully
>>>>> authenticated principal *"impala/[email protected]"* on an internal connection
>>>>
>>>>
>>>> As you can see from the messages above, impalad is able to identify the
>>>> currently connected user correctly. However, catalogd always authenticates
>>>> as impala, which causes the problem.
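To check which identity each daemon reports, the two authentication.cc messages above can be told apart mechanically. This is a minimal Python sketch. It assumes the log lines keep the exact wording quoted in this thread, and the regexes target only these two messages. The sample principals are made up.

```python
import re
from typing import Optional, Tuple

# Patterns for the two messages quoted in the thread.
CLIENT_RE = re.compile(r'Successfully authenticated client user "([^"]+)"')
INTERNAL_RE = re.compile(
    r'Successfully authenticated principal "([^"]+)" on an internal connection')


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return ("client", principal) or ("internal", principal), else None."""
    m = CLIENT_RE.search(line)
    if m:
        return ("client", m.group(1))
    m = INTERNAL_RE.search(line)
    if m:
        return ("internal", m.group(1))
    return None


if __name__ == "__main__":
    # Made-up sample lines in the same shape as the ones quoted above.
    lines = [
        'I0102 14:15:06.722666 28195 authentication.cc:478] Successfully '
        'authenticated client user "alice@EXAMPLE.COM"',
        'I0102 03:40:07.972348 27948 authentication.cc:445] Successfully '
        'authenticated principal "impala/host1.example.com@EXAMPLE.COM" '
        'on an internal connection',
    ]
    for line in lines:
        print(classify(line))
    # ('client', 'alice@EXAMPLE.COM')
    # ('internal', 'impala/host1.example.com@EXAMPLE.COM')
```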
>>>>
>>>>
>>>> On Wed, Jan 2, 2019 at 4:19 AM Jeszy wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> If I understand your question correctly, this is a limitation.
>>>>> IMPALA-2177 looks to be the appropriate jira.
>>>>> Most users use Impala together with Sentry, where the recommended
>>>>> approach is to disable impersonation (even in services that allow it,
>>>>> like Hive).
>>>>>
>>>>> HTH
>>>>>
>>>>> On Wed, 2 Jan 2019 at 05:55, Bharath Vissapragada wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > Can you add the stack trace here if possible? It is not super clear
>>>>> > where exactly the problem is.
>>>>> >
>>>>> > Thanks,
>>>>> > Bharath
>>>>> >
>>>>> > On Tue, Jan 1, 2019 at 6:34 PM mhd wrk wrote:
>>>>> >>
>>>>> >> We have our own implementation of Hadoop FileSystem, which relies on the
>>>>> >> current user in a Kerberized environment to locate user-specific files in
>>>>> >> HDFS. This custom file system works fine inside Hive to create external
>>>>> >> tables and query them. However, trying to access the same tables via
>>>>> >> Impala (JDBC driver) fails. From the log messages, it seems that when
>>>>> >> impalad sends requests to catalogd to get the metadata of a given table,
>>>>> >> the current user returned by UserGroupInformation is the service account
>>>>> >> running the server (impala/[email protected]) instead of the
>>>>> >> currently connected user.
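The failure mode described here can be illustrated with a small Python sketch of how a user-dependent FileSystem might derive a per-user directory from the current Kerberos principal. Two assumptions apply. The short-name rule (keep the first component, drop any `/host` part and the `@REALM`) is a simplification of Hadoop's configurable `auth_to_local` mapping. The `/user/<name>/data` layout is a hypothetical convention chosen for this example.

```python
def short_name(principal: str) -> str:
    """Simplified Kerberos principal -> short user name.

    "alice@EXAMPLE.COM" -> "alice"; "impala/host@EXAMPLE.COM" -> "impala".
    Real Hadoop applies configurable auth_to_local rules instead.
    """
    primary = principal.split("@", 1)[0]
    return primary.split("/", 1)[0]


def user_data_dir(principal: str) -> str:
    # Hypothetical per-user layout used by the custom FileSystem.
    return f"/user/{short_name(principal)}/data"


if __name__ == "__main__":
    # When the current user is the end user, the lookup lands in their directory...
    print(user_data_dir("alice@EXAMPLE.COM"))  # /user/alice/data
    # ...but inside catalogd the current user is the service principal.
    print(user_data_dir("impala/host1.example.com@EXAMPLE.COM"))  # /user/impala/data
```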
>>>>> >>
>>>>> >> Is this a known issue or limitation of Impala?
>>>>>
>>>>