Interesting - authentication failures may short circuit some cleanup
activity on the thread.

On Tue, Oct 10, 2017 at 5:02 PM, Kevin Risden <[email protected]> wrote:

> Yea that is what I was afraid of :) I looked a bit further and it isn't
> every request that has duplicates. The ones I have found so far are from
> LDAP authentication failures. There could very well be others since I've
> only been looking at authentication failures.
>
> I'll see if my team can dig some more information up. I'm hoping to get a
> few more people involved with Apache Knox. It will probably take a few days
> to dig further.
>
> On Tue, Oct 10, 2017 at 3:57 PM, larry mccay <[email protected]> wrote:
>
>> Interesting...
>>
>> I have not seen this before at all.
>> I don't think there is anyway for those things to affect it either -
>> especially the HA related things.
>> Those entries are all a single request that will not go to different
>> hosts - unless there is are redirects involved and then it should become a
>> separate correlation id.
>>
>> We should dig into the creation of the id to see what could cause it.
>>
>>
>> On Tue, Oct 10, 2017 at 4:38 PM, Kevin Risden <[email protected]> wrote:
>>
>>> Thanks Larry the correlation id looks like it will work for our case.
>>>
>>> However, I found an interesting anomaly while looking at our logs (Knox
>>> 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across
>>> requests. Is this to be expected? Here is a snippet (anonymized):
>>>
>>> grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log
>>>>
>>>
>>>
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/
>>>> hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request method: GET
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/
>>>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/
>>>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups:
>>>> []
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE
>>>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=
>>>> ID2%2Fd%3Araw|unavailable|Request
>>>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Cunavailable%7CRequest>
>>>> method: GET
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE
>>>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=
>>>> ID2%2Fd%3Araw|success|Response
>>>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Csuccess%7CResponse>
>>>> status: 200
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE|USER1|||access|uri|/gateway/HADOOPPR
>>>> OD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Response
>>>> status: 200
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP
>>>> authentication failed.
>>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9
>>>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/
>>>> hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response status: 401
>>>
>>>
>>> The things to highlight here for the same correlation id:
>>> * different topologies are being used
>>> * different uris are being used
>>> * different users are being used
>>>
>>>
>>> Some of the things that we have configured that could impact results:
>>> * authentication caching
>>> * multiple Knox servers
>>> * load balancer in front of Knox
>>>
>>> I would be surprised if those things would make one Knox node (it could
>>> affect our other Knox nodes haven't looked yet) have duplicate correlation
>>> ids.
>>>
>>> On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]>
>>> wrote:
>>>
>>>> If there is no access entry then that definitely needs to be fixed.
>>>> Please file a JIRA for 0.14.0 if that is the case.
>>>>
>>>> On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Larry for the quick response. I'll have to go back and double
>>>>> check if the access and uri are in there. If so I must have missed it the
>>>>> first time through. Correlating across the GUID should work for our quick
>>>>> debugging.
>>>>>
>>>>> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Kevin -
>>>>>>
>>>>>> I would think that you can use the correlation id (GUID) to find the
>>>>>> url used to access the failed authentication.
>>>>>> For instance, consider the following:
>>>>>>
>>>>>> *17/10/10 10:19:20
>>>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603
>>>>>> <(507)%20645-1603>83|unavailable|Request method: GET*
>>>>>> *17/10/10 10:19:20
>>>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|*
>>>>>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd
>>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri|
>>>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r
>>>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383|
>>>>>> unavailable|Request method: GET
>>>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd
>>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri|
>>>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r
>>>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383|
>>>>>> success|Response status: 200
>>>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd
>>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri
>>>>>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag
>>>>>> e_size=10&fields=Requests&_=1507645160383|success|Response status:
>>>>>> 200
>>>>>>
>>>>>> All of the above are related to the same request.
>>>>>> The "access" entry prior to the "authentication" entry contains the
>>>>>> URI being request with the topology in it.
>>>>>>
>>>>>> Does this not meet the needs?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> --larry
>>>>>>
>>>>>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Background
>>>>>>> ----------------
>>>>>>> Currently we have multiple topologies in a single Knox server. Each
>>>>>>> topology points to a different Hadoop environment. For this example, 
>>>>>>> assume
>>>>>>> the topologies are named DEV, TEST, and PROD.
>>>>>>>
>>>>>>> We had a group who hits Knox forget to change their LDAP password so
>>>>>>> there were a bunch of messages like this in the audit logs:
>>>>>>>
>>>>>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe
>>>>>>>> ntication|principal|USERNAME|failure|LDAP authentication failed.
>>>>>>>
>>>>>>>
>>>>>>> We contacted the group and they wanted to know which topology the
>>>>>>> requests were hitting so they could fix their password. Regardless of if
>>>>>>> they should have different users per environment or not, we had no way 
>>>>>>> to
>>>>>>> easily tell the group which topology they were hitting. The LDAP
>>>>>>> authentication failure log didn't say which topology it was hitting.
>>>>>>>
>>>>>>> It would be great if the audit log message was something like this:
>>>>>>>
>>>>>>>
>>>>>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a
>>>>>>>> uthentication|principal|USERNAME|failure|LDAP authentication
>>>>>>>> failed.
>>>>>>>>
>>>>>>>
>>>>>>> In this case, the topology was added in the audit line maybe near
>>>>>>> the service name. We think having the topology name on the line 
>>>>>>> somewhere
>>>>>>> would be useful for debugging purposes.
>>>>>>>
>>>>>>> Question
>>>>>>> ------------
>>>>>>> Is it possible to configure Knox to log which topology each line in
>>>>>>> the audit log came from?
>>>>>>>
>>>>>>> I was looking at https://github.com/apache/k
>>>>>>> nox/blob/master/gateway-util-common/src/main/java/org/apache
>>>>>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not
>>>>>>> sure if its possible to easily add the topology there or if it is even 
>>>>>>> the
>>>>>>> right place?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to