Interesting...

I have not seen this before at all.
I don't think there is anyway for those things to affect it either -
especially the HA related things.
Those entries are all a single request that will not go to different hosts
- unless there is are redirects involved and then it should become a
separate correlation id.

We should dig into the creation of the id to see what could cause it.


On Tue, Oct 10, 2017 at 4:38 PM, Kevin Risden <[email protected]> wrote:

> Thanks Larry the correlation id looks like it will work for our case.
>
> However, I found an interesting anomaly while looking at our logs (Knox
> 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across
> requests. Is this to be expected? Here is a snippet (anonymized):
>
> grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log
>>
>
>
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-
>> 44e9892372da|audit|WEBHBASE||||access|uri|/gateway/
>> HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request
>> method: GET
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|
>> USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/
>> NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|
>> USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/
>> NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups: []
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|
>> USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/
>> NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|
>> unavailable|Request
>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Cunavailable%7CRequest>
>> method: GET
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|
>> USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/
>> NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|success|Response
>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Csuccess%7CResponse>
>> status: 200
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|
>> USER1|||access|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:
>> TABLE2/multiget?row=ID2%2fd%3araw&|success|Response status: 200
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-
>> 44e9892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP
>> authentication failed.
>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-
>> 44e9892372da|audit|WEBHBASE||||access|uri|/gateway/
>> HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response status:
>> 401
>
>
> The things to highlight here for the same correlation id:
> * different topologies are being used
> * different uris are being used
> * different users are being used
>
>
> Some of the things that we have configured that could impact results:
> * authentication caching
> * multiple Knox servers
> * load balancer in front of Knox
>
> I would be surprised if those things would make one Knox node (it could
> affect our other Knox nodes haven't looked yet) have duplicate correlation
> ids.
>
> On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]>
> wrote:
>
>> If there is no access entry then that definitely needs to be fixed.
>> Please file a JIRA for 0.14.0 if that is the case.
>>
>> On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]>
>> wrote:
>>
>>> Thanks Larry for the quick response. I'll have to go back and double
>>> check if the access and uri are in there. If so I must have missed it the
>>> first time through. Correlating across the GUID should work for our quick
>>> debugging.
>>>
>>> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]> wrote:
>>>
>>>> Hi Kevin -
>>>>
>>>> I would think that you can use the correlation id (GUID) to find the
>>>> url used to access the failed authentication.
>>>> For instance, consider the following:
>>>>
>>>> *17/10/10 10:19:20
>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603
>>>> <(507)%20645-1603>83|unavailable|Request method: GET*
>>>> *17/10/10 10:19:20
>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|*
>>>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd
>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri|
>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r
>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383|
>>>> unavailable|Request method: GET
>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd
>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri|
>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r
>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383|
>>>> success|Response status: 200
>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd
>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri
>>>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag
>>>> e_size=10&fields=Requests&_=1507645160383|success|Response status: 200
>>>>
>>>> All of the above are related to the same request.
>>>> The "access" entry prior to the "authentication" entry contains the URI
>>>> being request with the topology in it.
>>>>
>>>> Does this not meet the needs?
>>>>
>>>> thanks,
>>>>
>>>> --larry
>>>>
>>>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]>
>>>> wrote:
>>>>
>>>>> Background
>>>>> ----------------
>>>>> Currently we have multiple topologies in a single Knox server. Each
>>>>> topology points to a different Hadoop environment. For this example, 
>>>>> assume
>>>>> the topologies are named DEV, TEST, and PROD.
>>>>>
>>>>> We had a group who hits Knox forget to change their LDAP password so
>>>>> there were a bunch of messages like this in the audit logs:
>>>>>
>>>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe
>>>>>> ntication|principal|USERNAME|failure|LDAP authentication failed.
>>>>>
>>>>>
>>>>> We contacted the group and they wanted to know which topology the
>>>>> requests were hitting so they could fix their password. Regardless of if
>>>>> they should have different users per environment or not, we had no way to
>>>>> easily tell the group which topology they were hitting. The LDAP
>>>>> authentication failure log didn't say which topology it was hitting.
>>>>>
>>>>> It would be great if the audit log message was something like this:
>>>>>
>>>>>
>>>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a
>>>>>> uthentication|principal|USERNAME|failure|LDAP authentication failed.
>>>>>>
>>>>>
>>>>> In this case, the topology was added in the audit line maybe near the
>>>>> service name. We think having the topology name on the line somewhere 
>>>>> would
>>>>> be useful for debugging purposes.
>>>>>
>>>>> Question
>>>>> ------------
>>>>> Is it possible to configure Knox to log which topology each line in
>>>>> the audit log came from?
>>>>>
>>>>> I was looking at https://github.com/apache/k
>>>>> nox/blob/master/gateway-util-common/src/main/java/org/apache
>>>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not sure
>>>>> if its possible to easily add the topology there or if it is even the 
>>>>> right
>>>>> place?
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to