Thanks Larry the correlation id looks like it will work for our case. However, I found an interesting anomaly while looking at our logs (Knox 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across requests. Is this to be expected? Here is a snippet (anonymized):
grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log > > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request > method: GET > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success| > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups: > [] > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||dispatch|uri| > http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|unavailable|Request > method: GET > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||dispatch|uri| > http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|success|Response > status: 200 > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||access|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Response > status: 200 > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP > authentication failed. > 17/10/10 12:50:09 > ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response > status: 401 The things to highlight here for the same correlation id: * different topologies are being used * different uris are being used * different users are being used Some of the things that we have configured that could impact results: * authentication caching * multiple Knox servers * load balancer in front of Knox I would be surprised if those things would make one Knox node (it could affect our other Knox nodes haven't looked yet) have duplicate correlation ids. On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]> wrote: > If there is no access entry then that definitely needs to be fixed. > Please file a JIRA for 0.14.0 if that is the case. > > On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]> wrote: > >> Thanks Larry for the quick response. I'll have to go back and double >> check if the access and uri are in there. If so I must have missed it the >> first time through. Correlating across the GUID should work for our quick >> debugging. >> >> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]> wrote: >> >>> Hi Kevin - >>> >>> I would think that you can use the correlation id (GUID) to find the url >>> used to access the failed authentication. >>> For instance, consider the following: >>> >>> *17/10/10 10:19:20 >>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603 >>> <(507)%20645-1603>83|unavailable|Request method: GET* >>> *17/10/10 10:19:20 >>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|* >>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd >>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>> unavailable|Request method: GET >>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>> success|Response status: 200 >>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri >>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag >>> e_size=10&fields=Requests&_=1507645160383|success|Response status: 200 >>> >>> All of the above are related to the same request. >>> The "access" entry prior to the "authentication" entry contains the URI >>> being request with the topology in it. >>> >>> Does this not meet the needs? >>> >>> thanks, >>> >>> --larry >>> >>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]> >>> wrote: >>> >>>> Background >>>> ---------------- >>>> Currently we have multiple topologies in a single Knox server. Each >>>> topology points to a different Hadoop environment. For this example, assume >>>> the topologies are named DEV, TEST, and PROD. >>>> >>>> We had a group who hits Knox forget to change their LDAP password so >>>> there were a bunch of messages like this in the audit logs: >>>> >>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe >>>>> ntication|principal|USERNAME|failure|LDAP authentication failed. >>>> >>>> >>>> We contacted the group and they wanted to know which topology the >>>> requests were hitting so they could fix their password. Regardless of if >>>> they should have different users per environment or not, we had no way to >>>> easily tell the group which topology they were hitting. The LDAP >>>> authentication failure log didn't say which topology it was hitting. >>>> >>>> It would be great if the audit log message was something like this: >>>> >>>> >>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a >>>>> uthentication|principal|USERNAME|failure|LDAP authentication failed. >>>>> >>>> >>>> In this case, the topology was added in the audit line maybe near the >>>> service name. We think having the topology name on the line somewhere would >>>> be useful for debugging purposes. >>>> >>>> Question >>>> ------------ >>>> Is it possible to configure Knox to log which topology each line in the >>>> audit log came from? >>>> >>>> I was looking at https://github.com/apache/k >>>> nox/blob/master/gateway-util-common/src/main/java/org/apache >>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not sure >>>> if its possible to easily add the topology there or if it is even the right >>>> place? >>>> >>>> >>>> >>> >> >
