Interesting... I have not seen this before at all. I don't think there is anyway for those things to affect it either - especially the HA related things. Those entries are all a single request that will not go to different hosts - unless there is are redirects involved and then it should become a separate correlation id.
We should dig into the creation of the id to see what could cause it. On Tue, Oct 10, 2017 at 4:38 PM, Kevin Risden <[email protected]> wrote: > Thanks Larry the correlation id looks like it will work for our case. > > However, I found an interesting anomaly while looking at our logs (Knox > 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across > requests. Is this to be expected? Here is a snippet (anonymized): > > grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log >> > > >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc- >> 44e9892372da|audit|WEBHBASE||||access|uri|/gateway/ >> HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request >> method: GET >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE| >> USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/ >> NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success| >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE| >> USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/ >> NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups: [] >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE| >> USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/ >> NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw| >> unavailable|Request >> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Cunavailable%7CRequest> >> method: GET >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE| >> USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/ >> NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|success|Response >> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Csuccess%7CResponse> >> status: 200 >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE| >> USER1|||access|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2: >> TABLE2/multiget?row=ID2%2fd%3araw&|success|Response status: 200 >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc- >> 44e9892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP >> authentication failed. >> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc- >> 44e9892372da|audit|WEBHBASE||||access|uri|/gateway/ >> HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response status: >> 401 > > > The things to highlight here for the same correlation id: > * different topologies are being used > * different uris are being used > * different users are being used > > > Some of the things that we have configured that could impact results: > * authentication caching > * multiple Knox servers > * load balancer in front of Knox > > I would be surprised if those things would make one Knox node (it could > affect our other Knox nodes haven't looked yet) have duplicate correlation > ids. > > On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]> > wrote: > >> If there is no access entry then that definitely needs to be fixed. >> Please file a JIRA for 0.14.0 if that is the case. >> >> On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]> >> wrote: >> >>> Thanks Larry for the quick response. I'll have to go back and double >>> check if the access and uri are in there. If so I must have missed it the >>> first time through. Correlating across the GUID should work for our quick >>> debugging. >>> >>> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]> wrote: >>> >>>> Hi Kevin - >>>> >>>> I would think that you can use the correlation id (GUID) to find the >>>> url used to access the failed authentication. >>>> For instance, consider the following: >>>> >>>> *17/10/10 10:19:20 >>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603 >>>> <(507)%20645-1603>83|unavailable|Request method: GET* >>>> *17/10/10 10:19:20 >>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|* >>>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd >>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>> unavailable|Request method: GET >>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>> success|Response status: 200 >>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri >>>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag >>>> e_size=10&fields=Requests&_=1507645160383|success|Response status: 200 >>>> >>>> All of the above are related to the same request. >>>> The "access" entry prior to the "authentication" entry contains the URI >>>> being request with the topology in it. >>>> >>>> Does this not meet the needs? >>>> >>>> thanks, >>>> >>>> --larry >>>> >>>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]> >>>> wrote: >>>> >>>>> Background >>>>> ---------------- >>>>> Currently we have multiple topologies in a single Knox server. Each >>>>> topology points to a different Hadoop environment. For this example, >>>>> assume >>>>> the topologies are named DEV, TEST, and PROD. >>>>> >>>>> We had a group who hits Knox forget to change their LDAP password so >>>>> there were a bunch of messages like this in the audit logs: >>>>> >>>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe >>>>>> ntication|principal|USERNAME|failure|LDAP authentication failed. >>>>> >>>>> >>>>> We contacted the group and they wanted to know which topology the >>>>> requests were hitting so they could fix their password. Regardless of if >>>>> they should have different users per environment or not, we had no way to >>>>> easily tell the group which topology they were hitting. The LDAP >>>>> authentication failure log didn't say which topology it was hitting. >>>>> >>>>> It would be great if the audit log message was something like this: >>>>> >>>>> >>>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a >>>>>> uthentication|principal|USERNAME|failure|LDAP authentication failed. >>>>>> >>>>> >>>>> In this case, the topology was added in the audit line maybe near the >>>>> service name. We think having the topology name on the line somewhere >>>>> would >>>>> be useful for debugging purposes. >>>>> >>>>> Question >>>>> ------------ >>>>> Is it possible to configure Knox to log which topology each line in >>>>> the audit log came from? >>>>> >>>>> I was looking at https://github.com/apache/k >>>>> nox/blob/master/gateway-util-common/src/main/java/org/apache >>>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not sure >>>>> if its possible to easily add the topology there or if it is even the >>>>> right >>>>> place? >>>>> >>>>> >>>>> >>>> >>> >> >
