Yea that is what I was afraid of :) I looked a bit further and it isn't every request that has duplicates. The ones I have found so far are from LDAP authentication failures. There could very well be others since I've only been looking at authentication failures.
I'll see if my team can dig some more information up. I'm hoping to get a few more people involved with Apache Knox. It will probably take a few days to dig further. On Tue, Oct 10, 2017 at 3:57 PM, larry mccay <[email protected]> wrote: > Interesting... > > I have not seen this before at all. > I don't think there is anyway for those things to affect it either - > especially the HA related things. > Those entries are all a single request that will not go to different hosts > - unless there is are redirects involved and then it should become a > separate correlation id. > > We should dig into the creation of the id to see what could cause it. > > > On Tue, Oct 10, 2017 at 4:38 PM, Kevin Risden <[email protected]> wrote: > >> Thanks Larry the correlation id looks like it will work for our case. >> >> However, I found an interesting anomaly while looking at our logs (Knox >> 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across >> requests. Is this to be expected? Here is a snippet (anonymized): >> >> grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log >>> >> >> >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST >>> /hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request method: GET >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/ >>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success| >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/ >>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups: >>> [] >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE >>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1& >>> row=ID2%2Fd%3Araw|unavailable|Request >>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Cunavailable%7CRequest> >>> method: GET >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE >>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1& >>> row=ID2%2Fd%3Araw|success|Response >>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Csuccess%7CResponse> >>> status: 200 >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE|USER1|||access|uri|/gateway/HADOOPPR >>> OD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Response >>> status: 200 >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP >>> authentication failed. >>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST >>> /hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response status: 401 >> >> >> The things to highlight here for the same correlation id: >> * different topologies are being used >> * different uris are being used >> * different users are being used >> >> >> Some of the things that we have configured that could impact results: >> * authentication caching >> * multiple Knox servers >> * load balancer in front of Knox >> >> I would be surprised if those things would make one Knox node (it could >> affect our other Knox nodes haven't looked yet) have duplicate correlation >> ids. >> >> On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]> >> wrote: >> >>> If there is no access entry then that definitely needs to be fixed. >>> Please file a JIRA for 0.14.0 if that is the case. >>> >>> On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]> >>> wrote: >>> >>>> Thanks Larry for the quick response. I'll have to go back and double >>>> check if the access and uri are in there. If so I must have missed it the >>>> first time through. Correlating across the GUID should work for our quick >>>> debugging. >>>> >>>> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]> wrote: >>>> >>>>> Hi Kevin - >>>>> >>>>> I would think that you can use the correlation id (GUID) to find the >>>>> url used to access the failed authentication. >>>>> For instance, consider the following: >>>>> >>>>> *17/10/10 10:19:20 >>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603 >>>>> <(507)%20645-1603>83|unavailable|Request method: GET* >>>>> *17/10/10 10:19:20 >>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|* >>>>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd >>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>>> unavailable|Request method: GET >>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>>> success|Response status: 200 >>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri >>>>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag >>>>> e_size=10&fields=Requests&_=1507645160383|success|Response status: 200 >>>>> >>>>> All of the above are related to the same request. >>>>> The "access" entry prior to the "authentication" entry contains the >>>>> URI being request with the topology in it. >>>>> >>>>> Does this not meet the needs? >>>>> >>>>> thanks, >>>>> >>>>> --larry >>>>> >>>>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]> >>>>> wrote: >>>>> >>>>>> Background >>>>>> ---------------- >>>>>> Currently we have multiple topologies in a single Knox server. Each >>>>>> topology points to a different Hadoop environment. For this example, >>>>>> assume >>>>>> the topologies are named DEV, TEST, and PROD. >>>>>> >>>>>> We had a group who hits Knox forget to change their LDAP password so >>>>>> there were a bunch of messages like this in the audit logs: >>>>>> >>>>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe >>>>>>> ntication|principal|USERNAME|failure|LDAP authentication failed. >>>>>> >>>>>> >>>>>> We contacted the group and they wanted to know which topology the >>>>>> requests were hitting so they could fix their password. Regardless of if >>>>>> they should have different users per environment or not, we had no way to >>>>>> easily tell the group which topology they were hitting. The LDAP >>>>>> authentication failure log didn't say which topology it was hitting. >>>>>> >>>>>> It would be great if the audit log message was something like this: >>>>>> >>>>>> >>>>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a >>>>>>> uthentication|principal|USERNAME|failure|LDAP authentication failed. >>>>>>> >>>>>> >>>>>> In this case, the topology was added in the audit line maybe near the >>>>>> service name. We think having the topology name on the line somewhere >>>>>> would >>>>>> be useful for debugging purposes. >>>>>> >>>>>> Question >>>>>> ------------ >>>>>> Is it possible to configure Knox to log which topology each line in >>>>>> the audit log came from? >>>>>> >>>>>> I was looking at https://github.com/apache/k >>>>>> nox/blob/master/gateway-util-common/src/main/java/org/apache >>>>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not sure >>>>>> if its possible to easily add the topology there or if it is even the >>>>>> right >>>>>> place? >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >
