Interesting - authentication failures may short circuit some cleanup activity on the thread.
On Tue, Oct 10, 2017 at 5:02 PM, Kevin Risden <[email protected]> wrote: > Yea that is what I was afraid of :) I looked a bit further and it isn't > every request that has duplicates. The ones I have found so far are from > LDAP authentication failures. There could very well be others since I've > only been looking at authentication failures. > > I'll see if my team can dig some more information up. I'm hoping to get a > few more people involved with Apache Knox. It will probably take a few days > to dig further. > > On Tue, Oct 10, 2017 at 3:57 PM, larry mccay <[email protected]> wrote: > >> Interesting... >> >> I have not seen this before at all. >> I don't think there is anyway for those things to affect it either - >> especially the HA related things. >> Those entries are all a single request that will not go to different >> hosts - unless there is are redirects involved and then it should become a >> separate correlation id. >> >> We should dig into the creation of the id to see what could cause it. >> >> >> On Tue, Oct 10, 2017 at 4:38 PM, Kevin Risden <[email protected]> wrote: >> >>> Thanks Larry the correlation id looks like it will work for our case. >>> >>> However, I found an interesting anomaly while looking at our logs (Knox >>> 0.9 on HDP 2.5 btw). The "correlation id" doesn't seem to be unique across >>> requests. Is this to be expected? Here is a snippet (anonymized): >>> >>> grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log >>>> >>> >>> >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/ >>>> hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request method: GET >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/ >>>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success| >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/ >>>> HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups: >>>> [] >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE >>>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row= >>>> ID2%2Fd%3Araw|unavailable|Request >>>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Cunavailable%7CRequest> >>>> method: GET >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE >>>> .example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row= >>>> ID2%2Fd%3Araw|success|Response >>>> <http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw%7Csuccess%7CResponse> >>>> status: 200 >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE|USER1|||access|uri|/gateway/HADOOPPR >>>> OD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Response >>>> status: 200 >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP >>>> authentication failed. >>>> 17/10/10 12:50:09 ||7557c91b-2a48-4e09-aefc-44e9 >>>> 892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/ >>>> hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response status: 401 >>> >>> >>> The things to highlight here for the same correlation id: >>> * different topologies are being used >>> * different uris are being used >>> * different users are being used >>> >>> >>> Some of the things that we have configured that could impact results: >>> * authentication caching >>> * multiple Knox servers >>> * load balancer in front of Knox >>> >>> I would be surprised if those things would make one Knox node (it could >>> affect our other Knox nodes haven't looked yet) have duplicate correlation >>> ids. >>> >>> On Tue, Oct 10, 2017 at 9:54 AM, larry mccay <[email protected]> >>> wrote: >>> >>>> If there is no access entry then that definitely needs to be fixed. >>>> Please file a JIRA for 0.14.0 if that is the case. >>>> >>>> On Tue, Oct 10, 2017 at 10:47 AM, Kevin Risden <[email protected]> >>>> wrote: >>>> >>>>> Thanks Larry for the quick response. I'll have to go back and double >>>>> check if the access and uri are in there. If so I must have missed it the >>>>> first time through. Correlating across the GUID should work for our quick >>>>> debugging. >>>>> >>>>> On Tue, Oct 10, 2017 at 9:23 AM, larry mccay <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Kevin - >>>>>> >>>>>> I would think that you can use the correlation id (GUID) to find the >>>>>> url used to access the failed authentication. >>>>>> For instance, consider the following: >>>>>> >>>>>> *17/10/10 10:19:20 >>>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI||||access|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=15076451603 >>>>>> <(507)%20645-1603>83|unavailable|Request method: GET* >>>>>> *17/10/10 10:19:20 >>>>>> ||a7bae089-e52d-403e-a623-fddde5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||authentication|uri|/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&page_size=10&fields=Requests&_=1507645160383|success|* >>>>>> *17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-*fddd >>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>>>> unavailable|Request method: GET >>>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||dispatch|uri| >>>>>> http://c6401.ambari.apache.org:8080/api/v1/clusters/ljm/r >>>>>> equests?to=end&fields=Requests&page_size=10&_=1507645160383| >>>>>> success|Response status: 200 >>>>>> 17/10/10 10:19:20 ||a7bae089-e52d-403e-a623-fddd >>>>>> e5c15c67|audit|0:0:0:0:0:0:0:1|AMBARI|anonymous|||access|uri >>>>>> |/gateway/mgt/ambari/api/v1/clusters/ljm/requests?to=end&pag >>>>>> e_size=10&fields=Requests&_=1507645160383|success|Response status: >>>>>> 200 >>>>>> >>>>>> All of the above are related to the same request. >>>>>> The "access" entry prior to the "authentication" entry contains the >>>>>> URI being request with the topology in it. >>>>>> >>>>>> Does this not meet the needs? >>>>>> >>>>>> thanks, >>>>>> >>>>>> --larry >>>>>> >>>>>> On Tue, Oct 10, 2017 at 9:59 AM, Kevin Risden <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Background >>>>>>> ---------------- >>>>>>> Currently we have multiple topologies in a single Knox server. Each >>>>>>> topology points to a different Hadoop environment. For this example, >>>>>>> assume >>>>>>> the topologies are named DEV, TEST, and PROD. >>>>>>> >>>>>>> We had a group who hits Knox forget to change their LDAP password so >>>>>>> there were a bunch of messages like this in the audit logs: >>>>>>> >>>>>>> 17/09/12 15:05:08 ||GUID|audit|WEBHBASE||||authe >>>>>>>> ntication|principal|USERNAME|failure|LDAP authentication failed. >>>>>>> >>>>>>> >>>>>>> We contacted the group and they wanted to know which topology the >>>>>>> requests were hitting so they could fix their password. Regardless of if >>>>>>> they should have different users per environment or not, we had no way >>>>>>> to >>>>>>> easily tell the group which topology they were hitting. The LDAP >>>>>>> authentication failure log didn't say which topology it was hitting. >>>>>>> >>>>>>> It would be great if the audit log message was something like this: >>>>>>> >>>>>>> >>>>>>>> 17/09/12 15:05:08 ||GUID|audit|DEV|WEBHBASE||||a >>>>>>>> uthentication|principal|USERNAME|failure|LDAP authentication >>>>>>>> failed. >>>>>>>> >>>>>>> >>>>>>> In this case, the topology was added in the audit line maybe near >>>>>>> the service name. We think having the topology name on the line >>>>>>> somewhere >>>>>>> would be useful for debugging purposes. >>>>>>> >>>>>>> Question >>>>>>> ------------ >>>>>>> Is it possible to configure Knox to log which topology each line in >>>>>>> the audit log came from? >>>>>>> >>>>>>> I was looking at https://github.com/apache/k >>>>>>> nox/blob/master/gateway-util-common/src/main/java/org/apache >>>>>>> /hadoop/gateway/audit/log4j/layout/AuditLayout.java and I'm not >>>>>>> sure if its possible to easily add the topology there or if it is even >>>>>>> the >>>>>>> right place? >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
