Hello Rajat, I agree with you, knox should receive a "Connection: close" header from the backend to terminate the connection. Something like this - "DEBUG http.wire (Wire.java:wire(73)) - http-outgoing-35 << "Connection: close[\r][\n]""
You initially reported issues with Websockets hanging as well, do you see the same there ? or it's just http connections. Thanks, Sandeep On Mon, Aug 19, 2019 at 1:45 PM Rajat Goel <[email protected]> wrote: > Hi Sandeep, > > > > I tried removing all load balancers, HA etc so now there is just UI -> > Knox -> Backend, but the same issue is seen again. > > > > On debugging the issue further, I found following is happening: > > 1. UI client sends HTTPS request to Knox gateway (Eg. GET > /_sock/493/tet0wey4/htmlfile?c=_jp.ab0qmfk HTTP/1.1) > 2. Knox proxies the request to UI Backend server which is a Node js > server > 3. Backend Node js server returns the response with header > ‘Transfer-Encoding: chunked’. Node js server uses SockJs library for > sockets connections. I looked at logs of HTTP wire message at Knox, which > is received as response from Node js server, following is the content: > > > > *2019-08-17 10:34:42,699 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "Transfer-Encoding: chunked[\r][\n]"* > > *2019-08-17 10:34:42,699 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:34:42,700 DEBUG http.wire (Wire.java:wire(87)) - > http-outgoing-15 << "40e"* > > *2019-08-17 10:34:42,700 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(122)) - > http-outgoing-15 << HTTP/1.1 200 OK* > > *2019-08-17 10:34:42,701 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(125)) - > http-outgoing-15 << Cache-Control: no-store, no-cache, no-transform, > must-revalidate, max-age=0* > > *2019-08-17 10:34:42,701 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(125)) - > http-outgoing-15 << Content-Type: text/html; charset=UTF-8* > > *2019-08-17 10:34:42,701 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(125)) - > http-outgoing-15 << Date: Sat, 17 Aug 2019 10:34:42 GMT* > > *2019-08-17 10:34:42,702 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(125)) - > http-outgoing-15 << Connection: keep-alive* > > *2019-08-17 10:34:42,702 DEBUG http.headers > (LoggingManagedHttpClientConnection.java:onResponseReceived(125)) - > http-outgoing-15 << **Transfer-Encoding: chunked* > > *2019-08-17 10:34:42,706 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:34:42,706 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "<!doctype html>[\n]"* > > *2019-08-17 10:34:42,707 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "<html><head>[\n]"* > > *2019-08-17 10:34:42,707 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " <meta http-equiv="X-UA-Compatible" content="IE=edge" > />[\n]"* > > *2019-08-17 10:34:42,707 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " <meta http-equiv="Content-Type" content="text/html; > charset=UTF-8" />[\n]"* > > *2019-08-17 10:34:42,707 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "</head><body><h2>Don't panic!</h2>[\n]"* > > *2019-08-17 10:34:42,708 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " <script>[\n]"* > > *2019-08-17 10:34:42,708 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " document.domain = document.domain;[\n]"* > > *2019-08-17 10:34:42,708 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " var c = parent._jp.ab0qmfk;[\n]"* > > *2019-08-17 10:34:42,708 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " c.start();[\n]"* > > *2019-08-17 10:34:42,709 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " function p(d) {c.message(d);};[\n]"* > > *2019-08-17 10:34:42,709 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " window.onload = function() {c.stop();};[\n]"* > > *2019-08-17 10:34:42,709 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << " </script>* > > * > [\r][\n]"* > > *2019-08-17 10:34:42,710 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:34:42,713 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:34:42,714 DEBUG http.wire (Wire.java:wire(87)) - > http-outgoing-15 << "1c"* > > *2019-08-17 10:34:42,714 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:34:42,715 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "<script>[\n]"* > > *2019-08-17 10:34:42,715 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "p("o");[\n]"* > > *2019-08-17 10:34:42,715 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "</script>[\r][\n]"* > > *2019-08-17 10:34:42,716 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:35:07,696 DEBUG http.wire (Wire.java:wire(87)) - > http-outgoing-15 << "1c"* > > *2019-08-17 10:35:07,696 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *2019-08-17 10:35:07,697 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "<script>[\n]"* > > *2019-08-17 10:35:07,697 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "p("h");[\n]"* > > *2019-08-17 10:35:07,697 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "</script>[\r][\n]"* > > *2019-08-17 10:35:07,698 DEBUG http.wire (Wire.java:wire(73)) - > http-outgoing-15 << "[\r][\n]"* > > *….* > > > > The last few messages in bold is recurrent data sent with a periodic > heartbeat and is being sent by Node js backend server every 25 seconds. > Notice that no last/end chunk message is sent from backend server and Knox > server is just waiting for this last chunk message. This results in > connection hang. Similar messages are seen for rest of the connections > which are held up. Looks like the cause here is non-standard > implementation of chunked transfer encoding in SockJS library. Am I on the > right path here or you feel issue might be something else ? > > > > Looking further into the issue with SockJS library and how it can be > fixed. Thanks for the help so far. > > > > Regards, > > Rajat > > > > *From: *Sandeep Moré <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Saturday, 17 August 2019 at 8:00 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > This could be multiple sockets, this is weird, looks like for some reason > there are multiple sockets opened between Knox and the backend. > > Can you try using just Knox and the backend and see what you get, i.e. get > rid of the load balancers and HA to isolate the problem. > > > > On Sat, Aug 17, 2019 at 8:14 AM Rajat Goel <[email protected]> > wrote: > > Hi Sandeep, > > > > While reproducing the issue, I observed that ‘netstat’ command shows 32 > connections in ESTABLISHED state with UI backend service: > > *tcp6 0 0 192.168.133.69:48510 <http://192.168.133.69:48510> > 192.168.133.69:9443 <http://192.168.133.69:9443> ESTABLISHED 19122/java* > > *tcp6 0 0 192.168.133.69:53848 <http://192.168.133.69:53848> > 192.168.133.69:9443 <http://192.168.133.69:9443> ESTABLISHED 19122/java* > > *tcp6 0 0 192.168.133.69:52994 <http://192.168.133.69:52994> > 192.168.133.69:9443 <http://192.168.133.69:9443> ESTABLISHED 19122/java* > > *…* > > > > However, when I search for port number in gaeway.log file (full debugs > logs enabled), there is only log mentioning that particular port: > > *[root@rafd001-mst-01 knox]# cat gateway.log | grep -w 48510* > > *2019-08-17 10:34:40,717 DEBUG conn.DefaultHttpClientConnectionOperator > (DefaultHttpClientConnectionOperator.java:connect(146)) - Connection > established 192.168.133.69 > <http://192.168.133.69>:48510<->192.168.133.69:9443 > <http://192.168.133.69:9443>* > > > > Looks like the connection was opened but not used? Other connections in > netstat output show same trend. No logs in gateway.log file as to how the > connection is getting used and these connections stay in ESTABLISHED state > forever. They don’t even timeout. Any thoughts on why this could be > happening ? > > > > Regards, > > Rajat > > > > *From: *Rajat Goel <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Friday, 16 August 2019 at 11:18 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hi Sandeep, > > > > You are right, websocket connection are not being established properly. I > debugged this more today and looked at Knox’s GatewayWebsocketHandler.java > code. From the code, I found that Knox uses service Url protocol to create > backend ws or wss websocket connection whereas my service Url had ‘https’ > as I thought Knox would use incoming request URL (protocol as well as path) > to construct backend websocket URL. After making some changes, I was able > to fix UpgradeException issues. Just a thought there: why not use request > URI in getMatchedBackendURL() API { GatewayWebsocketHandler.java} to > generate backend URL. This way we won’t have to write separate service > definition for http(s) and ws requests, in many cases. > > > > Coming back to original issue, after fixing UpgradeException I still see > ClosedChannel Exceptions, ConnectionPool timeout, Service connectivity > errors, UI slowing down and eventually hang. > > ‘netstat’ output shows 32 connections in ESTABLISHED state between Knox > and UI backend server, all in Idle state. I had configured socket timeout > as well as wbesocket idle timeouts to 60 seconds, still connections stay in > ESTABLISHED state. Why so many connections are getting established ? When U > is accessed directly i.e. there is no Knox proxy between UI client and > backend, I only see 4-5 TCP connections between UI client and UI backend. > > > > Please check. > > > > Regards, > > Rajat > > > > *From: *Sandeep Moré <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Friday, 16 August 2019 at 10:37 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Rajat, > > I could not see data flowing through websocket in the logs. I also noticed > that your backend uses https but your websocket connection is using ws://, > shouldn't it be using wss:// ? > > > > On Fri, Aug 16, 2019 at 7:02 AM Rajat Goel <[email protected]> > wrote: > > Hi Sandeep, Knox Team, > > > > Need urgent help/pointers in debugging this issue as this is very critical > for us. So request you to please check on the same. > > > > Thanks & Regards, > > Rajat > > > > *From: *Rajat Goel <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Thursday, 15 August 2019 at 12:02 AM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hi Sandeep, > > > > I added Knox certificate as trusted certificate on my UI Client machine > and "avax.net.ssl.SSLException: Received fatal alert: certificate_unknown" > errors no longer appear. However, other issues such as > ‘org.eclipse.jetty.websocket.api.UpgradeException’, ‘java.io.IOException: > Service connectivity error’ are still coming and UI eventually slows down > and hangs. > > > > On UpgradeException, I had posted in my other mail, copying snippet here, > please check: > > > > <snip> > > > > *On analysing the debug logs further, I see that Websocket connection is > not being established properly. Websocket connection from Service Client UI > to Knox is being established properly but from Knox to Service backend is > failing with UpgradeException. Not sure why. One thing I noticed is that > upgrade request from Knox to Service Backend for websocket connection HTTP > Destination is coming in logs as > ‘ws://rafint001-mgt-01.cloud.in.guavus.com:9443’ (only till hostname and > port) * > > > > *2019-08-11 04:43:33,208 DEBUG client.WebSocketClient > (WebSocketClient.java:connect(372)) - connect websocket > org.eclipse.jetty.websocket.jsr356.endpoints.EndpointInstance@3cd5ba15 to > ws://rafint001-mgt-01.cloud.in.guavus.com:9443/ > <http://rafint001-mgt-01.cloud.in.guavus.com:9443/>* > > *2019-08-11 04:43:33,221 DEBUG component.ContainerLifeCycle > (ContainerLifeCycle.java:addBean(347)) - HttpClient@aa3cf89{STARTED} added > {HttpDestination[ws://rafint001-mgt-01.cloud.in.guavus.com:9443]@33202ded,queue=0,pool=null,MANAGED}* > > *2019-08-11 04:43:33,221 DEBUG component.AbstractLifeCycle > (AbstractLifeCycle.java:setStarting(185)) - starting > HttpDestination[ws://rafint001-mgt-01.cloud.in.guavus.com:9443]@33202ded,queue=0,pool=null* > > > > > > *whereas from UI to Knox this comes as > ‘https://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket > <https://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket>’.* > > > > *2019-08-11 04:43:33,060 DEBUG server.HttpChannel > (HttpChannel.java:onRequest(691)) - REQUEST for > //rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket > <http://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket> > on > HttpChannelOverHttp@7440214c{r=1,c=false,a=IDLE,uri=//rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket,age=0 > <http://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket,age=0>}* > > *GET > //rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket > <http://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket> > HTTP/1.1* > > *Host: rafint001-mgt-01.cloud.in.guavus.com:8443 > <http://rafint001-mgt-01.cloud.in.guavus.com:8443>^M* > > *Connection: Upgrade^M* > > *Pragma: no-cache^M* > > *Cache-Control: no-cache^M* > > *User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) > AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36^M* > > *Upgrade: websocket^M* > > *Origin: https://rafint001-mgt-01.cloud.in.guavus.com:8443^M > <https://rafint001-mgt-01.cloud.in.guavus.com:8443%5eM>* > > *Sec-WebSocket-Version: 13^M* > > *Accept-Encoding: gzip, deflate, br^M* > > *Accept-Language: en-GB,en-US;q=0.9,en;q=0.8^M* > > *Cookie: KNOXSESSIONID=node0d0mr30gaibvc1y10buxlzrvkz0.node0; > DEFAULT_UI=NEW^M* > > *Sec-WebSocket-Key: A/FYAfIjeAwb4QqqO2pepg==^M* > > *Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits* > > > > *Does this look right ? If not, please provide any pointers on how to > debug this further.* > > > > </snip> > > > > Attaching a log file from a new instance run after fixing > *certificate_unknown* error. Sorry about sending such big files. Please > check. > > > > Regards, > > Rajat > > > > *From: *Rajat Goel <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Wednesday, 14 August 2019 at 8:20 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hi Sandeep, > > > > Thanks for the reply. I see that this exception coming for connections > from UI Client to Knox Gateway: > > > > *2019-08-11 04:43:27,432 DEBUG ssl.SslConnection > (SslConnection.java:handshakeFailed(764)) - handshake failed > SslConnection@1d3697f3::SocketChannelEndPoint@6d000fcf{/192.168.141.33:35102 > <http://192.168.141.33:35102><->/192.168.141.31:8443 > <http://192.168.141.31:8443>,OPEN,fill=-,flush=-,to=2/300000}{io=0/0,kio=0,kro=1}->SslConnection@1d3697f3{NEED_UNWRAP,eio=0/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@20e999e9{/192.168.141.33:35102 > <http://192.168.141.33:35102><->/192.168.141.31:8443 > <http://192.168.141.31:8443>,OPEN,fill=-,flush=-,to=99/300000}=>HttpConnection@264e99b9[p=HttpParser{s=START,0 > of > -1},g=HttpGenerator@717c736d{s=START}]=>HttpChannelOverHttp@570d89da{r=0,c=false,a=IDLE,uri=null,age=0} > javax.net.ssl.SSLException: Received fatal alert: certificate_unknown* > > > > Now to fix this, do I have to add Knox certificate in my UI Client machine > i.e. my laptop’s cacert or what else ? > > > > One more query: For any SSL enabled service to be integrated with Knox, > the same step i.e. adding Knox certificate in UI Client’s truststore is > mandatory ? > > > > Regards, > > Rajat > > > > *From: *Sandeep Moré <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Wednesday, 14 August 2019 at 7:36 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hello Rajat, > > > > I see "avax.net.ssl.SSLException: Received fatal alert: > certificate_unknown" errors in the logs, this appears to be the root cause. > > Appears to be a certificate issue. > > > > On Mon, Aug 12, 2019 at 1:51 PM Rajat Goel <[email protected]> > wrote: > > Attaching full debug log as well. > > > > *From: *Rajat Goel <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Monday, 12 August 2019 at 10:50 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hi Sandeep, > > > > Thanks for your reply. > > > > Yes, the websocket traffic works as expected initially. Connection is > established and data flows correctly. Websocket backend is secured as well. > To check if the issue is due to SSL or not, I disabled SSL on My service > and the issue got reproduced with no secure (non SSL) setup as well. > > > > Attaching instance logs from one cycle of reproduction. This instance uses > SSOCookieProvider (for SSO) and had DEBUG logs enabled only for > org.apache.knox.gateway package. Full debug logs file instance is large in > size so cannot send via mail but let me know if you need that as well. Also > attaching thread dump of Knox for one particular instance when Knox threads > were stuck. > > > > I don’t see any errors in service log files. The issue is reproducible > every time so if you need any other information, please do let me know. > > > > Thanks & Regards, > > Rajat > > > > *From: *Sandeep Moré <[email protected]> > *Reply to: *"[email protected]" <[email protected]> > *Date: *Monday, 12 August 2019 at 7:37 PM > *To: *"[email protected]" <[email protected]> > *Subject: *Re: Knox jetty threads stuck and Seeing > ConnectionPoolTimeoutException while proxying a custom service > > > > Hello Rajat, > > > > Do you see Websocket traffic work as expected the first time when the UI > works ? can you check using the developer console if the initial Websocket > connection was established and data is flowing correctly? > > > > Looks like you are connecting to a secure backend, is the Websocket > backend secure as well ? > > Can you also post a redacted gateway.log file with DEBUG logging, it would > be helpful to see the entire log file with the stacktrace. > > > > Also, do you see any errors in your service log files ? > > > > Best, > > Sandeep > > > > > > On Mon, Aug 12, 2019 at 1:08 AM Rajat Goel <[email protected]> > wrote: > > Hello, > > > > I need some help while trying to integrate my custom service with Knox. > Here’s how the service is deployed: > > > > _ Knox (Instance > 1) _ _ My Service1 (Master). > > > | > | > | > > UI Client -> Load Balancer à | | > -- For Availability à | > > (HA Proxy) > | | (HA > Proxy) | > > | _ Knox (Instance > 2) _ | | _ My Service2 (Standby) > > > > > > My custom service UI client uses both web sockets as well as REST calls to > talk to My Service backend. I have written service definition xml and > rewrite xml files and deployed them on a test setup as per the above > architecture. My Knox version is 1.0 (HDP 3.1). I have added the service in > Knox default topology and default topology uses Shiro Provider for > authentication and ‘gateway.websocket.feature.enabled’ is set to true. > > > > When Knox service is started, I see the following behavior: > > - When I launch the UI from browser, I get Shiro authentication dialog > popup, I enter my credentials and get redirected to my Service UI. > Initially, for the first couple of minutes, UI works fine. All calls from > UI are getting routed properly to backend. > - After first few minutes, I start seeing following exception in Knox > gateway logs: > > 2019-08-10 15:37:32,053 ERROR gateway.websockets > (ProxyWebSocketAdapter.java:onWebSocketConnect(105)) - Unable to connect to > websocket server: java.io.IOException: Connect failure > > java.io.IOException: Connect failure > > Caused by: org.eclipse.jetty.websocket.api.UpgradeException: 0 null > > at > org.eclipse.jetty.websocket.client.WebSocketUpgradeRequest.onComplete(WebSocketUpgradeRequest.java:515) > > Caused by: java.io.EOFException: HttpConnectionOverHTTP@1e8a6128 > ::SocketChannelEndPoint@5526bc43{ > rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443<->/ > 192.168.141.31:49860 > ,ISHUT,fill=-,flush=-,to=3/0}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@1e8a6128 > (l:/192.168.141.31:49860 <-> r: > rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443,closed=false)= > >HttpChannelOverHTTP@500cd5d(exchange=HttpExchange@75bf9768 > req=TERMINATED/null@null res=PENDING/null@null > )[send=HttpSenderOverHTTP@6678aae8 > (req=QUEUED,snd=COMPLETED,failure=null)[HttpGenerator@6dcd6c0 > {s=START}],recv=HttpReceiverOverHTTP@5358963(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 > of -1}]] > > ... 13 more > > > > - UI starts slowing down and response times increase. > - I also see DEBUG exceptions such as: > > DEBUG io.FillInterest (FillInterest.java:onFail(134)) - onFail > FillInterest@1d803962{null} > > java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 > ms > > at > org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166) > > at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > > > > > > DEBUG io.WriteFlusher (WriteFlusher.java:onFail(471)) - ignored: > WriteFlusher@3a2f4fc0{IDLE}->null > > java.nio.channels.ClosedChannelException > > at org.eclipse.jetty.io.WriteFlusher.onClose(WriteFlusher.java:502) > > at > org.eclipse.jetty.io.AbstractEndPoint.onClose(AbstractEndPoint.java:353) > > at > org.eclipse.jetty.io.ChannelEndPoint.onClose(ChannelEndPoint.java:216) > > at > org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:225) > > at > org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192) > > at > org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175) > > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1132) > > at > org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220) > > > > DEBUG ssl.SslConnection (SslConnection.java:flush(950)) - > SslConnection@1cb9d1f9::SocketChannelEndPoint@6fa560e8{/ > 192.168.141.31:44258<->/192.168.141.31:8443 > ,ISHUT,fill=-,flush=-,to=8/300000}{io=0/0,kio=0,kro=1}->SslConnection@1cb9d1f9 > {NEED_UNWRAP,eio=-1/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@205f9957 > {/192.168.141.31:44258<->/192.168.141.31:8443 > ,CLOSED,fill=-,flush=-,to=9/300000}=>HttpConnection@11b5199c[p=HttpParser{s=CLOSED,0 > of -1},g=HttpGenerator@40706618{s=START}]=>HttpChannelOverHttp@655d0b98 > {r=0,c=false,a=IDLE,uri=null,age=0} > > java.io.IOException: Broken pipe > > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:847) > > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doShutdownOutput(SslConnection.java:1076) > > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1131) > > at > org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220) > > at > org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192) > > at > org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175) > > at > org.eclipse.jetty.io.AbstractConnection.close(AbstractConnection.java:248) > > at > org.eclipse.jetty.server.HttpChannelOverHttp.earlyEOF(HttpChannelOverHttp.java:234) > > at > org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1551) > > at > org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:360) > > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:250) > > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) > > at > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) > > at > org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:411) > > > > - Few minutes later, UI hangs. I start seeing following Exceptions in > Knox gateway log: > > 2019-08-10 15:47:41,366 WARN knox.gateway > (DefaultDispatch.java:executeOutboundRequest(147)) - Connection exception > dispatching request: > https://rafint001-mgt-01.cloud.in.guavus.com:9443/_sock/iframe.html > org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for > connection from pool > > at > org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:313) > > at > org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279) > > > > > > 2019-08-10 15:47:41,371 ERROR knox.gateway > (AbstractGatewayFilter.java:doFilter(63)) - Failed to execute filter: > java.io.IOException: Service connectivity error. > > java.io.IOException: Service connectivity error. > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:148) > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116) > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.doGet(DefaultDispatch.java:278) > > > > - Few minutes later, when I refresh UI page (after clearing any > cookies) I see HTTP 401 Error on UI. This time no Shiro authentication > dialog popup. Nothing seems to work hereafter. I see following logs only in > gateway.log: > > 2019-08-10 15:40:10,667 DEBUG knox.gateway > (GatewayFilter.java:doFilter(119)) - Received request: GET /pdie/cdap/ > > 2019-08-10 15:40:10,667 DEBUG authc.BasicHttpAuthenticationFilter > (BasicHttpAuthenticationFilter.java:sendChallenge(274)) - Authentication > required: sending 401 Authentication challenge response. > > > > - When I restart Knox service from Ambari or run a ‘touch’ command on > topology file (/etc/knox/conf/topologies/default.xml), UI starts working > again and the above set of issues repeats. > - I tried increasing gateway threads (gateway.threadpool.max to 500) > and HTTP connections (gateway.httpclient.maxConnections to 100). With these > settings, UI works fine for a little more time as compared to scenario with > default values but eventually hit the same issues as above. > - When UI hangs, I tried to thread dump of Knox service. I see that > many of the threads in Knox are stuck in socket read calls with following > trace: > > "qtp2099051403-139" #139 prio=5 > os_prio=0 tid=0x00007f4d7c002000 nid=0x6ed8 runnable [0x00007f4dee0e0000] > > java.lang.Thread.State: RUNNABLE > > at java.net.SocketInputStream.socketRead0(Native Method) > > at > java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > > at java.net.SocketInputStream.read(SocketInputStream.java:171) > > at java.net.SocketInputStream.read(SocketInputStream.java:141) > > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) > > at sun.security.ssl.InputRecord.read(InputRecord.java:503) > > at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975) > > - locked <0x00000006f471e218> (a java.lang.Object) > > at > sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933) > > at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) > > - locked <0x00000006f471edc0> (a sun.security.ssl.AppInputStream) > > at > org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) > > at > org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) > > at > org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) > > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) > > at > org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > > at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > > at > org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) > > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) > > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:130) > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116) > > at > org.apache.knox.gateway.dispatch.DefaultDispatch.doPost(DefaultDispatch.java:305) > > at > org.apache.knox.gateway.dispatch.GatewayDispatchFilter$PostAdapter.doMethod(GatewayDispatchFilter.java:177) > > at > org.apache.knox.gateway.dispatch.GatewayDispatchFilter.doFilter(GatewayDispatchFilter.java:122) > > at > org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:61) > > at > org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:372) > > at > org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:272) > > > > Can some please help on resolving the above set of issues: > > 1. With the deployment architecture as described above: UI -> HAProxy > -> Knox -> HAProxy -> Service backend, are there any specific > configurations required in HA Proxy and Knox to make this work. Is this > right way to deploy ? > 2. ‘org.eclipse.jetty.websocket.api.UpgradeException’ and how can this > be fixed? > 3. How to debug and fix Knox Jetty threads hang ? > > > > Thanks & Regards, > > Rajat > >
