Hi Sandeep,
On analysing the debug logs further, I see that Websocket connection is not
being established properly. Websocket connection from Service Client UI to Knox
is being established properly but from Knox to Service backend is failing with
UpgradeException. Not sure why. One thing I noticed is that upgrade request
from Knox to Service Backend for websocket connection HTTP Destination is
coming in logs as ‘ws://rafint001-mgt-01.cloud.in.guavus.com:9443’ (only till
hostname and port)
2019-08-11 04:43:33,208 DEBUG client.WebSocketClient
(WebSocketClient.java:connect(372)) - connect websocket
org.eclipse.jetty.websocket.jsr356.endpoints.EndpointInstance@3cd5ba15 to
ws://rafint001-mgt-01.cloud.in.guavus.com:9443/
2019-08-11 04:43:33,221 DEBUG component.ContainerLifeCycle
(ContainerLifeCycle.java:addBean(347)) - HttpClient@aa3cf89{STARTED} added
{HttpDestination[ws://rafint001-mgt-01.cloud.in.guavus.com:9443]@33202ded,queue=0,pool=null,MANAGED}
2019-08-11 04:43:33,221 DEBUG component.AbstractLifeCycle
(AbstractLifeCycle.java:setStarting(185)) - starting
HttpDestination[ws://rafint001-mgt-01.cloud.in.guavus.com:9443]@33202ded,queue=0,pool=null
whereas from UI to Knox this comes as
‘https://rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket’.
2019-08-11 04:43:33,060 DEBUG server.HttpChannel
(HttpChannel.java:onRequest(691)) - REQUEST for
//rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket
on
HttpChannelOverHttp@7440214c{r=1,c=false,a=IDLE,uri=//rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket,age=0}
GET
//rafint001-mgt-01.cloud.in.guavus.com:8443/gateway/default/pdie/_sock/443/qu5llmni/websocket
HTTP/1.1
Host: rafint001-mgt-01.cloud.in.guavus.com:8443^M
Connection: Upgrade^M
Pragma: no-cache^M
Cache-Control: no-cache^M
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36^M
Upgrade: websocket^M
Origin: https://rafint001-mgt-01.cloud.in.guavus.com:8443^M
Sec-WebSocket-Version: 13^M
Accept-Encoding: gzip, deflate, br^M
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8^M
Cookie: KNOXSESSIONID=node0d0mr30gaibvc1y10buxlzrvkz0.node0; DEFAULT_UI=NEW^M
Sec-WebSocket-Key: A/FYAfIjeAwb4QqqO2pepg==^M
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Does this look right ? If not, please provide any pointers on how to debug this
further.
One more thing. We had increased gateway.httpclient.connectionTimeout and
gateway.httpclient.socketTimeout to 300 seconds as we were initially seeing
some request failures while integrating with Knox. Can this lead to threads
getting stuck ? I see that there are lot of TimeoutExceptions as well in the
logs.
Regards,
Rajat
From: Rajat Goel <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Monday, 12 August 2019 at 11:21 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException
while proxying a custom service
Attaching full debug log as well.
From: Rajat Goel <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Monday, 12 August 2019 at 10:50 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException
while proxying a custom service
Hi Sandeep,
Thanks for your reply.
Yes, the websocket traffic works as expected initially. Connection is
established and data flows correctly. Websocket backend is secured as well. To
check if the issue is due to SSL or not, I disabled SSL on My service and the
issue got reproduced with no secure (non SSL) setup as well.
Attaching instance logs from one cycle of reproduction. This instance uses
SSOCookieProvider (for SSO) and had DEBUG logs enabled only for
org.apache.knox.gateway package. Full debug logs file instance is large in size
so cannot send via mail but let me know if you need that as well. Also
attaching thread dump of Knox for one particular instance when Knox threads
were stuck.
I don’t see any errors in service log files. The issue is reproducible every
time so if you need any other information, please do let me know.
Thanks & Regards,
Rajat
From: Sandeep Moré <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Monday, 12 August 2019 at 7:37 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException
while proxying a custom service
Hello Rajat,
Do you see Websocket traffic work as expected the first time when the UI works
? can you check using the developer console if the initial Websocket connection
was established and data is flowing correctly?
Looks like you are connecting to a secure backend, is the Websocket backend
secure as well ?
Can you also post a redacted gateway.log file with DEBUG logging, it would be
helpful to see the entire log file with the stacktrace.
Also, do you see any errors in your service log files ?
Best,
Sandeep
On Mon, Aug 12, 2019 at 1:08 AM Rajat Goel
<[email protected]<mailto:[email protected]>> wrote:
Hello,
I need some help while trying to integrate my custom service with Knox. Here’s
how the service is deployed:
_ Knox (Instance 1) _
_ My Service1 (Master).
|
| |
UI Client -> Load Balancer --> | | --
For Availability --> |
(HA Proxy) |
| (HA Proxy) |
| _ Knox (Instance 2) _
| | _ My Service2 (Standby)
My custom service UI client uses both web sockets as well as REST calls to talk
to My Service backend. I have written service definition xml and rewrite xml
files and deployed them on a test setup as per the above architecture. My Knox
version is 1.0 (HDP 3.1). I have added the service in Knox default topology and
default topology uses Shiro Provider for authentication and
‘gateway.websocket.feature.enabled’ is set to true.
When Knox service is started, I see the following behavior:
* When I launch the UI from browser, I get Shiro authentication dialog
popup, I enter my credentials and get redirected to my Service UI. Initially,
for the first couple of minutes, UI works fine. All calls from UI are getting
routed properly to backend.
* After first few minutes, I start seeing following exception in Knox
gateway logs:
2019-08-10 15:37:32,053 ERROR gateway.websockets
(ProxyWebSocketAdapter.java:onWebSocketConnect(105)) - Unable to connect to
websocket server: java.io.IOException: Connect failure
java.io.IOException: Connect failure
Caused by: org.eclipse.jetty.websocket.api.UpgradeException: 0 null
at
org.eclipse.jetty.websocket.client.WebSocketUpgradeRequest.onComplete(WebSocketUpgradeRequest.java:515)
Caused by: java.io.EOFException:
HttpConnectionOverHTTP@1e8a6128::SocketChannelEndPoint@5526bc43{rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443<http://rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443><->/192.168.141.31:49860<http://192.168.141.31:49860>,ISHUT,fill=-,flush=-,to=3/0}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@1e8a6128(l:/192.168.141.31:49860<http://192.168.141.31:49860>
<->
r:rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443,closed=false)=<http://rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443,closed=false)=>>HttpChannelOverHTTP@500cd5d(exchange=HttpExchange@75bf9768
req=TERMINATED/null@null
res=PENDING/null@null)[send=HttpSenderOverHTTP@6678aae8(req=QUEUED,snd=COMPLETED,failure=null)[HttpGenerator@6dcd6c0{s=START}],recv=HttpReceiverOverHTTP@5358963(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0
of -1}]]
... 13 more
* UI starts slowing down and response times increase.
* I also see DEBUG exceptions such as:
DEBUG io.FillInterest (FillInterest.java:onFail(134)) - onFail
FillInterest@1d803962{null}
java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
at
org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
DEBUG io.WriteFlusher (WriteFlusher.java:onFail(471)) - ignored:
WriteFlusher@3a2f4fc0{IDLE}->null
java.nio.channels.ClosedChannelException
at org.eclipse.jetty.io.WriteFlusher.onClose(WriteFlusher.java:502)
at
org.eclipse.jetty.io.AbstractEndPoint.onClose(AbstractEndPoint.java:353)
at
org.eclipse.jetty.io.ChannelEndPoint.onClose(ChannelEndPoint.java:216)
at
org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:225)
at
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1132)
at
org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
DEBUG ssl.SslConnection (SslConnection.java:flush(950)) -
SslConnection@1cb9d1f9::SocketChannelEndPoint@6fa560e8{/192.168.141.31:44258<http://192.168.141.31:44258><->/192.168.141.31:8443<http://192.168.141.31:8443>,ISHUT,fill=-,flush=-,to=8/300000}{io=0/0,kio=0,kro=1}->SslConnection@1cb9d1f9{NEED_UNWRAP,eio=-1/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@205f9957{/192.168.141.31:44258<http://192.168.141.31:44258><->/192.168.141.31:8443<http://192.168.141.31:8443>,CLOSED,fill=-,flush=-,to=9/300000}=>HttpConnection@11b5199c[p=HttpParser{s=CLOSED,0
of
-1},g=HttpGenerator@40706618{s=START}]=>HttpChannelOverHttp@655d0b98{r=0,c=false,a=IDLE,uri=null,age=0}
java.io.IOException: Broken pipe
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:847)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doShutdownOutput(SslConnection.java:1076)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1131)
at
org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
at
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at
org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at
org.eclipse.jetty.io.AbstractConnection.close(AbstractConnection.java:248)
at
org.eclipse.jetty.server.HttpChannelOverHttp.earlyEOF(HttpChannelOverHttp.java:234)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1551)
at
org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:360)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:250)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at
org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:411)
* Few minutes later, UI hangs. I start seeing following Exceptions in Knox
gateway log:
2019-08-10 15:47:41,366 WARN knox.gateway
(DefaultDispatch.java:executeOutboundRequest(147)) - Connection exception
dispatching request:
https://rafint001-mgt-01.cloud.in.guavus.com:9443/_sock/iframe.html
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool
at
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:313)
at
org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
2019-08-10 15:47:41,371 ERROR knox.gateway
(AbstractGatewayFilter.java:doFilter(63)) - Failed to execute filter:
java.io.IOException: Service connectivity error.
java.io.IOException: Service connectivity error.
at
org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:148)
at
org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116)
at
org.apache.knox.gateway.dispatch.DefaultDispatch.doGet(DefaultDispatch.java:278)
* Few minutes later, when I refresh UI page (after clearing any cookies) I
see HTTP 401 Error on UI. This time no Shiro authentication dialog popup.
Nothing seems to work hereafter. I see following logs only in gateway.log:
2019-08-10 15:40:10,667 DEBUG knox.gateway
(GatewayFilter.java:doFilter(119)) - Received request: GET /pdie/cdap/
2019-08-10 15:40:10,667 DEBUG authc.BasicHttpAuthenticationFilter
(BasicHttpAuthenticationFilter.java:sendChallenge(274)) - Authentication
required: sending 401 Authentication challenge response.
* When I restart Knox service from Ambari or run a ‘touch’ command on
topology file (/etc/knox/conf/topologies/default.xml), UI starts working again
and the above set of issues repeats.
* I tried increasing gateway threads (gateway.threadpool.max to 500) and
HTTP connections (gateway.httpclient.maxConnections to 100). With these
settings, UI works fine for a little more time as compared to scenario with
default values but eventually hit the same issues as above.
* When UI hangs, I tried to thread dump of Knox service. I see that many of
the threads in Knox are stuck in socket read calls with following trace:
"qtp2099051403-139" #139 prio=5 os_prio=0
tid=0x00007f4d7c002000 nid=0x6ed8 runnable [0x00007f4dee0e0000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
- locked <0x00000006f471e218> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000006f471edc0> (a sun.security.ssl.AppInputStream)
at
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at
org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:130)
at
org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116)
at
org.apache.knox.gateway.dispatch.DefaultDispatch.doPost(DefaultDispatch.java:305)
at
org.apache.knox.gateway.dispatch.GatewayDispatchFilter$PostAdapter.doMethod(GatewayDispatchFilter.java:177)
at
org.apache.knox.gateway.dispatch.GatewayDispatchFilter.doFilter(GatewayDispatchFilter.java:122)
at
org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:61)
at
org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:372)
at
org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:272)
Can some please help on resolving the above set of issues:
1. With the deployment architecture as described above: UI -> HAProxy ->
Knox -> HAProxy -> Service backend, are there any specific configurations
required in HA Proxy and Knox to make this work. Is this right way to deploy ?
2. ‘org.eclipse.jetty.websocket.api.UpgradeException’ and how can this be
fixed?
3. How to debug and fix Knox Jetty threads hang ?
Thanks & Regards,
Rajat