Hi Sandeep,
Thanks for the reply. I see that this exception occurs for connections from the UI
client to the Knox Gateway:
2019-08-11 04:43:27,432 DEBUG ssl.SslConnection (SslConnection.java:handshakeFailed(764)) - handshake failed SslConnection@1d3697f3::SocketChannelEndPoint@6d000fcf{/192.168.141.33:35102<->/192.168.141.31:8443,OPEN,fill=-,flush=-,to=2/300000}{io=0/0,kio=0,kro=1}->SslConnection@1d3697f3{NEED_UNWRAP,eio=0/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@20e999e9{/192.168.141.33:35102<->/192.168.141.31:8443,OPEN,fill=-,flush=-,to=99/300000}=>HttpConnection@264e99b9[p=HttpParser{s=START,0 of -1},g=HttpGenerator@717c736d{s=START}]=>HttpChannelOverHttp@570d89da{r=0,c=false,a=IDLE,uri=null,age=0}
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
To fix this, do I have to add the Knox certificate to the truststore on my UI
client machine, i.e. my laptop’s cacerts, or is something else needed?
One more question: for any SSL-enabled service integrated with Knox, is this
same step (adding the Knox certificate to the UI client’s truststore) mandatory?
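Because the log says Knox received the certificate_unknown alert, it was the connecting peer that rejected the gateway's certificate. One way to check whether a particular client-side trust store accepts that certificate is to attempt a verified TLS handshake against Knox. The sketch below uses Python's standard ssl module as a stand-in; the host, port and PEM file you pass in are assumptions, and browsers generally consult the operating system's or their own trust store rather than a Java cacerts file, so this only approximates what the UI client sees.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server certificate and host name.

    With cafile set (e.g. a PEM export of the Knox gateway certificate),
    only that file is trusted; otherwise the system default CAs are loaded.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # Already the defaults of create_default_context; kept explicit for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def fetch_server_cert_pem(host: str, port: int) -> str:
    """Return the certificate the server presents, as PEM, WITHOUT verifying it."""
    return ssl.get_server_certificate((host, port))


def check_tls_trust(host: str, port: int, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> str:
    """Try a verified handshake and report whether the certificate was accepted.

    Network errors (refused connection, DNS failure) are not caught here.
    """
    ctx = make_client_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError as exc:
        return f"not trusted: {exc.verify_message}"
```

For example, check_tls_trust("knox-host", 8443, cafile="knox-gateway.pem") (both names hypothetical) shows whether a client that trusts only that PEM would accept the gateway, and fetch_server_cert_pem returns the certificate Knox actually presents so it can be compared with, or imported into, the client's trust store.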
Regards,
Rajat
From: Sandeep Moré <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Wednesday, 14 August 2019 at 7:36 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException while proxying a custom service
Hello Rajat,
I see "javax.net.ssl.SSLException: Received fatal alert: certificate_unknown"
errors in the logs; this appears to be the root cause.
It looks like a certificate issue.
On Mon, Aug 12, 2019 at 1:51 PM Rajat Goel <[email protected]> wrote:
Attaching the full debug log as well.
From: Rajat Goel <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Monday, 12 August 2019 at 10:50 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException while proxying a custom service
Hi Sandeep,
Thanks for your reply.
Yes, the websocket traffic works as expected initially: the connection is
established and data flows correctly. The websocket backend is secured as well.
To check whether the issue is SSL-related, I disabled SSL on my service, and the
issue reproduced with a non-secure (non-SSL) setup as well.
Attaching instance logs from one reproduction cycle. This instance uses
SSOCookieProvider (for SSO) and had DEBUG logging enabled only for the
org.apache.knox.gateway package. The full debug log file is too large to send
by mail, but let me know if you need it as well. I am also attaching a Knox
thread dump taken on one instance while the Knox threads were stuck.
I don’t see any errors in the service log files. The issue is reproducible
every time, so if you need any other information, please let me know.
Thanks & Regards,
Rajat
From: Sandeep Moré <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Monday, 12 August 2019 at 7:37 PM
To: "[email protected]" <[email protected]>
Subject: Re: Knox jetty threads stuck and Seeing ConnectionPoolTimeoutException while proxying a custom service
Hello Rajat,
Do you see Websocket traffic work as expected the first time, when the UI works?
Can you check in the browser developer console whether the initial Websocket
connection was established and data is flowing correctly?
It looks like you are connecting to a secure backend; is the Websocket backend
secure as well?
Can you also post a redacted gateway.log file with DEBUG logging? It would be
helpful to see the entire log file with the stack trace.
Also, do you see any errors in your service log files?
Best,
Sandeep
On Mon, Aug 12, 2019 at 1:08 AM Rajat Goel <[email protected]> wrote:
Hello,
I need some help while trying to integrate my custom service with Knox. Here’s
how the service is deployed:
UI Client --> Load Balancer --+--> Knox (Instance 1) --+--> HA Proxy ----------+--> My Service1 (Master)
              (HA Proxy)      |                        |    (for availability) |
                              +--> Knox (Instance 2) --+                       +--> My Service2 (Standby)
My custom service's UI client uses both websockets and REST calls to talk to
the My Service backend. I have written the service definition XML and rewrite
XML files and deployed them on a test setup following the architecture above.
My Knox version is 1.0 (HDP 3.1). I have added the service to the Knox default
topology, which uses the Shiro provider for authentication, and
‘gateway.websocket.feature.enabled’ is set to true.
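To confirm which of the gateway settings mentioned in this thread are actually in effect, the Hadoop-style gateway-site.xml (a configuration element holding property/name/value entries) can be read directly. A small sketch, assuming the file path and that any key absent from the file falls back to Knox's built-in default; on an Ambari-managed cluster the file may be regenerated from Ambari, so changes belong there.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Union

# Property names taken from this thread; the list is only illustrative.
KEYS = (
    "gateway.websocket.feature.enabled",
    "gateway.threadpool.max",
    "gateway.httpclient.maxConnections",
)


def read_site_properties(xml_data: Union[str, bytes]) -> Dict[str, str]:
    """Parse <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(xml_data)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def report(props: Dict[str, str], keys: Iterable[str] = KEYS) -> Dict[str, str]:
    """Value configured for each key, or '<not set>' when the file omits it."""
    return {key: props.get(key, "<not set>") for key in keys}
```

Usage, with the path being an assumption for an HDP install: open "/etc/knox/conf/gateway-site.xml" in binary mode and pass its contents to read_site_properties, then print report(...) for the result.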
When the Knox service is started, I see the following behavior:
* When I launch the UI from the browser, I get the Shiro authentication dialog
popup; I enter my credentials and get redirected to my service UI. For the
first couple of minutes the UI works fine, and all calls from the UI are routed
properly to the backend.
* After the first few minutes, I start seeing the following exception in the
Knox gateway log:
2019-08-10 15:37:32,053 ERROR gateway.websockets (ProxyWebSocketAdapter.java:onWebSocketConnect(105)) - Unable to connect to websocket server: java.io.IOException: Connect failure
java.io.IOException: Connect failure
Caused by: org.eclipse.jetty.websocket.api.UpgradeException: 0 null
at org.eclipse.jetty.websocket.client.WebSocketUpgradeRequest.onComplete(WebSocketUpgradeRequest.java:515)
Caused by: java.io.EOFException: HttpConnectionOverHTTP@1e8a6128::SocketChannelEndPoint@5526bc43{rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443<->/192.168.141.31:49860,ISHUT,fill=-,flush=-,to=3/0}{io=0/0,kio=0,kro=1}->HttpConnectionOverHTTP@1e8a6128(l:/192.168.141.31:49860 <-> r:rafint001-mgt-01.cloud.in.guavus.com/192.168.141.33:9443,closed=false)=>HttpChannelOverHTTP@500cd5d(exchange=HttpExchange@75bf9768 req=TERMINATED/null@null res=PENDING/null@null)[send=HttpSenderOverHTTP@6678aae8(req=QUEUED,snd=COMPLETED,failure=null)[HttpGenerator@6dcd6c0{s=START}],recv=HttpReceiverOverHTTP@5358963(rsp=IDLE,failure=null)[HttpParser{s=CLOSED,0 of -1}]]
... 13 more
* The UI starts slowing down and response times increase.
* I also see DEBUG exceptions such as:
DEBUG io.FillInterest (FillInterest.java:onFail(134)) - onFail FillInterest@1d803962{null}
java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
DEBUG io.WriteFlusher (WriteFlusher.java:onFail(471)) - ignored: WriteFlusher@3a2f4fc0{IDLE}->null
java.nio.channels.ClosedChannelException
at org.eclipse.jetty.io.WriteFlusher.onClose(WriteFlusher.java:502)
at org.eclipse.jetty.io.AbstractEndPoint.onClose(AbstractEndPoint.java:353)
at org.eclipse.jetty.io.ChannelEndPoint.onClose(ChannelEndPoint.java:216)
at org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:225)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1132)
at org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
DEBUG ssl.SslConnection (SslConnection.java:flush(950)) - SslConnection@1cb9d1f9::SocketChannelEndPoint@6fa560e8{/192.168.141.31:44258<->/192.168.141.31:8443,ISHUT,fill=-,flush=-,to=8/300000}{io=0/0,kio=0,kro=1}->SslConnection@1cb9d1f9{NEED_UNWRAP,eio=-1/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@205f9957{/192.168.141.31:44258<->/192.168.141.31:8443,CLOSED,fill=-,flush=-,to=9/300000}=>HttpConnection@11b5199c[p=HttpParser{s=CLOSED,0 of -1},g=HttpGenerator@40706618{s=START}]=>HttpChannelOverHttp@655d0b98{r=0,c=false,a=IDLE,uri=null,age=0}
java.io.IOException: Broken pipe
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:847)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doShutdownOutput(SslConnection.java:1076)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.doClose(SslConnection.java:1131)
at org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:220)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at org.eclipse.jetty.io.AbstractConnection.close(AbstractConnection.java:248)
at org.eclipse.jetty.server.HttpChannelOverHttp.earlyEOF(HttpChannelOverHttp.java:234)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1551)
at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:360)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:250)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:411)
* A few minutes later, the UI hangs, and I start seeing the following
exceptions in the Knox gateway log:
2019-08-10 15:47:41,366 WARN knox.gateway (DefaultDispatch.java:executeOutboundRequest(147)) - Connection exception dispatching request: https://rafint001-mgt-01.cloud.in.guavus.com:9443/_sock/iframe.html
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:313)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
2019-08-10 15:47:41,371 ERROR knox.gateway (AbstractGatewayFilter.java:doFilter(63)) - Failed to execute filter: java.io.IOException: Service connectivity error.
java.io.IOException: Service connectivity error.
at org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:148)
at org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116)
at org.apache.knox.gateway.dispatch.DefaultDispatch.doGet(DefaultDispatch.java:278)
* A few minutes later, when I refresh the UI page (after clearing the cookies),
I see an HTTP 401 error on the UI, this time without the Shiro authentication
dialog popup. Nothing works from that point on, and I see only the following
logs in gateway.log:
2019-08-10 15:40:10,667 DEBUG knox.gateway (GatewayFilter.java:doFilter(119)) - Received request: GET /pdie/cdap/
2019-08-10 15:40:10,667 DEBUG authc.BasicHttpAuthenticationFilter (BasicHttpAuthenticationFilter.java:sendChallenge(274)) - Authentication required: sending 401 Authentication challenge response.
* When I restart the Knox service from Ambari or run a ‘touch’ command on the
topology file (/etc/knox/conf/topologies/default.xml), the UI starts working
again and the same sequence of issues repeats.
* I tried increasing the gateway threads (gateway.threadpool.max to 500) and
HTTP connections (gateway.httpclient.maxConnections to 100). With these
settings, the UI works fine for a little longer than with the default values,
but it eventually hits the same issues.
* When the UI hangs, I took a thread dump of the Knox service. Many of the Knox
threads are stuck in socket read calls with the following trace:
"qtp2099051403-139" #139 prio=5 os_prio=0 tid=0x00007f4d7c002000 nid=0x6ed8 runnable [0x00007f4dee0e0000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
- locked <0x00000006f471e218> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000006f471edc0> (a sun.security.ssl.AppInputStream)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.knox.gateway.dispatch.DefaultDispatch.executeOutboundRequest(DefaultDispatch.java:130)
at org.apache.knox.gateway.dispatch.DefaultDispatch.executeRequest(DefaultDispatch.java:116)
at org.apache.knox.gateway.dispatch.DefaultDispatch.doPost(DefaultDispatch.java:305)
at org.apache.knox.gateway.dispatch.GatewayDispatchFilter$PostAdapter.doMethod(GatewayDispatchFilter.java:177)
at org.apache.knox.gateway.dispatch.GatewayDispatchFilter.doFilter(GatewayDispatchFilter.java:122)
at org.apache.knox.gateway.filter.AbstractGatewayFilter.doFilter(AbstractGatewayFilter.java:61)
at org.apache.knox.gateway.GatewayFilter$Holder.doFilter(GatewayFilter.java:372)
at org.apache.knox.gateway.GatewayFilter$Chain.doFilter(GatewayFilter.java:272)
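When the UI hangs, it may help to count how many threads are blocked reading from the backend inside DefaultDispatch, as in the trace above, and compare that count with gateway.threadpool.max and gateway.httpclient.maxConnections. A sketch that parses a plain jstack-style dump, assuming the usual HotSpot layout where each thread starts with its quoted name; the two marker strings are taken from the trace above and may need adjusting for other dispatch classes.

```python
import re
from typing import List, Tuple

# A HotSpot thread dump starts each thread with its quoted name, e.g.
# "qtp2099051403-139" #139 prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def split_threads(dump: str) -> List[Tuple[str, List[str]]]:
    """Return (thread name, stripped stack lines) pairs in dump order."""
    threads: List[Tuple[str, List[str]]] = []
    for raw in dump.splitlines():
        line = raw.strip()
        match = THREAD_HEADER.match(line)
        if match:
            threads.append((match.group("name"), []))
        elif threads and line:
            threads[-1][1].append(line)
    return threads


def blocked_in_dispatch_read(dump: str) -> List[str]:
    """Names of threads doing a socket read underneath Knox's DefaultDispatch."""
    return [
        name
        for name, stack in split_threads(dump)
        if any("socketRead0" in frame for frame in stack)
        and any("DefaultDispatch.executeOutboundRequest" in frame for frame in stack)
    ]


def summary(dump: str) -> Tuple[int, int]:
    """(threads blocked in dispatch reads, total threads in the dump)."""
    return len(blocked_in_dispatch_read(dump)), len(split_threads(dump))
```

A dump for this can be taken with the JDK's jstack tool, for example jstack <knox-pid> > knox-threads.txt, and several dumps taken a minute apart show whether the same threads stay blocked.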
Can someone please help resolve the above set of issues:
1. With the deployment architecture described above (UI -> HAProxy -> Knox ->
HAProxy -> Service backend), are any specific HAProxy and Knox configurations
required to make this work? Is this the right way to deploy?
2. What causes ‘org.eclipse.jetty.websocket.api.UpgradeException’, and how can
it be fixed?
3. How can I debug and fix the Knox Jetty thread hang?
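For question 2, one way to narrow things down is to send the websocket opening handshake straight to the backend (or through the second HAProxy only), bypassing Knox, and see whether it answers with 101 Switching Protocols. The sketch below hand-rolls the RFC 6455 handshake with the standard library; the host, port and path are placeholders, no cookies or auth headers are sent, and an empty status line only suggests the peer closed the connection without replying, which appears consistent with the UpgradeException: 0 null / EOFException pair in the log above.

```python
import base64
import hashlib
import os
import socket
import ssl
from typing import Optional, Tuple

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value a compliant server must return for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host_header: str, path: str, key: str) -> bytes:
    """Minimal RFC 6455 opening-handshake request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_upgrade(host: str, port: int, path: str, use_tls: bool = True,
                  cafile: Optional[str] = None,
                  timeout: float = 5.0) -> Tuple[str, bool]:
    """Send the handshake to host:port; return (status line, accept header valid)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as raw:
        conn = raw
        if use_tls:
            conn = ssl.create_default_context(cafile=cafile).wrap_socket(
                raw, server_hostname=host)
        with conn:
            conn.sendall(build_upgrade_request(f"{host}:{port}", path, key))
            reply = conn.recv(4096).decode("latin-1")
    lines = reply.split("\r\n")
    accept = next((line.split(":", 1)[1].strip() for line in lines[1:]
                   if line.lower().startswith("sec-websocket-accept:")), None)
    return lines[0], accept == expected_accept(key)
```

Running probe_upgrade repeatedly from the Knox host against the backend websocket path, before and after the errors start, would show whether the backend (or the HAProxy in front of it) stops answering upgrades while plain REST calls still work.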
Thanks & Regards,
Rajat