Hi Hanbing Luo, thanks for the clarification about the proxy!
A pull request is very welcome. It makes reviewing the change and testing it much easier. Thanks in advance!
Best,
Sebastian

On 1/20/25 15:12, dolphin wrote:
Hi Sebastian

Thank you for your response. I apologize for not mentioning earlier that I was using a proxy instead of making a direct request. Specifically, I have a ClashVerge client (https://github.com/clash-verge-rev/clash-verge-rev) running on localhost with the port set to 7890 because I encountered a 403 response page due to Cloudflare Bot Protection.

I did some investigation and noticed that the SSLHandshakeException appears to be related to the code in the HttpResponse class, particularly between lines 121 and 136. Based on my debugging, the SSL handshake is being performed with localhost:7890 rather than weworkremotely.com.

At this point, I'm unsure if this issue is a bug in the codebase or something specific to the way ClashVerge handles connections locally. However, when I use the following code, everything works as expected.

Please let me know if you need further details or if you'd like me to submit a pull request to address this. I'm happy to contribute if needed.

Thanks

Codesnippet:

    try {
      // socket = new Socket(); // create the socket
      // socket.setSoTimeout(http.getTimeout());
      boolean useProxy = http.useProxy(url);
      if (useProxy) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
            new InetSocketAddress(http.getProxyHost(), http.getProxyPort()));
        socket = new Socket(proxy);
      } else {
        socket = new Socket();
      }
      socket.setSoTimeout(http.getTimeout());

      // connect
      // String sockHost = http.useProxy(url) ? http.getProxyHost() : host;
      int sockPort = http.useProxy(url) ? http.getProxyPort() : port;
      // InetSocketAddress sockAddr = new InetSocketAddress(sockHost, sockPort);
      InetSocketAddress sockAddr = new InetSocketAddress(host, port);
      socket.connect(sockAddr, http.getTimeout());

      if (scheme == Scheme.HTTPS) {
        SSLSocket sslsocket = null;
        try {
          // sslsocket = getSSLSocket(socket, sockHost, sockPort);
          sslsocket = getSSLSocket(socket, host, port);
          sslsocket.startHandshake();

------------------ Original Message ------------------
From: "user" <[email protected]>;
Sent: Saturday, January 18, 2025, 6:28 AM
To: "user"<[email protected]>;
Subject: Re: Issue with SSLHandshakeException in v1.20 using protocol-http plugin

Hi Hanbing Luo,

thanks for reporting this issue.

Difficult to tell whether this is a bug or not. Unfortunately, I'm not able to reproduce it using 1.20 or the recent Nutch master and Java 11 on Linux. The mentioned page was successfully fetched.

But if you are able to reproduce the issue repeatedly, it is one. Definitely.

Generally speaking, protocol-okhttp is the more advanced one, supporting HTTP/2 and pooling connections. Although for HTTP/1.1, protocol-http should do its job.

Best,
Sebastian

On 1/17/25 10:29, dolphin wrote:
> Hello everyone
>
> I used Nutch (v1.20) to crawl the website:
> https://weworkremotely.com/
> With the default protocol-http plugin, I encountered an SSLHandshakeException. However, this issue does not occur when using the protocol-okhttp plugin.
>
> The exception appears to be related to the TLS version. The web server does not support TLSv1 but does support TLSv1.2 and TLSv1.3. I also tried setting -Dhttps.protocols=TLSv1.3, but it didn't resolve the issue.
>
> I'm unsure if this is a bug in the protocol-http plugin. Please let me know if further details are needed.
>
> Thanks
> Hanbing Luo
>
> How to reproduce:
> run org.apache.nutch.protocol.http.Http main method with above URL.
>
> java -version
> java version "11.0.23" 2024-04-16 LTS
> Java(TM) SE Runtime Environment 18.9 (build 11.0.23+7-LTS-222)
> Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.23+7-LTS-222, mixed mode)
>
> Detailed error stack:
> 2025-01-10 15:12:32,399 ERROR o.a.n.p.h.Http [main] Failed to get protocol output
> org.apache.nutch.protocol.http.api.HttpException: SSL connect to https://weworkremotely.com failed with: Remote host terminated the handshake
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:156) ~[classes/:?]
>     at org.apache.nutch.protocol.http.Http.getResponse(Http.java:65) ~[classes/:?]
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:354) [classes/:?]
>     at org.apache.nutch.protocol.http.api.HttpBase.main(HttpBase.java:697) [classes/:?]
>     at org.apache.nutch.protocol.http.Http.main(Http.java:59) [classes/:?]
> Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
>     at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1715) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1514) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426) ~[?:?]
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:136) ~[classes/:?]
>     ... 4 more
> Caused by: java.io.EOFException: SSL peer shut down incorrectly
>     at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160) ~[?:?]
>     at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426) ~[?:?]
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:136) ~[classes/:?]
>     ... 4 more
> Status: exception(16), lastModified=0: org.apache.nutch.protocol.http.api.HttpException: SSL connect to https://weworkremotely.com failed with: Remote host terminated the handshake
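
The behavior described in this thread (a TLS handshake attempted with the proxy at localhost:7890 rather than with weworkremotely.com) can be illustrated with a minimal Java sketch of HTTP CONNECT tunneling. This is neither the Nutch implementation nor the proposed patch: the class name ProxyTunnelSketch and all of its helper methods are hypothetical and exist only for illustration. The sketch assumes the proxy accepts a plain HTTP/1.1 CONNECT request without authentication. It first asks the proxy to open a tunnel to the target host, and only then starts the TLS handshake, naming the target host rather than the proxy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical illustration class, not part of Nutch.
final class ProxyTunnelSketch {

  /** Builds the plain-text CONNECT request sent to the proxy before any TLS bytes. */
  static String connectRequest(String host, int port) {
    String authority = host + ":" + port;
    return "CONNECT " + authority + " HTTP/1.1\r\n"
        + "Host: " + authority + "\r\n"
        + "\r\n";
  }

  /** Returns true if the status line reports a 2xx response. */
  static boolean tunnelEstablished(String statusLine) {
    if (statusLine == null) {
      return false;
    }
    String[] parts = statusLine.trim().split("\\s+");
    return parts.length >= 2
        && parts[0].startsWith("HTTP/")
        && parts[1].length() == 3
        && parts[1].charAt(0) == '2';
  }

  /** Reads one byte at a time up to and including the blank line ending the headers. */
  static String readResponseHead(InputStream in) throws IOException {
    StringBuilder head = new StringBuilder();
    int b;
    while ((b = in.read()) != -1) {
      head.append((char) b);
      int n = head.length();
      if (n >= 4 && head.substring(n - 4).equals("\r\n\r\n")) {
        break;
      }
    }
    return head.toString();
  }

  /** Opens a tunnel through the proxy, then performs the TLS handshake with the target. */
  static SSLSocket openTlsThroughProxy(String proxyHost, int proxyPort,
      String host, int port, int timeoutMs) throws IOException {
    Socket socket = new Socket();
    socket.setSoTimeout(timeoutMs);
    // The TCP connection goes to the proxy ...
    socket.connect(new InetSocketAddress(proxyHost, proxyPort), timeoutMs);

    OutputStream out = socket.getOutputStream();
    out.write(connectRequest(host, port).getBytes(StandardCharsets.US_ASCII));
    out.flush();

    String head = readResponseHead(socket.getInputStream());
    int eol = head.indexOf("\r\n");
    String statusLine = eol >= 0 ? head.substring(0, eol) : head;
    if (!tunnelEstablished(statusLine)) {
      socket.close();
      throw new IOException("Proxy refused CONNECT: " + statusLine);
    }

    // ... but the TLS session is layered over the tunnel and names the target host.
    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
    SSLSocket ssl = (SSLSocket) factory.createSocket(socket, host, port, true);
    ssl.startHandshake();
    return ssl;
  }
}
```

With the setup described in the thread, a call might look like ProxyTunnelSketch.openTlsThroughProxy("localhost", 7890, "weworkremotely.com", 443, 10000). The snippet posted in the thread reaches a similar result by passing a Proxy of type HTTP to the Socket constructor, which, as far as I know, leaves the CONNECT exchange to the JDK.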

