Hi Hanbing Luo,

thanks for the clarification about the proxy!

A pull request is very welcome. It makes reviewing the change and testing it much easier. Thanks in advance!

Best,
Sebastian

On 1/20/25 15:12, dolphin wrote:
Hi Sebastian


Thank you for your response.


I apologize for not mentioning earlier that I was using a proxy instead of making a
direct request. Specifically, I have a ClashVerge client
(https://github.com/clash-verge-rev/clash-verge-rev) running on localhost with the
port set to 7890. I use it because I encountered a 403 response page due to
Cloudflare Bot Protection.


I did some investigation and noticed that the SSLHandshakeException appears to 
be related to the code in the HttpResponse class, particularly between lines 
121 and 136. Based on my debugging, the SSL handshake is being performed with 
localhost:7890 rather than weworkremotely.com.


At this point, I'm unsure if this issue is a bug in the codebase or something 
specific to the way ClashVerge handles connections locally. However, when I use 
the following code, everything works as expected.


Please let me know if you need further details or if you'd like me to submit a 
pull request to address this. I'm happy to contribute if needed.

Thanks


Code snippet:
    try {
//      socket = new Socket(); // create the socket
//      socket.setSoTimeout(http.getTimeout());
      boolean useProxy = http.useProxy(url);
      if (useProxy) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
            new InetSocketAddress(http.getProxyHost(), http.getProxyPort()));
        socket = new Socket(proxy);
      } else {
        socket = new Socket();
      }
      socket.setSoTimeout(http.getTimeout());

      // connect
//      String sockHost = http.useProxy(url) ? http.getProxyHost() : host;
      int sockPort = http.useProxy(url) ? http.getProxyPort() : port;
//      InetSocketAddress sockAddr = new InetSocketAddress(sockHost, sockPort);
      InetSocketAddress sockAddr = new InetSocketAddress(host, port);
      socket.connect(sockAddr, http.getTimeout());

      if (scheme == Scheme.HTTPS) {
        SSLSocket sslsocket = null;

        try {
//          sslsocket = getSSLSocket(socket, sockHost, sockPort);
          sslsocket = getSSLSocket(socket, host, port);
          sslsocket.startHandshake();
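
The idea behind the truncated snippet above can be condensed into a self-contained sketch that uses only java.net and javax.net.ssl: the plain socket is created with an HTTP proxy, so connecting it to the target host asks the proxy to open a tunnel, and TLS is then negotiated with the target host over that tunnel rather than with the proxy. The class and method names (ProxyTunnelSketch, httpProxy, connectTls) are hypothetical and not part of Nutch, and the default SSLSocketFactory is assumed here in place of Nutch's getSSLSocket helper.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical illustration class, not part of Nutch.
final class ProxyTunnelSketch {

  /** Describes an HTTP proxy, e.g. a local client listening on port 7890. */
  static Proxy httpProxy(String proxyHost, int proxyPort) {
    return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
  }

  /** Opens a tunnel to host:port through the proxy and performs the TLS handshake with host. */
  static SSLSocket connectTls(Proxy proxy, String host, int port, int timeoutMs)
      throws IOException {
    Socket socket = new Socket(proxy); // unconnected socket bound to the proxy
    try {
      socket.setSoTimeout(timeoutMs);
      // The address is the target, not the proxy: the proxy-aware socket
      // sets up the tunnel to it.
      socket.connect(new InetSocketAddress(host, port), timeoutMs);

      // Layer TLS over the tunnel; host and port name the target server.
      SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
      SSLSocket tls = (SSLSocket) factory.createSocket(socket, host, port, true);
      tls.startHandshake();
      return tls;
    } catch (IOException e) {
      socket.close();
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 4) {
      System.err.println("usage: ProxyTunnelSketch <proxyHost> <proxyPort> <host> <port>");
      return;
    }
    Proxy proxy = httpProxy(args[0], Integer.parseInt(args[1]));
    try (SSLSocket tls = connectTls(proxy, args[2], Integer.parseInt(args[3]), 10_000)) {
      System.out.println("negotiated protocol: " + tls.getSession().getProtocol());
    }
  }
}
```

With a proxy listening locally, it could be run as: java ProxyTunnelSketch.java localhost 7890 weworkremotely.com 443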



------------------ Original Message ------------------
From: "user" <[email protected]>
Sent: Saturday, January 18, 2025, 6:28 AM
To: "user" <[email protected]>

Subject: Re: Issue with SSLHandshakeException in v1.20 using protocol-http plugin



Hi Hanbing Luo,

thanks for reporting this issue.

Difficult to tell whether this is a bug or not.

Unfortunately, I'm not able to reproduce it using 1.20 or the recent Nutch
master and Java 11 on Linux. The mentioned page was successfully fetched.

But if you are able to reproduce the issue reliably, it definitely is one.

Generally speaking, protocol-okhttp is the more advanced plugin:
it supports HTTP/2 and pools connections.

For HTTP/1.1, though, protocol-http should do its job.

Best,
Sebastian


On 1/17/25 10:29, dolphin wrote:
> Hello everyone
>
>
> I used Nutch (v1.20) to crawl the website:
> https://weworkremotely.com/
> With the default protocol-http plugin, I encountered an
> SSLHandshakeException. However, this issue does not occur when using
> the protocol-okhttp plugin.
>
> The exception appears to be related to the TLS version. The web server
> does not support TLSv1 but does support TLSv1.2 and TLSv1.3. I also tried
> setting -Dhttps.protocols=TLSv1.3, but it didn't resolve the issue.
>
> I'm unsure if this is a bug in the protocol-http plugin. Please let me
> know if further details are needed.
>
>
> Thanks
> Hanbing Luo
>
>
> How to reproduce:
> run org.apache.nutch.protocol.http.Http main method with above URL.
>
>
> java -version
> java version "11.0.23" 2024-04-16 LTS
> Java(TM) SE Runtime Environment 18.9 (build 11.0.23+7-LTS-222)
> Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.23+7-LTS-222, mixed mode)
>
>
> Detailed error stack:
> 2025-01-10 15:12:32,399 ERROR o.a.n.p.h.Http [main] Failed to get protocol output
> org.apache.nutch.protocol.http.api.HttpException: SSL connect to https://weworkremotely.com failed with: Remote host terminated the handshake
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:156) ~[classes/:?]
>     at org.apache.nutch.protocol.http.Http.getResponse(Http.java:65) ~[classes/:?]
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:354) [classes/:?]
>     at org.apache.nutch.protocol.http.api.HttpBase.main(HttpBase.java:697) [classes/:?]
>     at org.apache.nutch.protocol.http.Http.main(Http.java:59) [classes/:?]
> Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
>     at java.base/sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1715) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1514) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426) ~[?:?]
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:136) ~[classes/:?]
>     ... 4 more
> Caused by: java.io.EOFException: SSL peer shut down incorrectly
>     at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:489) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160) ~[?:?]
>     at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1421) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455) ~[?:?]
>     at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426) ~[?:?]
>     at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:136) ~[classes/:?]
>     ... 4 more
> Status: exception(16), lastModified=0: org.apache.nutch.protocol.http.api.HttpException: SSL connect to https://weworkremotely.com failed with: Remote host terminated the handshake
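
A note on the -Dhttps.protocols attempt in the original report: that system property is read by HttpsURLConnection, not by SSLSocket instances created directly from an SSLSocketFactory, which is the path the stack trace shows (SSLSocketImpl.startHandshake called from HttpResponse). For such sockets the enabled TLS versions are set on the socket itself, or JVM-wide through the jdk.tls.client.protocols system property. Below is a minimal sketch, assuming the JVM's default SSLSocketFactory; the class and method names (TlsProtocolSketch, newSocketWithProtocols) are hypothetical and not part of Nutch. It would not by itself have fixed the failure discussed in this thread, which turned out to involve the proxy.

```java
import java.io.IOException;
import java.util.Arrays;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical illustration class, not part of Nutch.
final class TlsProtocolSketch {

  /** Creates an unconnected SSLSocket restricted to the given TLS versions. */
  static SSLSocket newSocketWithProtocols(String... protocols) throws IOException {
    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
    SSLSocket socket = (SSLSocket) factory.createSocket(); // not yet connected
    // Throws IllegalArgumentException if the JVM does not support a listed version.
    socket.setEnabledProtocols(protocols);
    return socket;
  }

  public static void main(String[] args) throws IOException {
    try (SSLSocket socket = newSocketWithProtocols("TLSv1.2", "TLSv1.3")) {
      System.out.println("supported: " + Arrays.toString(socket.getSupportedProtocols()));
      System.out.println("enabled:   " + Arrays.toString(socket.getEnabledProtocols()));
    }
  }
}
```

Printing the supported and enabled lists is a quick way to check what a given JVM will actually offer in the handshake.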
