#28669: Bug: ../src/feature/hs/hs_client.c:280:
retry_all_socks_conn_waiting_for_desc
--------------------------+------------------------------------
 Reporter:  traumschule   |          Owner:  (none)
     Type:  defect        |         Status:  needs_review
 Priority:  Medium        |      Milestone:  Tor: 0.4.0.x-final
Component:  Core Tor/Tor  |        Version:
 Severity:  Normal        |     Resolution:
 Keywords:  tor-hs        |  Actual Points:
Parent ID:                |         Points:
 Reviewer:  asn           |        Sponsor:
--------------------------+------------------------------------

Comment (by traumschule):

 Replying to [comment:3 dgoulet]:
 > I've stated that theory in part in #27410 but now I do think it is
 indeed the code path creating this issue:
 >
 > 1. Tor gets a descriptor and tries to connect to it but intro points all
 fail so the descriptor ends up unusable that is
 `hs_client_any_intro_points_usable()` returns false.
 >
 > 2. A SOCKS request is made to the .onion and because the intro points
 aren't usuable, a refetch is triggered, see `connection_ap_handle_onion()`
 which put the connection in `AP_CONN_STATE_RENDDESC_WAIT`
 >
 > 3. While (2) is happening, computer goes in suspend mode for more than
 30 minutes.
 >
 > 4. When waking up, the `rend_cache_failure_clean()` callback is
 triggered which makes all the intro point usable all the sudden.
 >
 > 5. Everything is stalled until we get a live consensus which, when it
 arrives, the HS subsystem is informed with `hs_client_dir_info_changed()`
 that calls `retry_all_socks_conn_waiting_for_desc()` (from the stacktrace
 we have in the ticket).
 >
 > 6. And boom, for `hs_client_refetch_hsdesc()` to return
 `HS_CLIENT_FETCH_HAVE_DESC`, we need a descriptor in the cache with usable
 intro points. We have that because (4) made the intro points usable and
 the descriptor from (1) is still in the cache. The BUG() is hit because we
 still have that pending SOCKS connection that is waiting for its
 descriptor from (2).
 >
 > The solution is that for this particular condition where we do have a
 descriptor in our cache but we have a connection waiting for a descriptor,
 instead of BUG(), we need to mark it as "pending for a circuit" like we do
 in `hs_client_desc_has_arrived()` which will trigger the connection to be
 attached to a circuit and thus the HS dance to start.


 Contrary to above case this one isn't caused by a clock jump but a weak
 connection (#28962) causes Bugs around 00:16:05.000, 00:16:59.000 and
 00:17:22.000.
 {{{can_client_refetch_desc()}}} gives a new hint which additionally
 relates it to #24610 and #23764.
 {{{rend_cache_failure_clean()}}} isn't triggered and
 {{{connection_ap_handle_onion()}}} appears after the first bug at
 00:16:38.000.

 {{{
 Dec 29 00:15:55.000 [info] can_client_refetch_desc(): Can't fetch
 descriptor for service [scrubbed] because we are missing a live consensus.
 Stalling connection.
 ...
 Dec 29 00:15:56.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16132 (a General-purpose client 3-hop
 circuit in state doing handshakes with channel state open).
 Dec 29 00:15:56.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16140 (a Hidden service client:
 Establishing rendezvous point 3-hop circuit in state doing handshakes with
 channel state open).
 ...
 Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16123 (a Hidden service client:
 Connecting to intro point 4-hop circuit in state doing handshakes with
 channel state open).
 Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16137 (a General-purpose client 3-hop
 circuit in state doing handshakes with channel state open).
 Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16143 (a Hidden service client:
 Establishing rendezvous point 3-hop circuit in state doing handshakes with
 channel state open).
 Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16144 (a Hidden service client:
 Establishing rendezvous point 3-hop circuit in state doing handshakes with
 channel state open).
 Dec 29 00:16:01.000 [info] circuit_expire_building(): No circuits are
 opened. Relaxing timeout for circuit 16145 (a Hidden service client:
 Establishing rendezvous point 3-hop circuit in state doing handshakes with
 channel state open).
 ...
 Dec 29 00:16:05.000 [info] can_client_refetch_desc(): We would fetch a v3
 hidden service descriptor but we already have a usable descriptor.
 Dec 29 00:16:05.000 [warn] tor_bug_occurred_(): Bug:
 ../src/feature/hs/hs_client.c:280: retry_all_socks_conn_waiting_for_desc:
 Non-fatal assertion !(status == HS_CLIENT_FETCH_HAVE_DESC) failed. (on Tor
 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug: Non-fatal assertion !(status ==
 HS_CLIENT_FETCH_HAVE_DESC) failed in retry_all_socks_conn_waiting_for_desc
 at ../src/feature/hs/hs_client.c:280. Stack trace: (on Tor 0.4.0.0-alpha-
 dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(log_backtrace_impl+0x65)
 [0x643225] (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xbd)
 [0x63ea4d] (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(hs_client_dir_info_changed+0xad) [0x54581d] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(router_dir_info_changed+0x34) [0x56cc84] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(networkstatus_set_current_consensus+0x3f4) [0x563f14] (on Tor
 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(connection_dir_reached_eof+0xcc0) [0x52b510] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(connection_handle_read+0x808) [0x4aa0b8] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(+0x5a629) [0x4b0629] (on
 Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/lib/i386-linux-
 gnu/libevent-2.0.so.5(event_base_loop+0x7d1) [0xb769f681] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:
 /usr/bin/tor(tor_libevent_run_event_loop+0x30) [0x5df3d0] (on Tor 0.4.0.0
 -alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0xc5)
 [0x4b1a95] (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(run_tor_main_loop+0x1ce)
 [0x49d43e] (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x11c5)
 [0x49e7f5] (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3f) [0x49b97f]
 (on Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(main+0x35) [0x49b4d5] (on
 Tor 0.4.0.0-alpha-dev )
 Dec 29 00:16:05.000 [warn] Bug:     /lib/i386-linux-
 gnu/libc.so.6(__libc_start_main+0xf6) [0xb708d286] (on Tor 0.4.0.0-alpha-
 dev )
 Dec 29 00:16:05.000 [warn] Bug:     /usr/bin/tor(+0x45531) [0x49b531] (on
 Tor 0.4.0.0-alpha-dev )
 ...
 Dec 29 00:16:38.000 [info] connection_ap_handle_onion(): Got a hidden
 service request for ID '[scrubbed]'
 Dec 29 00:16:38.000 [info] rep_hist_note_used_internal(): New port
 prediction added. Will continue predictive circ building for 2851 more
 seconds.
 Dec 29 00:16:38.000 [info] connection_ap_handle_onion(): Descriptor is
 here. Great.
 Dec 29 00:16:38.000 [info] connection_edge_process_inbuf(): data from edge
 while in 'waiting for circuit' state. Leaving it on buffer.
 Dec 29 00:16:38.000 [info] connection_ap_handshake_attach_circuit(): Intro
 circ 3827669459 (id: 16167) present and awaiting ACK. Rend circuit
 3158169261 (id: 16158). Stalling. (stream 100 sec old)
 Dec 29 00:16:38.000 [info] connection_ap_handshake_attach_circuit():
 pending-join circ 3976548034 (id: 16156) already here, with intro ack.
 Stalling. (stream 0 sec old)
 ...
 Dec 29 00:17:55.000 [info] connection_ap_handle_onion(): Found usable
 descriptor in cache for [scrubbed]. Not fetching..
 Dec 29 00:17:55.000 [info] rep_hist_note_used_internal(): New port
 prediction added. Will continue predictive circ building for 2851 more
 seconds.
 Dec 29 00:17:55.000 [info] connection_ap_handle_onion(): Descriptor is
 here. Great.

 }}}

 {{{
 /** Launch a connection to a hidden service directory to fetch a hidden
  * service descriptor using <b>identity_pk</b> to get the necessary keys.
  *
  * A hs_client_fetch_status_t code is returned. */
 int
 hs_client_refetch_hsdesc(const ed25519_public_key_t *identity_pk)
 {
   hs_client_fetch_status_t status;

   tor_assert(identity_pk);

   if (!can_client_refetch_desc(identity_pk, &status)) {
     return status;
   }
 }}}

 {{{
 /* Return true iff the client can fetch a descriptor for this service
 public
  * identity key and status_out if not NULL is untouched. If the client can
  * _not_ fetch the descriptor and if status_out is not NULL, it is set
 with
  * the fetch status code. */
 static unsigned int
 can_client_refetch_desc(const ed25519_public_key_t *identity_pk,
                         hs_client_fetch_status_t *status_out)
 {
 ...
   /* Without a live consensus we can't do any client actions. It is needed
 to
    * compute the hashring for a service. */
   if (!networkstatus_get_live_consensus(approx_time())) {
     log_info(LD_REND, "Can't fetch descriptor for service %s because we "
                       "are missing a live consensus. Stalling
 connection.",
              safe_str_client(ed25519_fmt(identity_pk)));
     status = HS_CLIENT_FETCH_MISSING_INFO;
     goto cannot;
   }
 ...
   /* Check if fetching a desc for this HS is useful to us right now */
   {
     const hs_descriptor_t *cached_desc = NULL;
     cached_desc = hs_cache_lookup_as_client(identity_pk);
     if (cached_desc && hs_client_any_intro_points_usable(identity_pk,
                                                          cached_desc)) {
       log_info(LD_GENERAL, "We would fetch a v3 hidden service descriptor
 "
                            "but we already have a usable descriptor.");
       status = HS_CLIENT_FETCH_HAVE_DESC;
       goto cannot;
     }
   }
 ...
  cannot:
   if (status_out) {
     *status_out = status;
   }
   return 0;
 }}}

 Also nearly hidden and probably unrelated the pathbias Bug (#27228) is
 lingering:
 {{{
 Dec 29 00:16:26.000 [info] pathbias_should_count(): Bug: Circuit 16109 is
 now being counted despite being ignored in the past. Purpose is Measuring
 circuit timeout, path state is new (on Tor 0.4.0.0-alpha-dev )
 }}}

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28669#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Reply via email to