#30921: hs-v3: Close intro circuits when cleaning up the client descriptor cache
--------------------------------+------------------------------------
 Reporter:  dgoulet             |          Owner:  dgoulet
     Type:  defect              |         Status:  needs_revision
 Priority:  Medium              |      Milestone:  Tor: 0.4.2.x-final
Component:  Core Tor/Tor        |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:  tor-hs, tor-client  |  Actual Points:  0.1
Parent ID:  #28970              |         Points:  0.1
 Reviewer:  asn                 |        Sponsor:  Sponsor27-must
--------------------------------+------------------------------------

Comment (by dgoulet):

 Replying to [comment:6 asn]:
 > Replying to [comment:5 dgoulet]:
 > > Replying to [comment:4 asn]:
 > > > Hmm, some questions about this fix:
 > > >
 > > > - I'm not sure if the identified race condition is the cause of this
 assert. Looking at #28970, there are two asserts and the second one is on
 `hs_ident_intro_circ_is_valid()` which imples that the hs_ident exists but
 is zeroed out. IIUC, this seems to be the reason that we can't find a
 descriptor here, and not a race condition.
 > >
 > > I would argue that the "!desc" assert could be caused by what I
 outlined and fixed in the patch. The "has opened" code path can't return
 an error so after that assert is hit, we end up trying to send an
 INTRODUCE1 on the circuit and then you hit the second assert.
 > >
 > > The reason why the `intro_auth_pk` key is zeroed is because the
 `setup_intro_circ_auth_key()` failed before it could be filled (by not
 finding the descriptor).
 > >
 >
 > ACK agreed.
 >
 > One last thing: Can you please outline the exact steps to enter this
 edge case? IIUC, it's about a client who gets their intro circ opened at
 the exact same second that their descriptor expired? This seems extremely
 rare to me, and I wonder if there is a simpler explanation.
 >
 > In particular, descriptors expire at "the start of the next time
 period", and the logs in question were at 05:31, so not sure how expiry
 could work this way.

 Well the callback for cleaning the client cache is every 30 minutes. So at
 `05:31`, we might have simply gotten the consensus earlier that
 invalidates the client cache since the "valid_after" time is used. Also I
 have no idea what is the timezone of that log :S...

 The use case is that when the client cache expires a descriptor, it could
 be in the same main loop that an introduction circuit is now opened.

 It is very rare indeed! This is why I think I've never seen that and only
 active clients let say trying to connect to the HS every 30 seconds might
 have more chance to hit it. Considering thousands of tor users out there
 with very different use case, it is a case that I can see happening but
 very rarely.

 This is also why, for this family of bugs, we decided to not backport
 because it is so rare.

 Nevertheless, when cleaning descriptor from the client cache, I think it
 is correct behavior to then always close any pending intro circuits.
 Exactly what we did when we *replace* an existing descriptor.

 Also, re-reading this ticket, this appears to be very wrong and imo we
 should address this...:

 {{{
 The "has opened" code path can't return an error so after that assert is
 hit, we end up trying to send an INTRODUCE1 on the circuit and then you
 hit the second assert.
 }}}

 If the subsystem says "well your has_opened is not working for me", it is
 saying that the circuit won't work so we should stop right there and close
 it. Going into the "sending a cell code path" is a bit crazy imo.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30921#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Reply via email to