Hello everyone, I'm a long-time user of Tor and a first-time poster here.
Over the last few months, I have been working on a lightweight, client-only C++
implementation of the Tor protocol, intended to be used as an embedded library
in other applications. It is now at the stage where it can complete
bootstrapping, build circuits, connect to services, and host hidden
services (v3). I have been building this primarily from the spec documents
(which have generally been extremely helpful), as well as by
stepping through the official Tor implementation when needed for
troubleshooting and to confirm specifics. Before I release this to the wider
world, I'd like to confirm a few points that may not be explicitly stated in
the specs.
1. In general, what are the things to look out for when implementing the Tor
protocol beyond "making it work" and validating all data (signatures,
timestamps, etc.)? One thing I'm concerned about is the risk of fingerprinting
where the spec does not completely specify behaviour, e.g. the order in which
link specifiers are passed in an extend cell, the exact criteria for when
circuits are explicitly destroyed, etc. (I'm very excited to see the proposals
around CBOR on this point, which would help greatly with knowing that a
canonical data representation is used.)
2. When it comes to bootstrapping, the official implementation appears to
favour accessing directories via plaintext HTTP rather than connecting on the
OR port and using CREATE_FAST / BEGIN_DIR. What is the motivation for using the
plaintext option (and, for that matter, for having a plaintext HTTP service
open at all)? While the OR will learn just as much about the client either way,
it seems that defaulting to plaintext access to directory information
unnecessarily reveals to third parties how clients engage with the Tor network.
3. When using bridges, and in particular pluggable transports, how is the
client intended to bootstrap safely in the cold-start case where it does not
know up front which bridge/relay it will be connected to (e.g. when using
Snowflake)? The RSA identity can be accepted on blind faith based on the Tor
handshake, and it's then possible to get the full details with CREATE_FAST /
BEGIN_DIR, but how does a client know that it has been connected to a bridge
that is "blessed" by the Tor network rather than to a MITM actor?
4. Finally, if anyone reading has been involved with, or close to, the
development of other unofficial Tor implementations, what are the lessons
learned on this front? I'm aware of, among others, Orchid (last updated in
2016), node-Tor (does not implement ECC) and torpy (does not implement hidden
services v3). What made these projects fail or stall?
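
To make the link specifier example in point 1 concrete, here is a minimal
sketch of what "pinning" the order could look like when encoding the specifier
list of an EXTEND2 cell. The type values and the NSPEC / LSTYPE / LSLEN / LSPEC
layout follow tor-spec; the names LinkSpecifier and encode_link_specifiers are
hypothetical, and sorting by ascending type value is purely an assumption made
for illustration, not an order the spec mandates or one confirmed against the
official client.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Link specifier types defined by tor-spec for EXTEND2 cells.
enum class LinkSpecType : std::uint8_t {
    TlsTcpIpv4      = 0x00,  // 4-byte address + 2-byte port
    TlsTcpIpv6      = 0x01,  // 16-byte address + 2-byte port
    LegacyIdentity  = 0x02,  // 20-byte SHA-1 fingerprint of the RSA identity
    Ed25519Identity = 0x03   // 32-byte Ed25519 identity key
};

// Hypothetical container for one specifier (not from any real library).
struct LinkSpecifier {
    LinkSpecType type;
    std::vector<std::uint8_t> data;
};

// Encodes NSPEC followed by (LSTYPE, LSLEN, LSPEC) for each specifier.
// ASSUMPTION: specifiers are emitted in ascending type order so the output
// does not depend on the order in which the caller collected them.
std::vector<std::uint8_t> encode_link_specifiers(std::vector<LinkSpecifier> specs) {
    if (specs.size() > 255) {
        throw std::length_error("too many link specifiers");
    }
    // stable_sort keeps the relative order of repeated types (e.g. two IPv4).
    std::stable_sort(specs.begin(), specs.end(),
                     [](const LinkSpecifier& a, const LinkSpecifier& b) {
                         return static_cast<std::uint8_t>(a.type) <
                                static_cast<std::uint8_t>(b.type);
                     });

    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(specs.size()));  // NSPEC
    for (const auto& s : specs) {
        if (s.data.size() > 255) {
            throw std::length_error("link specifier too long");
        }
        out.push_back(static_cast<std::uint8_t>(s.type));         // LSTYPE
        out.push_back(static_cast<std::uint8_t>(s.data.size()));  // LSLEN
        out.insert(out.end(), s.data.begin(), s.data.end());      // LSPEC
    }
    return out;
}
```

Whatever order is chosen, the point is that it is fixed in one place, so that
it can later be changed to match the official client once the expected order
is confirmed.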
Many thanks,
P
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev