Hi Tor-Talk,

I am doing an MSc in Telecommunications and Network at City University, London. 
For my dissertation I am looking at the limitations of scaling Tor up and how 
those limits could be overcome. I should state that I am not a programmer, 
though I would love to be involved in Tor's progress.

I'm not sure whether I am confusing the terms Authority Server, Directory 
Authority and Directory Server, or whether they are all one and the same...

Firstly, the original Tor design document (Tor-Design, 18/05/2004) stated 
initial "theoretical" limits: Tor could operate with three, and up to as many 
as nine, DAs (Directory Authorities). However, I note from the documentation 
that you have gone through various version releases, have introduced directory 
caches etc. to mitigate the overloading of the DAs, and now have ten DAs 
operating, with improved overall network performance.  

Later, section 8 ("Early experiences: Tor in the Wild") states the initial 
expectation "of the network to support a few hundred nodes and 10,000 users 
before we're forced to become more distributed". This was said with reference 
to the "clique topology" and "full-visibility directories", yet you now operate 
almost 6,000 relays and around 2.25M (directly connecting) users. Have you 
fundamentally changed the topology, or have you found gains in how relays are 
reported to form the consensus (or elsewhere) that allow this scale factor?

Two of the bottlenecks identified in dir-spec (section 0.3, "Some Remaining 
Questions") are that having every client know about every relay, and having 
every directory cache know about every router, won't scale ad infinitum. 

A question raised in Tor-Design (section 9) is: "if clients can no longer have 
a complete picture of the network, how can they perform discovery while 
preventing attackers from manipulating or exploiting gaps in their knowledge?" 
If the network were to scale up to a significant share of all Internet users, 
could the Directory Authorities release (to directory caches and clients) an 
even random sample of relays drawn from the FULL set of nodes, such that the 
randomness of path selection is still maintained? The sample could be drawn on 
a per-client basis and be as large as what is currently downloaded (about 
6,000 relays). This would mean that each client (or possibly each group of 
clients) gets a different "view" of the network, but the full set would need 
to be scaled down to the sample set at some point before it reaches the client.  
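The per-client sampling idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: relays are picked uniformly at random (real Tor path selection is bandwidth-weighted, so this is a simplification), each client has a hypothetical seed value, and the function name `sample_view` and the 60,000-relay network size are invented for the example.

```python
import random

def sample_view(full_relays, sample_size, client_seed):
    """Return one client's partial view of the network: a uniform random
    sample, drawn without replacement, from the full relay list.
    client_seed is a hypothetical per-client value; a real design would
    need to stop attackers from choosing or predicting it."""
    if sample_size >= len(full_relays):
        # Network smaller than the sample: the client just sees everything.
        return list(full_relays)
    rng = random.Random(client_seed)
    return rng.sample(full_relays, sample_size)

# Hypothetical network of 60,000 relays; each client sees 6,000 of them.
full = [f"relay{i}" for i in range(60_000)]
view_a = sample_view(full, 6_000, client_seed=1)
view_b = sample_view(full, 6_000, client_seed=2)
print(len(view_a), len(set(view_a) & set(view_b)))
```

With these numbers each relay in one client's view has a 6,000/60,000 = 10% chance of appearing in another client's view, so two clients share about 600 relays on average. That partial overlap is exactly what the Tor-Design question is worried about: an attacker who learns which relays a client can see may be able to exploit the gaps.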

I have looked over the documentation for path selection, the directory protocol 
and the consensus, but have not documented the timing of the message exchanges. 
I imagine that this is an area that could present a limit if the network were 
scaled up. What are the current areas that present limitations to large-scale 
growth?

I have been able to access most of the relevant documentation through 
https://www.torproject.org/docs/documentation.html.en but would appreciate 
pointers to any other repositories of information. As mentioned at the start, 
I am not a programmer, so the code base is meaningless to me :(

A small note: it would be useful for the documentation to be dated (and 
re-versioned with dates) to indicate the freshness and relevance of the 
information. I am aware that this may be a resource issue.

I appreciate your support with the network and hope to be able to contribute 
more in the future.

