This is a lot of really good advice. Thanks. For some reason, I was
thinking C++ would give a measurable performance increase for the spider,
but now that I've questioned that assumption, it seems really dumb. Obviously the
network will be the bottleneck by far. I think I'll still use C++ for the
back end
Google how to use SOCKS5 with Boost and set 127.0.0.1:9050 as the proxy.
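
To make the SOCKS5 suggestion a bit more concrete, here is a minimal sketch of
the two messages a client sends to the proxy, following RFC 1928. It only
builds the bytes and uses nothing but the standard library. The function names
are made up for illustration, and the sketch assumes a proxy that accepts the
"no authentication" method.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// SOCKS5 greeting (RFC 1928): version 5, one method offered,
// method 0x00 = "no authentication required".
std::vector<std::uint8_t> socks5_greeting() {
    return {0x05, 0x01, 0x00};
}

// SOCKS5 CONNECT request using address type 0x03 (domain name), so the
// host name is handed to the proxy unresolved. That is what a .onion
// address needs, since it cannot be resolved through normal DNS.
// Hypothetical helper, not part of Boost.Asio.
std::vector<std::uint8_t> socks5_connect_request(const std::string& host,
                                                 std::uint16_t port) {
    // The name length is sent as a single byte.
    assert(!host.empty() && host.size() <= 255);

    std::vector<std::uint8_t> req = {
        0x05,  // protocol version
        0x01,  // command: CONNECT
        0x00,  // reserved
        0x03,  // address type: domain name
        static_cast<std::uint8_t>(host.size())};
    req.insert(req.end(), host.begin(), host.end());
    // Port in network byte order (high byte first).
    req.push_back(static_cast<std::uint8_t>(port >> 8));
    req.push_back(static_cast<std::uint8_t>(port & 0xFF));
    return req;
}
```

With Boost.Asio the rough flow would be: connect a TCP socket to
127.0.0.1:9050, write the greeting, read the two-byte method reply, write the
CONNECT request, read the proxy's reply (second byte 0x00 means success), and
then speak HTTP over the same socket.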
On Sep 30, 2015 5:14 AM, "Tyler Hardin" wrote:
> Hi, I'm writing a spider in C++ and thinking about running it on the Tor
> hidden network. I'm using boost::asio for the network API. What would be the
Asio is only a socket library, which means you would need to build all the
HTTP logic on top of it. That is not much fun, but everything you need to
know is documented in the RFCs if you really want to go down that route.
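
As a rough idea of what "building the HTTP logic" means at its very smallest,
here is a sketch that formats a GET request and pulls the status code out of a
response. Both function names are hypothetical, and a real client would also
have to handle headers, chunked transfer encoding, redirects, and more.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Builds a minimal HTTP/1.1 GET request. "Connection: close" asks the
// server to close the connection after the response, so the end of the
// body can be detected by end-of-stream.
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!path.empty() && path[0] == '/');  // request target must be absolute
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Extracts the three-digit status code from a status line such as
// "HTTP/1.1 200 OK". Returns -1 if the text does not look like one.
int parse_status_code(const std::string& response) {
    if (response.size() < 12 || response.compare(0, 7, "HTTP/1.") != 0 ||
        response[8] != ' ') {
        return -1;
    }
    int code = 0;
    for (std::size_t i = 9; i < 12; ++i) {
        const unsigned char c = static_cast<unsigned char>(response[i]);
        if (!std::isdigit(c)) return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}
```

The string from build_get_request is what you would write to the socket once
the connection is established; parse_status_code is applied to the first line
read back.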
The "best/easiest" way would be to use an HTTP library specifically for the
Hi, I'm writing a spider in C++ and thinking about running it on the Tor
hidden network. I'm using boost::asio for the network API. What would be
the best/easiest way for me to retrieve pages from the Tor hidden network?
P.S. Yes, I'm respecting robots.txt and rate limiting. I'm not going to DoS
Also, about rate limiting, what sort of rate limit do y'all think would be
mindful of the health of the network and the average site? I'm thinking a
maximum of 1 req per second per site and 10 reqs per second overall (in
other words, 10 sites per second). But based on my experience browsing with
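
One way the limits proposed above (1 request per second per site, 10 per
second overall) could be enforced is sketched below. The class and its
interface are invented for illustration. The current time is passed in by the
caller rather than read from the clock, which keeps the logic easy to test.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>

// Spaces requests out so that consecutive requests to the same site are at
// least `per_site_gap` apart and any two consecutive requests are at least
// `global_gap` apart. Hypothetical helper, not taken from any library.
class RateLimiter {
public:
    using Clock = std::chrono::steady_clock;
    using Duration = Clock::duration;
    using TimePoint = Clock::time_point;

    RateLimiter(Duration per_site_gap, Duration global_gap)
        : per_site_gap_(per_site_gap), global_gap_(global_gap) {
        assert(per_site_gap >= Duration::zero());
        assert(global_gap >= Duration::zero());
    }

    // Returns the earliest time at or after `now` at which a request to
    // `site` may be sent, and records the request as scheduled for then.
    TimePoint reserve(const std::string& site, TimePoint now) {
        TimePoint when = std::max(now, next_global_);
        auto it = next_for_site_.find(site);
        if (it != next_for_site_.end()) {
            when = std::max(when, it->second);
        }
        next_global_ = when + global_gap_;
        next_for_site_[site] = when + per_site_gap_;
        return when;
    }

private:
    Duration per_site_gap_;
    Duration global_gap_;
    TimePoint next_global_{};  // earliest time for the next request to any site
    std::unordered_map<std::string, TimePoint> next_for_site_;
};
```

For the numbers above you would construct it as
RateLimiter(std::chrono::seconds(1), std::chrono::milliseconds(100)) and wait
until the time returned by reserve() before sending each request.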
On Tue, 29 Sep 2015 23:22:37 +, Tyler Hardin wrote:
> Also, about rate limiting, what sort of rate limit do y'all think would be
> mindful of the health of the network and the average site? I'm thinking a
> maximum of 1 req per second per site and 10 reqs per second overall
Perhaps you should