Hi,

Turtl is an HTTP proxy whose purpose is to throttle connections to
specific hostnames, to avoid violating the terms of use of those API
providers (like del.icio.us, Technorati and so on).
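Per-hostname throttling can be pictured as a table of independent limiters keyed by the request's hostname. The sketch below is not turtl's code. `PerHostThrottle`, `reserve` and the `api.example.com` limit are hypothetical names chosen for illustration. It assumes a sliding window of `calls` requests per `interval` seconds for each configured host, and lets hosts without a configured limit through unthrottled.

```python
from collections import deque
from urllib.parse import urlsplit


class PerHostThrottle:
    """Hypothetical per-hostname limiter, not turtl's implementation.

    `limits` maps a lowercase hostname to (calls, interval): at most `calls`
    requests to that host may start within any `interval`-second window.
    Hosts that are not listed pass through without delay.
    """

    def __init__(self, limits):
        self.limits = limits
        self.history = {}  # hostname -> deque of recent start times

    def reserve(self, url, now):
        """Book a slot for a request to `url` arriving at time `now` and
        return how many seconds it must wait before being forwarded.
        Assumes `now` never decreases between calls."""
        host = urlsplit(url).hostname  # urlsplit lowercases the hostname
        if host not in self.limits:
            return 0
        calls, interval = self.limits[host]
        recent = self.history.setdefault(host, deque())
        start = now
        if len(recent) >= calls:
            # The calls-th most recent request must be `interval` seconds old.
            start = max(now, recent.popleft() + interval)
        recent.append(start)
        return start - now


throttle = PerHostThrottle({"api.example.com": (1, 1)})
print([throttle.reserve("http://api.example.com/posts", 0) for _ in range(3)])
# [0, 1, 2]
print(throttle.reserve("http://other.example.org/", 0))
# 0
```

Because each host keeps its own history, a burst of requests to one provider never delays requests to another.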

At the core of Turtl is a throttling deferred that works much like
DeferredSemaphore, except that it also enforces a rate (N calls every M
seconds) at which the deferreds added to it are fired.
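To make the combined policy concrete, here is a hypothetical, reactor-free model of it: a concurrency cap like DeferredSemaphore's plus a limit of `calls` starts per `interval` seconds. `simulate_throttle` is an illustrative name, not part of turtl, and it assumes every job is submitted at time 0 with a known duration; turtl itself fires real Deferreds through the Twisted reactor.

```python
import heapq
from collections import deque


def simulate_throttle(durations, concurrency, calls, interval):
    """Return the start time of each job, in submission order.

    Hypothetical model: at most `concurrency` jobs run at once, and at most
    `calls` jobs start within any `interval`-second window. All jobs are
    assumed to be submitted at time 0, with the given durations.
    """
    running = []        # min-heap of finish times of jobs still running
    recent = deque()    # start times of up to the last `calls` jobs
    starts = []
    now = 0.0
    for duration in durations:
        # Forget jobs that have already finished by `now`.
        while running and running[0] <= now:
            heapq.heappop(running)
        # Concurrency cap (the DeferredSemaphore part): wait for a free slot.
        if len(running) >= concurrency:
            now = heapq.heappop(running)
        # Rate cap: the calls-th most recent start must be `interval` seconds old.
        if len(recent) >= calls:
            now = max(now, recent.popleft() + interval)
        starts.append(now)
        recent.append(now)
        heapq.heappush(running, now + duration)
    return starts


print(simulate_throttle([0.25] * 6, concurrency=1, calls=2, interval=1))
# [0.0, 0.25, 1.0, 1.25, 2.0, 2.25]
```

With the same settings as the example below (concurrency=1, calls=2, interval=1), the second job only waits for the first to finish, while the third has to wait for the one-second window to roll over.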

In the past few weeks it has been improved and a couple of obscure bugs have
been ironed out. It has been running as a proxy for a couple of years, and
recently we started using it as a crawler rate limiter.

The source code lives on Bitbucket: https://bitbucket.org/adroll/turtl/overview

Here's a small example of its usage:

import time
from twisted.internet import reactor, defer
from twisted.protocols.policies import WrappingFactory
from twisted.web import client, server, resource
from turtl import engine

# Allow one request in flight at a time, and at most 2 calls per 1-second interval.
throttle = engine.ThrottlingDeferred(concurrency=1, calls=2, interval=1)

class FakeResource(resource.Resource):
    isLeaf = True
    def render(self, request):
        return "hello"

def setupServer():
    site = server.Site(FakeResource())
    wrapper = WrappingFactory(site)
    port = reactor.listenTCP(0, wrapper, interface="127.0.0.1")
    portno = port.getHost().port
    return portno

def stop(_):
    return reactor.stop()

def makeUrl(port):
    return "http://localhost:%s/" % (port,)

def prinl(page):
    print time.time(), page

port = setupServer()
url = makeUrl(port)
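# Queue 1000 requests through the throttle; stop the reactor once all are done.
requests = [throttle.run(client.getPage, url).addBoth(prinl) for i in xrange(1000)]
defer.DeferredList(requests).addBoth(stop)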
reactor.run()


-- 
Valentino Volonghi
http://www.adroll.com


_______________________________________________
Twisted-Python mailing list
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
