I don't see why not -- putting a fast proxy in front seems like the best thing to do given 8 slower instances behind it. Also, you might look into HAProxy, as I hear it does arbitrary TCP load-balancing as well as HTTP-specific balancing.
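
For anyone following along, here is a minimal sketch of one upstream process from the setup quoted below. It assumes Python 3, where BaseHTTPServer lives in http.server. The do_POST path illustrates the "buffer the whole request, process it, buffer the whole response" advice from earlier in the thread; handle_rpc and make_server are hypothetical names, and handle_rpc only echoes bytes where a real Thrift processor would go.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_rpc(request_bytes):
    """Hypothetical stand-in for a Thrift processor: echoes the request back."""
    return request_bytes


class Handler(BaseHTTPRequestHandler):
    def _reply(self, body, content_type):
        # The complete response body is in memory before anything is written.
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # The benchmark case: every GET returns the same small body.
        self._reply(b"hello world", "text/plain")

    def do_POST(self):
        # Read the complete POST body first, then process it in one step.
        length = int(self.headers.get("Content-Length", 0))
        request_bytes = self.rfile.read(length)
        self._reply(handle_rpc(request_bytes), "application/octet-stream")

    def log_message(self, format, *args):
        pass  # keep benchmark runs quiet


def make_server(port=0, host="127.0.0.1"):
    """Bind one backend; port 0 asks the OS for any free port."""
    return HTTPServer((host, port), Handler)
```

Calling make_server(8001).serve_forever() in each of eight processes (ports 8001 to 8008, chosen arbitrarily here) would give the proxy its eight upstreams.
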
On Thu, Apr 23, 2009 at 9:01 PM, Brian Hammond <[email protected]> wrote:

> Hi David,
>
> I've been working on a completely different project for the past few weeks.
> I'm now getting back into this.
>
>> The Python THttpServer implementation might be a good starting point
>> for you in terms of the nuts and bolts of connecting your server to
>> Thrift. I would *not* recommend using it for production use (I use
>> it as a mock backend for some integration tests) for performance reasons.
>
> Right, I wouldn't expect *one* of the THttpServer instances to perform well
> -- too much of a funnel. However, this made me think that it might be
> worthwhile to load-balance a number of them.
>
> I set up nginx with 4 worker processes (one per core) as a load balancer to
> 8 (arbitrary) python processes. These upstream processes are -- at first
> stab (no Thrift yet) -- just running a BaseHTTPServer do_GET that returns
> "hello world". Nginx simply does round-robin between the 8 upstream
> processes.
>
> I figured this would be a good way to test if THttpServer would perform
> well enough for my purposes since THttpServer.RequestHandler is based on
> BaseHTTPServer.
>
> Over loopback:
>
> $ ab -n 20000 -c 1000 127.0.0.1/index.html
>
> ...
> Requests per second: 11644.32 [#/sec] (mean)
> ...
>
> From my laptop here in NY to my server in The Planet (Dallas, TX):
>
> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>
> ...
> Requests per second: 788.20 [#/sec] (mean)
> ...
>
> I'm pretty happy with these numbers but of course the upstream processes do
> nothing interesting. My data-store is redis [1] however which is extremely
> efficient given its nature (an in-memory key-value "database"). Thus, I
> don't expect much overhead from thrift or redis. But, I'll test this
> assumption of course.
>
> Sorry if this is obvious to a lot of you on this list. This might be
> useful to others getting started.
>
> Does anyone see any huge glaring problem with the idea of putting fast
> nginx in front of a number of "slow" THttpServer-based processes?
>
> Thanks,
> Brian
>
> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>
>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>
>> The Python THttpServer implementation might be a good starting point
>> for you in terms of the nuts and bolts of connecting your server to
>> Thrift. I would *not* recommend using it for production use (I use
>> it as a mock backend for some integration tests) for performance reasons.
>> In order to avoid having a Thrift thread blocked on
>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>> a server that will buffer up the whole request, then hand it to Thrift,
>> then buffer up the Thrift response, then send the response to the client.
>> You probably want to put the POST data in a TMemoryBuffer (not a
>> TBufferedTransport, which uses a fixed-size buffer).
>>
>> --David
>>
>> Brian Hammond wrote:
>>
>>> Hi Garrett,
>>>
>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>
>>>> ----- "Brian Hammond" <[email protected]> wrote:
>>>>
>>>>> What I'm curious about is how I can do all of the following:
>>>>>
>>>>> 1) use SSL to encrypt user credentials
>>>>> 2) write my service implementation in python
>>>>>
>>>>> I guess there's a few options for python but none completely solve
>>>>> both of these requirements.
>>>>>
>>>>> 1) use the Twisted python generator and run a daemon with twistd
>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook-in support
>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>>>
>>>> Unless you need an asynchronous server side framework for high
>>>> concurrency and low memory footprint, I would stay clear of Twisted.
>>>>
>>>
>>> It turns out that I need a highly efficient server. I'm a one-man
>>> shop and am limited in the number of servers I can afford to deploy.
>>> I plan on starting with a bare minimum of two load-balanced VPS
>>> instances so memory is tight. I do also need high concurrency. I'm
>>> developing a turn-based game server and have a very large user base
>>> already (iPhone app) and would like to license my solution to other
>>> similar iPhone developers ... of course I can enlarge my cluster of
>>> servers linearly with the number of licensees. I digress...
>>>
>>>> I think a standard threaded wsgi server would work fine.
>>>>
>>>
>>> Suggestions? CherryPy?
>>>
>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's
>>>> outstanding wsgi implementation for Apache. The Nginx wsgi interface
>>>> is good as well, but beware if your app needs to block -- you'll be
>>>> serializing your requests.
>>>>
>>>
>>> True. Nginx is indeed single-threaded. I'm not leaning in any way to
>>> any particular serving tech. at this point actually. I just want to
>>> ensure that whatever tech. I choose is as efficient as possible.
>>>
>>> I don't actually have any points of blocking in the front-end,
>>> not on disk I/O at least. My datastore is a file-backed key-value
>>> database that runs in a separate process and writes to disk on
>>> every Nth database modification.
>>>
>>>> Both options would let you run SSL as well as handle basic or digest
>>>> auth.
>>>>
>>>
>>> True.
>>>
>>>> As far as tying in Thrift, I haven't done this myself and
>>>> unfortunately can't offer much. Hopefully there are others here who
>>>> can. As you've already suggested, taking a look at the RPC layer and
>>>> seeing how you can tie it into the backend from wsgi is a start.
>>>>
>>>
>>> Yeah, that's what I gather. I'll play with it over the weekend.
>>>
>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>>> sure what discussions there have been to address this.
>>>> I started to implement SSL support for Java and Python, but found
>>>> I had to modify a fair amount of Thrift code and ended up punting
>>>> by using stunnel to set up a secure connection between client and
>>>> server. You might find this the path of least resistance as well,
>>>> in particular if you can add the authentication layer to your
>>>> Thrift IDL.
>>>>
>>>
>>> Yeah, built-in SSL support would be nice.
>>>
>>> My client will be running on an iPhone -- no stunnel. Oh, yeah, I
>>> should mention that it seems most people use Thrift for talking from
>>> say their web server to *internal* web services but I'm planning on
>>> using it as a public-facing web service, like the EverNote folks are.
>>> It was actually good to see another instance of someone planning on
>>> using Thrift this way.
>>>
>>>> As one other approach, you can use a symmetric key to sign a request
>>>> and send the signature in the clear with the rest of your thrift data.
>>>> As long as you keep the signing key secret, this would let you
>>>> validate the origin and integrity of the request. If there's anything
>>>> sensitive in the request itself, though, this is no good.
>>>>
>>>
>>> Right. I cannot really trust the client -- iPhone apps are getting
>>> cracked left and right. Once cracked, someone will poke around enough
>>> in the binary to find out my secret symmetric key even if not stored
>>> as a literal string.
>>>
>>> Thus, I want to use SSL for anything sensitive.
>>>
>>> I'll create the equivalent of an auth token (same idea as login
>>> cookies) with opaque data encrypted using a symmetric key only
>>> available on the service-side. The client will send back the auth
>>> token with each Thrift RPC. There's a lot more to this to fight
>>> replay attacks, client spoofing, etc. but that isn't relevant here.
>>>
>>> I need to be able to register a user account from the client (I know,
>>> spammers will try to automate that but I have countermeasures) and
>>> login the user as well. This requires sending the sensitive user
>>> information which, while essentially obfuscated to eavesdroppers by
>>> virtue of using a binary protocol, can be reverse engineered easily
>>> enough I bet.
>>>
>>>> Alas, message signing is another application layer measure -- it would
>>>> be sweet to see auth work its way into the Thrift spec.
>>>>
>>>
>>> Yeah, I'm planning on requiring signatures ala Amazon Web Services.
>>> Some data used in the request signature calculation will only be
>>> available to the client and the service and never transmitted between
>>> them in the clear -- it would be transmitted to the client during a
>>> login over HTTPS.
>>>
>>> Auth in Thrift would be wonderful but I wonder if that's feature creep?
>>>
>>>> Good luck!
>>>>
>>>> Garrett
>>>>
>>>
>>> Thanks!
>>> Brian
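
On the request-signing idea discussed in the quoted thread, the following is a minimal sketch assuming HMAC-SHA256 from the Python standard library. It is a generic keyed signature, not Amazon's actual signing scheme, and sign_request, verify_request, the key, and the payload are all hypothetical. As noted above, signing only covers origin and integrity: the payload still travels in the clear, and replay protection is not addressed here.

```python
import hashlib
import hmac


def sign_request(key, payload):
    """Return a hex HMAC-SHA256 signature over the serialized request bytes."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(key, payload, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(key, payload)
    return hmac.compare_digest(expected, signature)


# Hypothetical values: a per-session secret handed to the client over HTTPS
# at login, and the bytes of one serialized RPC call.
session_key = b"secret delivered during login over HTTPS"
rpc_bytes = b"serialized Thrift call"

signature = sign_request(session_key, rpc_bytes)
accepted = verify_request(session_key, rpc_bytes, signature)
tampered = verify_request(session_key, rpc_bytes + b"!", signature)
```

The client would send rpc_bytes and signature together; the service looks up the session key it issued and rejects any request whose recomputed signature differs.
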
