Followup: no, raising the listen backlog unfortunately didn't help either. But the good news is that I've finally found the true cause of the problem. It's not Racket's fault that wescheme.org's compiler servers "fail" on load spikes; rather, it's Amazon EC2's. Specifically, Amazon's Elastic Load Balancer will return 503 errors even when the servers behind it aren't at capacity. Amazon documents that its load balancers return 503s on traffic spikes while they "warm up".
Here's what they say:

---
Elastic Load Balancing Capacity Limits Reached

Elastic Load Balancing will likely never reach true capacity limits, but until it scales based on the metrics, there can be periods in which your load balancer will return an HTTP 503 error when it cannot handle any more requests. The load balancers do not try to queue all requests, so if they are at capacity, additional requests will fail. If traffic grows over time, then this behavior works well, but in the case of significant spikes in traffic or in certain load testing scenarios, the traffic may be sent to your load balancer at a rate that increases faster than Elastic Load Balancing can scale to meet it.
---

Reference: http://aws.amazon.com/articles/1636185810492479

This is precisely what I've been seeing. I'm mortified; that doesn't sound like "load balancing" to me, but I have to work with what I've got.

So thanks Jay, sorry about the false alarm. I'm working around the problem now by modifying the client code to expect 503s.

____________________
Racket Users list: http://lists.racket-lang.org/users
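
For anyone who hits the same thing: "expecting 503s" on the client side can be as simple as treating a 503 as transient and retrying with a growing delay. Below is a minimal sketch in Python, not the actual WeScheme client code. The name fetch_with_retry, its parameters, and the retry counts and delays are all hypothetical and chosen only for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=5, base_delay=0.5,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff on HTTP 503.

    attempts and base_delay are illustrative defaults (assumptions),
    not values recommended by Amazon or used by wescheme.org.
    """
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only 503 is treated as transient. Any other status, or a
            # 503 on the final attempt, is re-raised to the caller.
            if err.code != 503 or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

The opener and sleep arguments exist only so the function can be exercised without a network or real delays. Backing off rather than retrying immediately seems the safer choice here, since immediate retries would add to the very spike the load balancer is still scaling up for.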