Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-04 Thread Stuart Langley
405 is being returned for these requests anyway. The incoming rate is <1 QPS - besides filling up your logs, I'm not sure how, if at all, this is affecting your app. On Friday, 3 August 2012 06:08:21 UTC+10, Kate wrote: > > How can I block the following curl requests. Not every IP is different a

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-02 Thread Kate
How can I block the following curl requests? Not every IP is different and I get tens of thousands of them every day. Honestly, I do not know HOW to block them. What method/code? 2012-08-02 15:03:21.103 / 405 55ms 0kb curl/7.18.2 (i386-redhat-linux-gnu) libcurl/7.18.2 NSS/3.12.2.0 zlib/1.2.3 libidn/

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-02 Thread Kate
I am having a similar problem and still cannot find an answer. The requests are all curl requests and I have tried everything I can think of. I tried using appengine_config.py and checking for a user agent but that didn't work. All the IP addresses are different. Surely there must be a solution
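
A minimal sketch of what a User-Agent check could look like as plain WSGI middleware, assuming a Python WSGI app. The names `block_curl` and `hello_app` are hypothetical, and how the wrapper gets wired into a real app (for example from appengine_config.py, as tried above) depends on the framework and is not shown here.

```python
def block_curl(app):
    """Wrap a WSGI app and answer 403 to any request whose
    User-Agent starts with 'curl/' (case-insensitive)."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if user_agent.lower().startswith("curl/"):
            # Refuse early, before the wrapped app does any work.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    # Hypothetical stand-in for the real application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


application = block_curl(hello_app)
```

The request still reaches the app and still shows up in the logs; the wrapper only refuses it before any handler runs. A client can also send any User-Agent it likes, so this filters out only requests that identify themselves as curl.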

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-27 Thread Jeff Schnitzer
On Thu, Jul 26, 2012 at 8:45 PM, Drake wrote: > And then when Google Spam team bot shows up you would be delisted... That > would Rock... It's highly improbable that anyone in an official capacity at Google will ever view your page with the exact User-Agent: AppEngine-Google; (+http://code.googl

RE: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Drake
> It would have to be by something at "Layer 7" that understands HTTP. > What web server/technology are you using? With apache you

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread jswap
I like how your mind thinks, Jeff :) I did some googling and found the specifics on how to block using Apache's mod_rewrite. For the benefit of others, I post it here: Inside your virtual host: RewriteEngine on # start RewriteCond %{HTTP_USER_AGENT} ^AppEngine-Google;.*appid:.*steprep Rew

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Jeff Schnitzer
It would have to be by something at "Layer 7" that understands HTTP. What web server/technology are you using? With Apache you can do it with mod_rewrite. Blocking IP addresses is really a clumsy way to do it anyway, since GAE urlfetch changes IP ranges periodically. If you really don't like the

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread jswap
Thanks, Jeff, but how do I block requests by header and not by IP? I usually use iptables to block the requests, but cannot do so in this situation because then I block access to Google's PageSpeed Insights tool too. On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote: > > Ever

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Jeff Schnitzer
Every fetch request from GAE includes the appid as a header... you obviously see it yourself, which is how you know the appid of the crawler. This is how Google enables you to block applications; just block all requests with that particular header. Jeff On Wed, Jul 25, 2012 at 9:35 AM, jswap wr
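
As a rough sketch of the header check described here, the predicate below tests a User-Agent string for one App Engine appid. It assumes the appid appears in the User-Agent in the shape matched by the mod_rewrite pattern posted elsewhere in this thread (`^AppEngine-Google;.*appid:.*<appid>`). `BLOCKED_APPID` and `is_blocked` are hypothetical names, and the appid value is a placeholder, not one taken from the thread.

```python
import re

# Hypothetical placeholder; substitute the appid seen in your own logs.
BLOCKED_APPID = "example-crawler-app"

# Same shape as the mod_rewrite condition quoted in this thread:
# the header starts with "AppEngine-Google;" and later carries "appid:".
BLOCKED_UA = re.compile(
    r"^AppEngine-Google;.*appid:.*" + re.escape(BLOCKED_APPID)
)


def is_blocked(user_agent):
    """Return True if the User-Agent names the blocked App Engine app."""
    return bool(BLOCKED_UA.search(user_agent or ""))
```

Because the pattern is a loose substring match, an appid that merely contains the blocked one would match as well. Other App Engine apps, and tools such as PageSpeed Insights that fetch under a different appid, are left alone, which is the advantage over blocking by IP.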