Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-04 Thread Stuart Langley
405 is being returned for these requests anyway. The incoming rate is 1 QPS; besides filling up your logs, I'm not sure how, if at all, this is affecting your app. On Friday, 3 August 2012 06:08:21 UTC+10, Kate wrote: How can I block the following curl requests? Not every IP is different and

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-02 Thread Kate
I am having a similar problem and still cannot find an answer. The requests are all curl requests and I have tried everything I can think of. I tried using appengine_config.py and checking for a user agent but that didn't work. All the IP addresses are different. Surely there must be a

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-08-02 Thread Kate
How can I block the following curl requests? Not every IP is different and I get tens of thousands of them every day. Honestly I do not know HOW to block them. What method/code? 2012-08-02 15:03:21.103 / 405 55ms 0kb curl/7.18.2 (i386-redhat-linux-gnu) libcurl/7.18.2 NSS/3.12.2.0 zlib/1.2.3
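
For readers with the same "what method/code?" question, here is a minimal sketch of one way to reject requests by User-Agent inside a Python app, using plain WSGI middleware. It is not taken from the thread: the names `BlockUserAgentMiddleware`, `BLOCKED_AGENT`, and `hello_app` are hypothetical, and the `^curl/` pattern is only an assumption based on the log line quoted above.

```python
import re

# Hypothetical pattern: matches User-Agent strings that start with "curl/",
# like the one in the log line quoted above.
BLOCKED_AGENT = re.compile(r"^curl/", re.IGNORECASE)


class BlockUserAgentMiddleware(object):
    """WSGI middleware that answers 403 when the User-Agent matches a pattern."""

    def __init__(self, app, pattern=BLOCKED_AGENT):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        agent = environ.get("HTTP_USER_AGENT", "")
        if self.pattern.search(agent):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else is passed through to the wrapped application.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in application, used only to make this sketch self-contained."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = BlockUserAgentMiddleware(hello_app)
```

Note that a filter like this only keeps matching requests away from the handlers. The requests still reach the app and still appear in the logs, and a client can change its User-Agent at any time.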

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-27 Thread Jeff Schnitzer
On Thu, Jul 26, 2012 at 8:45 PM, Drake drak...@digerat.com wrote: And then when Google Spam team bot shows up you would be delisted... That would Rock... It's highly improbable that anyone in an official capacity at Google will ever view your page with the exact User-Agent: AppEngine-Google;

[google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread jswap
I run a website containing lots of doctor-related data. We get crawled by rogue crawlers from thousands of IP addresses DAILY (mostly in Russia) and we sometimes see our content show up on other websites. I define a crawler as rogue when it does not obey robots.txt exclusions, and the

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Jeff Schnitzer
Every fetch request from GAE includes the appid as a header... you obviously see it yourself, which is how you know the appid of the crawler. This is how Google enables you to block applications; just block all requests with that particular header. Jeff On Wed, Jul 25, 2012 at 9:35 AM, jswap
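
As a rough illustration of matching on that header, the sketch below pulls the appid out of a User-Agent string and checks it against a blocklist. It assumes the `AppEngine-Google; ... appid: ...` shape quoted elsewhere in this thread. The function names and the appid `example-crawler-app` are hypothetical placeholders, not values from the thread.

```python
import re

# Assumed User-Agent shape: "AppEngine-Google; (...; appid: s~some-app)".
# The optional "s~"-style partition prefix is skipped if present.
APPID_RE = re.compile(
    r"^AppEngine-Google;.*appid:\s*(?:[a-z]~)?([a-z0-9.:-]+)",
    re.IGNORECASE,
)

# Hypothetical blocklist; replace with the appid seen in your own logs.
BLOCKED_APPIDS = {"example-crawler-app"}


def urlfetch_appid(user_agent):
    """Return the lowercased appid from the User-Agent, or None if absent."""
    match = APPID_RE.search(user_agent or "")
    return match.group(1).lower() if match else None


def is_blocked(user_agent):
    """True when the User-Agent names an appid on the blocklist."""
    return urlfetch_appid(user_agent) in BLOCKED_APPIDS
```

Matching on the appid rather than on the whole `AppEngine-Google` prefix is meant to leave other App Engine-hosted clients unaffected, which is the concern raised about blocking by IP.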

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread jswap
Thanks, Jeff, but how do I block requests by header and not by IP? I usually use iptables to block the requests, but cannot do so in this situation because then I block access to Google's PageSpeed Insights tool too. On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote: Every

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Jeff Schnitzer
It would have to be by something at Layer 7 that understands HTTP. What web server/technology are you using? With apache you can do it with mod_rewrite. Blocking IP addresses is really a clumsy way to do it anyways since GAE urlfetch changes IP ranges periodically. If you really don't like the

Re: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread jswap
I like how your mind thinks, Jeff :) I did some googling and found the specifics on how to block using apache's mod_rewrite. For the benefit of others, I post it here: Inside your virtual host: RewriteEngine on # start RewriteCond %{HTTP_USER_AGENT} ^AppEngine-Google;.*appid:.*steprep

RE: [google-appengine] Google App Engine, rogue crawlers, and PageSpeed Insights

2012-07-26 Thread Drake
Jeff Schnitzer wrote: It would have to be by something at Layer 7 that understands HTTP. What web server/technology are you using? With apache you can do it with mod_rewrite. Blocking IP addresses is really