405 is being returned for these requests anyway.
The incoming rate is 1 QPS. Besides filling up your logs, I'm not sure how,
if at all, this is affecting your app.
On Friday, 3 August 2012 06:08:21 UTC+10, Kate wrote:
How can I block the following curl requests. Not every IP is different and
I am having a similar problem and still cannot find an answer. The requests
are all curl requests and I have tried everything I can think of.
I tried using appengine_config.py and checking for a user agent but that
didn't work. All the IP addresses are different.
Surely there must be a
How can I block the following curl requests. Not every IP is different and
I get tens of thousands of them every day.
Honestly I do not know HOW to block them. What method/code?
2012-08-02 15:03:21.103 / 405 55ms 0kb curl/7.18.2 (i386-redhat-linux-gnu)
libcurl/7.18.2 NSS/3.12.2.0 zlib/1.2.3
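One way to act on the User-Agent check mentioned above is at the WSGI layer. What follows is a minimal sketch, not a tested App Engine recipe: BlockUserAgents and BLOCKED_PREFIXES are hypothetical names, and it assumes the unwanted clients keep sending a User-Agent that begins with "curl/", as in the log line above. A client can change that header trivially, so this only filters out the lazy ones.

```python
# Hypothetical blocklist, taken from the User-Agent in the log line above.
BLOCKED_PREFIXES = ("curl/",)


class BlockUserAgents(object):
    """WSGI middleware that answers 403 for blocked User-Agent prefixes."""

    def __init__(self, app, prefixes=BLOCKED_PREFIXES):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if agent.startswith(self.prefixes):
            # Reject early, before the wrapped application does any work.
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else is passed through untouched.
        return self.app(environ, start_response)
```

With the Python runtime and the webapp framework, a wrapper like this could probably be installed from appengine_config.py by returning BlockUserAgents(app) from webapp_add_wsgi_middleware(app). Note that the rejected requests still reach the app and still appear in the logs; the sketch only keeps them away from the real handlers.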
On Thu, Jul 26, 2012 at 8:45 PM, Drake drak...@digerat.com wrote:
And then when Google Spam team bot shows up you would be delisted... That
would Rock...
It's highly improbable that anyone in an official capacity at Google
will ever view your page with the exact User-Agent:
AppEngine-Google;
Every fetch request from GAE includes the appid as a header... you
obviously see it yourself, which is how you know the appid of the
crawler. This is how Google enables you to block applications; just
block all requests with that particular header.
Jeff
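To make the header check concrete, here is a rough Python sketch. It assumes the appid travels in the User-Agent in the form that the mod_rewrite rule later in this thread matches ("AppEngine-Google; ... appid: NAME"). The helper names and the example appid "example-crawler" are made up for illustration.

```python
import re

# Assumed shape of the header value:
#   "AppEngine-Google; (+http://code.google.com/appengine; appid: NAME)"
# NAME may carry a partition prefix such as "s~", which is skipped.
_APPID_RE = re.compile(
    r"^AppEngine-Google;.*appid:\s*(?:[a-z]~)?([A-Za-z0-9._:-]+)"
)


def urlfetch_appid(user_agent):
    """Return the appid advertised by a GAE urlfetch User-Agent, or None."""
    match = _APPID_RE.search(user_agent or "")
    return match.group(1) if match else None


def is_blocked(user_agent, blocked_appids):
    """True when the request comes from one of the blocked App Engine apps."""
    return urlfetch_appid(user_agent) in blocked_appids


if __name__ == "__main__":
    ua = "AppEngine-Google; (+http://code.google.com/appengine; appid: s~example-crawler)"
    print(is_blocked(ua, {"example-crawler"}))
```

Matching one exact appid, rather than everything that starts with AppEngine-Google, is deliberate: it targets the single offending application and leaves other fetches made from Google's infrastructure alone.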
On Wed, Jul 25, 2012 at 9:35 AM, jswap
I run a website containing lots of doctor-related data. We get crawled by
rogue crawlers from thousands of IP addresses DAILY (mostly in Russia) and
we sometimes see our content show up on other websites. I define a crawler
as rogue when it does not obey robots.txt exclusions, and the
Thanks, Jeff, but how do I block requests by header and not by IP? I
usually use iptables to block the requests, but cannot do so in this
situation because then I block access to Google's PageSpeed Insights tool
too.
On Thursday, July 26, 2012 5:27:27 PM UTC-4, Jeff Schnitzer wrote:
Every
It would have to be done by something at Layer 7 that understands HTTP.
What web server/technology are you using? With Apache you can do it
with mod_rewrite.
Blocking IP addresses is really a clumsy way to do it anyway, since
GAE urlfetch changes IP ranges periodically.
If you really don't like the
I like how your mind thinks, Jeff :)
I did some googling and found the specifics on how to block using Apache's
mod_rewrite. For the benefit of others, I post it here:
Inside your virtual host:
RewriteEngine on
# start
RewriteCond %{HTTP_USER_AGENT} ^AppEngine-Google;.*appid:.*steprep
Subject: Re: [google-appengine] Google App Engine, rogue crawlers, and
PageSpeed Insights