Hi pals, this is a freak error in my MySQL queries: when I save or edit
a datetime field, all the hyphens are converted to "-", and then the
field can't be saved or edited.
I've checked all the database and app configurations, and everything is
set to UTF-8.
Any ideas?
Another reason not to use redirects for missing URIs is that you could
mistakenly create what is called a "crawler trap".
A crawler trap is a set of URLs that keep changing but keep producing
the same content. The crawler gets stuck wasting its time downloading
the same page, because it can't tell by the URL
Most web crawlers won't check a 404, because of the way servers send
HTTP responses.
When a crawler requests a page that is missing, it first receives the
header response from the request, and it can read the response code,
content-type, and other information. The web crawler can then stop the
download
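The header-first behaviour described above can be sketched in Python. This is a stdlib-only illustration, not anything from the thread; the host, port, and path are placeholders:

```python
import http.client

def head_check(host, path, port=80):
    """Request a page but inspect only the status line and headers,
    deciding whether the body is worth downloading at all."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    resp = conn.getresponse()
    status = resp.status
    ctype = resp.getheader("Content-Type")
    if status == 404:
        # A polite crawler stops here and never downloads the body.
        body = None
    else:
        body = resp.read()
    conn.close()
    return status, ctype, body
```

The point is that the status code and headers arrive before any body bytes are read, so the crawler can bail out of a 404 almost for free.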
It may not index a 404, but it still checks the 404. For usability's
sake I'd still prefer to redirect rather than send a 404. Although we
were discussing bots, we have to keep the user in mind as well. I
have personally traversed the URL path to see what may be found on
some sites, and if Safari h
> I'd actually say using a permanent redirect (301, I believe) to your
> root (or that controller's index), rather than to the 404 page might
> be a better solution. If your users/visitors won't see it since
> you're not linking to it, it isn't really a bad solution, and I doubt
> you'd want any
I'd actually say using a permanent redirect (301, I believe) to your
root (or that controller's index), rather than to the 404 page might
be a better solution. If your users/visitors won't see it since
you're not linking to it, it isn't really a bad solution, and I doubt
you'd want any search engines
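If you do go the permanent-redirect route, Apache's mod_alias can do it in one line. This is a hypothetical fragment with placeholder paths, not a recommendation from this thread:

```apache
# Hypothetical mod_alias rule: send a retired controller path to the
# site root with a 301 (permanent) redirect.
Redirect permanent /oldcontroller /
```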
Hi Mike,
If you're using Apache, it has some features in the .htaccess file that
will allow you to block access to your server for bots that are causing
you trouble.
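For example, a minimal .htaccess sketch (Apache 2.2-era syntax, which matches this thread's vintage) that shuts out one misbehaving client; the IP and User-Agent values are placeholders:

```apache
# Block one troublesome client IP (placeholder address).
Order allow,deny
Deny from 203.0.113.42
Allow from all

# Or refuse a bot by its User-Agent string instead (needs mod_rewrite).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC]
RewriteRule .* - [F,L]
```

Note that .htaccess overrides must be enabled (AllowOverride) for either fragment to take effect.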
In your Cake 404 display page, keep track of the number of times a 404
is generated per IP address, and if it exceeds a threshold, log that IP
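The bookkeeping suggested above is simple to sketch. This is an illustrative Python version rather than CakePHP; the threshold and the flagging behaviour are assumptions, not part of the original advice:

```python
from collections import defaultdict

# Illustrative threshold; tune to your own traffic.
THRESHOLD = 20

not_found_hits = defaultdict(int)  # ip -> number of 404s served
flagged = set()                    # ips that crossed the threshold

def record_404(ip):
    """Count a 404 for this client; flag the IP once it crosses the
    threshold so it can be reviewed (or blocked) later."""
    not_found_hits[ip] += 1
    if not_found_hits[ip] >= THRESHOLD and ip not in flagged:
        flagged.add(ip)
        print(f"IP {ip} exceeded {THRESHOLD} 404s; consider blocking it")
    return not_found_hits[ip]
```

In a real app the counts would live in a database or cache rather than process memory, but the threshold logic is the same.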
Thank you Matthew - I log it every time before throwing the 404 and I
figured whatever was creating these things would stop - but it
continues. I'm so dadgum anal obsessive it just kills me - hard to
ignore...
It is not coming from any 'known' bot either...
Great advice Mathew... Yes, I think that this is the way to go: point
all /controller/action URLs which don't mean anything without an extra
id to 404... once the crawler sees this 404, it would never try to
fetch the same thing again.
Thanks.
On Sat, Nov 1, 2008 at 6:21 AM, Mathew <[EMAIL PROTECTE
Hi Mike,
Disallowing that in your robots.txt is a waste of time.
The robots.txt file is only a de-facto convention, not an officially
supported feature of all crawlers. So they don't have to follow it,
and I can tell you this doesn't sound like the Google bot anyway,
because that bot doesn't generate
So you're saying the search bots are just walking all my actions as if
they were subdirs on a site? Not sure about this.
Maybe I should disallow those specific requests with robots.txt? Any
other cakers have an opinion on this? If I disallow
www.mydomain.com/controller/action/ won't the bots stop
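For reference, the disallow rule being discussed would look like this in robots.txt (the path is a placeholder, and as noted later in the thread, only well-behaved crawlers honour it):

```
User-agent: *
Disallow: /controller/action/
```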
I'm totally no expert on this, but I'd guess that the bots are simply
trying to walk the tree.
If "http://mysite.com/directory/subdirectory/subsubdirectory" is
valid, then "http://mysite.com/directory/subdirectory",
"http://mysite.com/directory" and "http://mysite.com" are probably
also valid.
In a general CMS app written in CakePHP, I am noticing invalid queries
in my logs being generated by various search engine bots,
including Google, Inktomi, and Yahoo.
What I'm wondering is WHY?
For example, they are requesting
http://mysite.com/controller/view instead of the correct
http://mysite