>  otherwise if you have any advice on how to get 0.4/0.3
> delisted from such a prominent place on Google, that would be
> appreciated.

The simplest thing to do is to append:
Disallow: /docs/04/
Disallow: /docs/03/

to the existing User-agent record in the file:
http://www.sqlalchemy.org/robots.txt

(Disallow rules only take effect inside a User-agent record, so these
lines should go under the existing User-agent line rather than after a
blank line at the end of the file.)

This tells Google (and all well-behaved search engines) not to index
those URLs (or anything under them). The next time the Googlebot
comes through, it will see the new robots.txt and remove those pages
from its index. This should take a couple of weeks at most.

You can learn more about robots.txt here:
http://www.robotstxt.org/

The disadvantage of doing it that way is that you will lose the Google
juice (PageRank) from inbound links to the old documentation.

An alternative approach that gets around this is to use a <link
rel="canonical" ...> tag in the <head> of each page of the 04 and 03
documentation, pointing to the corresponding page of the 05
documentation as its "canonical" URL.
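For example, a page of the 0.4 docs would carry something like this in
its <head> (the paths here are only illustrative; each old page should
point at its real 05 counterpart):

<!-- hypothetical paths, for illustration only -->
<link rel="canonical"
      href="http://www.sqlalchemy.org/docs/05/ormtutorial.html" />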

By doing this, you are claiming that the 04/03 documentation pages
are "duplicates" of the corresponding 05 pages. Google juice from
inbound links to an old documentation page will then accrue to the
appropriate 05 documentation page instead.

However, strictly speaking, the different versions aren't quite
"duplicates", so you might be pushing the boundaries of what is
allowed a bit by claiming they are.

Here is more info on rel="canonical" from google:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

A similar approach would be to do a 301 redirect from each old
documentation page to the corresponding 05 documentation page, but
only when the visitor is the Googlebot. This is straightforward to
implement with mod_rewrite (the Googlebot can be recognized by its
user-agent string), but it is probably a bad idea, since Google usually
considers serving different content to the Googlebot than to regular
visitors to be "cloaking".
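For the record, a sketch of what that would look like in an Apache
.htaccess file at the document root (again, not recommended, and the
doc paths are only illustrative):

# Match the Googlebot by user-agent and 301-redirect 04 pages to 05.
# Hypothetical paths; serving this only to the Googlebot risks being
# treated as cloaking.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteRule ^docs/04/(.*)$ /docs/05/$1 [R=301,L]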

You should also consider submitting an XML sitemap to Google via
Google Webmaster Tools. This allows you to spell out for them exactly
what the structure of the site is and what you want indexed.
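A sitemap is just an XML file listing the URLs you care about; a
minimal one (the URL and priority value here are only illustrative)
looks like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page; this path is hypothetical -->
  <url>
    <loc>http://www.sqlalchemy.org/docs/05/index.html</loc>
    <priority>1.0</priority>
  </url>
</urlset>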

I also noticed that your current robots.txt file disallows indexing of
anything under /trac/. It would be nice to let Google index bugs in
trac, so that someone who searches Google for SQLAlchemy help can come
across an existing bug describing their problem. In addition, you have
links on the front page ("changelog" and "what's new") that go to URLs
under /trac/, so Google will not follow those links because of your
robots.txt.
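If you don't want to open up all of /trac/, the Googlebot also honors
an Allow directive (an extension to the original robots.txt spec), so
you could expose just the tickets with something like:

# Hypothetical: keep /trac/ blocked but let the ticket pages through.
# Googlebot applies the most specific (longest) matching rule.
User-agent: *
Disallow: /trac/
Allow: /trac/ticket/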
