So, now in 2015, is it still necessary to block some bots and some
URLs, or should everything be opened up, or should this bug be closed, or...?
Just a ping :-).
--
Ivan Baldo - iba...@adinet.com.uy - http://ibaldo.codigolibre.net/
From Montevideo, Uruguay, at the south of South America.
So right now Google is allowed to spider bugs.debian.org, but other
search engines are not. Sounds discriminatory.
Perhaps the web server logs could be checked to see how much load
Googlebot generates?
If the numbers are not very significant, other spiders could be allowed,
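A rough sketch of how such a check could look, assuming a standard
combined-format access log (the log path and bot list below are
placeholders, not the actual BTS setup):

    #!/usr/bin/env python3
    # Count requests per crawler user-agent in an access log to get a rough
    # idea of how much of the total load each bot is responsible for.
    from collections import Counter

    LOG = "/var/log/apache2/bugs-access.log"   # placeholder path
    BOTS = ["Googlebot", "Yahoo! Slurp", "msnbot"]

    counts = Counter()
    total = 0
    with open(LOG, errors="replace") as fh:
        for line in fh:
            total += 1
            for bot in BOTS:
                if bot in line:
                    counts[bot] += 1
                    break

    for bot in BOTS:
        share = 100.0 * counts[bot] / total if total else 0.0
        print(f"{bot:12} {counts[bot]:8d} requests ({share:.1f}% of {total})")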
On Thu, 10 Jan 2008, Anthony Towns wrote:
(In practice, with Google barely indexing anything in the BTS yet, looking up
bug#459818 by googling for `medium dhclient-script' works fine;
using hyperestraier on merkel takes ages and doesn't return any hits)
That's because you actually meant to
On Wed, 09 Jan 2008, Don Armstrong wrote:
On Thu, 10 Jan 2008, Anthony Towns wrote:
I've made those changes on rietz directly; what's the procedure
for committing them? sudo -u debbugs -H bzr commit ? There was a
pre-existing change in pkgreport.cgi (adding a ^ to the Go away
regexp) that
On Thu, Jan 03, 2008 at 01:07:15PM -0800, Don Armstrong wrote:
There are already mirrors which allow indexing, and you can use the
BTS's own search engine, which is far superior to google [...]
Uh, you're kidding right? The BTS's own search engine won't turn up hits
outside the BTS, as a
On Wed, Jan 09, 2008 at 05:58:34PM +1000, Anthony Towns wrote:
Getting smarturl.cgi properly done is still probably the real solution.
Okay, so I've made smarturl.cgi work again; it was broken by:
- Debbugs::CGI not accepting params from ARGV (smarturl.cgi changed
to set QUERY_STRING)
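The QUERY_STRING trick mentioned above is a general CGI idiom: the CGI layer
reads its parameters from the QUERY_STRING environment variable, so a script
can be exercised from a shell by exporting one before it parses parameters.
A minimal sketch of the mechanism in Python (illustrative only, not Debbugs
code):

    import os
    from urllib.parse import parse_qs

    # Fake a web request so the script can also be run from the command line.
    os.environ.setdefault("QUERY_STRING", "bug=459818&mbox=yes")

    params = parse_qs(os.environ["QUERY_STRING"])
    print(params)   # {'bug': ['459818'], 'mbox': ['yes']}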
On Wed, 09 Jan 2008, Anthony Towns wrote:
On Thu, Jan 03, 2008 at 01:07:15PM -0800, Don Armstrong wrote:
There are already mirrors which allow indexing, and you can use the
BTS's own search engine, which is far superior to google [...]
Uh, you're kidding right? The BTS's own search engine
On Thu, 10 Jan 2008, Anthony Towns wrote:
On Wed, Jan 09, 2008 at 05:58:34PM +1000, Anthony Towns wrote:
Getting smarturl.cgi properly done is still probably the real solution.
Okay, so I've made smarturl.cgi work again; it was broken by:
- Debbugs::CGI not accepting params from ARGV
On Wed, Jan 09, 2008 at 12:54:32PM -0800, Don Armstrong wrote:
On Wed, 09 Jan 2008, Anthony Towns wrote:
Uh, you're kidding right? The BTS's own search engine won't turn up hits
outside the BTS, as a trivial example...
It's far superior to google for searching for results *in* the BTS.
On Wed, Jan 09, 2008 at 05:58:34PM +1000, Anthony Towns wrote:
Disallow: /*/ # exclude everything but the shortcuts
Allow: /cgi-bin/bugreport.cgi?bug=
Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$
I've set that up on rietz for Googlebot; we'll see if it works ok. I
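One way those rules could be assembled into a complete robots.txt (the
actual file on rietz may have looked different); note that Allow, the *
wildcard and the $ anchor are extensions understood by Googlebot rather
than part of the original robots.txt specification:

    User-agent: Googlebot
    Disallow: /*/
    Allow: /cgi-bin/bugreport.cgi?bug=
    Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$

    # all other crawlers stay blocked
    User-agent: *
    Disallow: /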
2008/1/3, Don Armstrong [EMAIL PROTECTED] wrote:
On Thu, 03 Jan 2008, Jason Spiro wrote:
http://en.wikipedia.org/wiki/Robots.txt#Crawl-delay_directive will
help. Yahoo and MSNBot both support it. I bet other major bots
support it too. So we can allow Yahoo and MSNBot (plus Googlebot, if
Package: www.debian.org
Severity: wishlist
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Cheers,
--
Jason Spiro: corporate trainer, web developer, IT consultant.
I support Linux, UNIX, Windows, and more.
reassign 458939 bugs.debian.org
thanks
On Thu, Jan 03, 2008 at 07:40:12PM +, Jason Spiro wrote:
Package: www.debian.org
Severity: wishlist
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Hello, the
On Thu, 03 Jan 2008, Jason Spiro wrote:
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Just for the record, the reasons we disallow indexing are that
the robots.txt specification isn't complete enough
On Thu, Jan 03, 2008 at 01:07:15PM -0800, Don Armstrong wrote:
On Thu, 03 Jan 2008, Jason Spiro wrote:
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Just for the record, the reasons why we disallow
2008/1/3, Don Armstrong [EMAIL PROTECTED] wrote:
On Thu, 03 Jan 2008, Jason Spiro wrote:
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Just for the record, the reasons why we disallow indexing are
On Thu, 03 Jan 2008, Jason Spiro wrote:
http://en.wikipedia.org/wiki/Robots.txt#Crawl-delay_directive will
help. Yahoo and MSNBot both support it. I bet other major bots
support it too. So we can allow Yahoo and MSNBot (plus Googlebot, if
they support it too) and block everyone else.
Google
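For reference, a minimal sketch of the Crawl-delay approach suggested above
(the ten-second delay is an arbitrary example; Crawl-delay is a non-standard
directive honoured by Yahoo's Slurp and msnbot, while Googlebot ignores it,
so Google's crawl rate would have to be throttled through its webmaster
tools instead):

    # Let Yahoo and MSN crawl, but no faster than one request every 10 seconds.
    User-agent: Slurp
    Crawl-delay: 10
    Disallow:

    User-agent: msnbot
    Crawl-delay: 10
    Disallow:

    # Everyone else stays blocked.
    User-agent: *
    Disallow: /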
On Thu, 03 Jan 2008, Jason Spiro wrote:
Package: www.debian.org
Severity: wishlist
Please allow search engines to index http://bugs.debian.org. This can
be done by deleting the file http://bugs.debian.org/robots.txt.
Most of the content is generated dynamically nowadays and this file has