Hello,

The robots.txt file on bugs.webkit.org currently allows search 
engines access to individual bug pages, but not to any bug lists. As a result, 
search engines and the Internet Archive only index bugs that were filed before 
the robots.txt changes a few years ago, plus bugs that are directly linked from 
webpages elsewhere. These are the bugs where most spam content naturally ends up.

This is quite wrong, as indexing just a subset of bugs benefits no one 
other than spammers. So we could go in either direction:

1. Allow indexers to enumerate bugs, thus indexing all of them.

It seems reasonable that people should be able to find bugs using search engines. 
On the other hand, we'd need to do something to ensure that indexers don't 
destroy Bugzilla performance, and of course spammers will love having more 
flexibility. (A sketch of what this might look like is the first one after the list.)

2. Block indexing completely.

It seems that no one has been bothered by the lack of indexing of new bugs so far. 
(The second sketch below shows the full block.)
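If we went with option 1, a minimal sketch of the updated file might look like 
the following, assuming Bugzilla's standard buglist.cgi script is what crawlers 
would use to enumerate bugs, and treating the Crawl-delay value as a placeholder 
to be tuned against actual server load:

User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
# Hypothetical addition: buglist.cgi serves bug lists, so allowing it
# would let crawlers discover and enumerate all bugs
Allow: /buglist.cgi
Disallow: /
# Non-standard hint asking crawlers to pause between requests;
# not every crawler honors it
Crawl-delay: 20

If we went with option 2, blocking indexing completely is the standard two-line file:

User-agent: *
# Block all crawling of the site
Disallow: /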

Thoughts?


For reference, here is the current robots.txt content:

$ curl https://bugs.webkit.org/robots.txt
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 20
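For the record: the Allow rules expose the front page (index.cgi) and individual 
bug pages (show_bug.cgi), Disallow: / blocks everything else, including bug 
lists, and Crawl-delay: 20 asks crawlers to wait 20 seconds between requests (a 
non-standard directive that not every crawler honors).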

- Alexey

