> On Mar 28, 2019, at 2:10 PM, Konstantin Tokarev <annu...@yandex.ru> wrote:
>
> 28.03.2019, 23:58, "Alexey Proskuryakov" <a...@webkit.org>:
>> Hello,
>>
>> The robots.txt file that we have on bugs.webkit.org currently allows
>> search engines access to individual bug pages, but not to any bug lists.
>> As a result, search engines and the Internet Archive only index bugs
>> that were filed before the robots.txt changes a few years ago, and bugs
>> that are directly linked from webpages elsewhere. These bugs are where
>> most spam content naturally ends up.
>>
>> This is quite wrong, as indexing just a subset of bugs is not beneficial
>> to anyone other than spammers. So we can go in either direction:
>>
>> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>>
>> Seems reasonable that people should be able to find bugs using search
>> engines.
>
> Yes, and it may give better results even than searching Bugzilla directly.
>
>> On the other hand, we'll need to do something to ensure that indexers
>> don't destroy Bugzilla performance,
>
> This can be solved by caching.
>
>> and of course spammers will love having more flexibility.
>
> rel="nofollow" on all links in comments should be enough to make spamming
> useless.
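[For context, the rel="nofollow" suggestion amounts to Bugzilla rendering
comment links with the nofollow annotation. A hypothetical rendering, not
Bugzilla's actual comment template:]

```html
<!-- Plain link: crawlers may count it toward the target's search ranking -->
<a href="https://example.com/spam">https://example.com/spam</a>

<!-- With rel="nofollow": crawlers are asked not to pass ranking credit
     through the link, which removes the SEO incentive for comment spam -->
<a href="https://example.com/spam" rel="nofollow">https://example.com/spam</a>
```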
Theoretically yes… but a couple of Google searches say it doesn't make a
difference. Here is one of many:
https://www.seroundtable.com/google-nofollow-link-attribute-failed-comments-26959.html

I expect that spammers don't really care whether they get a nofollow or
not; they are mostly unmanned scripts anyway. I'm not opposed to adding
this, I just don't expect it will solve the problem. We could measure and
see.

Lucas

>> 2. Block indexing completely.
>>
>> Seems like no one has been bothered by the lack of indexing on new bugs
>> so far.
>
> That's survivorship bias - if nobody can find relevant bugs, nobody will
> ever complain.
>
>> Thoughts?
>>
>> For reference, here is the current robots.txt content:
>>
>> $ curl https://bugs.webkit.org/robots.txt
>> User-agent: *
>> Allow: /index.cgi
>> Allow: /show_bug.cgi
>> Disallow: /
>> Crawl-delay: 20
>>
>> - Alexey
>>
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>
> --
> Regards,
> Konstantin
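[For reference, option 1 — letting indexers enumerate bugs — could be
expressed as a robots.txt along these lines. This is a sketch only: the
addition of buglist.cgi is an assumption about which Bugzilla CGI exposes
bug lists on this installation, and the exact paths would need checking.]

```text
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
# Hypothetical: allowing the bug-list CGI lets crawlers discover all bug
# IDs instead of only directly linked ones
Allow: /buglist.cgi
Disallow: /
# Throttle crawlers to protect Bugzilla performance
Crawl-delay: 20
```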