[Wikidata-bugs] [Maniphest] [Commented On] T179312: robots.txt prevents indexing of Special:EntityData

2017-11-13 Thread thiemowmde
thiemowmde added a comment.
There are canonical URLs on all these pages, e.g. . But this will not stop Google from indexing redirects. We do provide these redirects. They exist. They are real. Users are allowed to use them.

What we see on search result pages like https://www.google.com/search?q=site:wikidata.org+inurl:entity is entirely normal behavior. I would like to understand what people think needs "fixing" there.

TASK DETAIL
https://phabricator.wikimedia.org/T179312

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: thiemowmde
Cc: thiemowmde, hoo, daniel, aude, Aklapper, Lydia_Pintscher, Lahi, GoranSMilovanovic, QZanden, Wikidata-bugs, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T179312: robots.txt prevents indexing of Special:EntityData

2017-11-13 Thread hoo
hoo added a comment.

In T179312#3754286, @thiemowmde wrote:
If the search result in the screenshot is the only reason this ticket was created, I strongly suggest closing it, because stripping duplicates from search indexes is actually intended.


Which should be done by setting a canonical URL, though… I suppose?


[Wikidata-bugs] [Maniphest] [Commented On] T179312: robots.txt prevents indexing of Special:EntityData

2017-10-31 Thread hoo
hoo added a comment.
Well, we could allow this, I guess… but we should at least set a canonical URL (or one per output?) as a header (we can't put it in the HTML here, as there's none).
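
For non-HTML responses such as JSON or RDF, a canonical URL can be declared with an HTTP Link header carrying rel="canonical". The following is only a minimal sketch of building such a header. The helper name canonical_link_header and the choice of https://www.wikidata.org/entity/Q42 as the canonical target are assumptions for illustration; which URL should actually be canonical for each output is exactly the open question here.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return an HTTP (name, value) header pair declaring a canonical URL.

    Hypothetical helper for illustration; not part of Wikibase.
    """
    # The URL is wrapped in angle brackets, so it must not contain
    # characters that would break the header syntax.
    if any(ch in canonical_url for ch in '<>" \r\n'):
        raise ValueError("URL must be percent-encoded before use in a Link header")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Assumed example target: the entity's concept URI.
name, value = canonical_link_header("https://www.wikidata.org/entity/Q42")
print(f"{name}: {value}")
```

A response for Special:EntityData/Q42.json could then carry this header alongside its usual Content-Type, since there is no HTML head to put a link element in.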

This is probably interesting especially as we already put the various EntityData URLs in to our regular URLs as .


[Wikidata-bugs] [Maniphest] [Commented On] T179312: robots.txt prevents indexing of Special:EntityData

2017-10-30 Thread daniel
daniel added a comment.
robots.txt is controlled by the WMF. Special pages are listed there because Special pages generally contain dynamic data, and should not be cached.
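
To illustrate how a blanket rule for Special pages also covers Special:EntityData, here is a small sketch using Python's standard urllib.robotparser. The robots.txt excerpt is a hypothetical stand-in, not a copy of the file the WMF serves, and the helper is_crawlable is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt for illustration; not the actual WMF robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


for url in (
    "https://www.wikidata.org/wiki/Q42",
    "https://www.wikidata.org/wiki/Special:EntityData/Q42.json",
):
    print(url, is_crawlable(url))
```

Because the rule matches on the path prefix, every Special page is excluded, whether its output is dynamic or not.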

Special:EntityData could be indexable, but it's a bit awkward. Depending on the request (particularly, the query string and Accept header), it may produce JSON or RDF, or a redirect to the regular HTML page. I think it would be fine to allow crawlers to index these, but I also see little added value in doing so.
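
The awkwardness comes from one URL yielding different results per request. A rough sketch of that kind of negotiation is below. It is a simplified assumption, not the actual Wikibase logic: the function negotiate, the FORMATS table, and the explicit_format parameter (standing in for a format given in the query string) are all hypothetical.

```python
from typing import Optional

# Hypothetical media type table; the real implementation knows more formats.
FORMATS = {
    "application/json": "json",
    "application/rdf+xml": "rdf",
    "text/html": "html",
}


def negotiate(accept: str, explicit_format: Optional[str] = None) -> str:
    """Return 'json', 'rdf', or 'redirect-to-html' for a request."""
    # A format named explicitly in the request wins over the Accept header.
    if explicit_format in ("json", "rdf"):
        return explicit_format

    candidates = []
    for position, part in enumerate(accept.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if media_type in FORMATS and quality > 0:
            # Sort by highest quality first, then by order in the header.
            candidates.append((-quality, position, FORMATS[media_type]))

    if not candidates:
        return "redirect-to-html"
    best = min(candidates)[2]
    return "redirect-to-html" if best == "html" else best


print(negotiate("application/json"))
print(negotiate("text/html,application/xhtml+xml;q=0.9"))
```

A crawler that sends a browser-like Accept header would, under these assumptions, only ever see the redirect to the regular HTML page, which is one reason indexing these URLs adds little.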