[Wikidata-bugs] [Maniphest] [Commented On] T144592: Search index a limited number of article placeholders on cywiki for testing and evaluation purposes
Nemo_bis added a comment.

In T144592#3038988, @Nemo_bis wrote:
> Sorry for the triple message... do I see correctly (https://archive.fo/MakpZ) that https://cy.wikipedia.org/wiki/Arbennig:AboutTopic/Q272 is currently the only URL actually indexed by Google?

Now searching for the localised special page name: https://duckduckgo.com/?q="Arbennig%3AAm_y_Pwnc"+site%3Acy.wikipedia.org

DuckDuckGo shows a few mostly Welsh results: https://archive.fo/iRyCb

Google picks up results that are mostly in English, such as (2nd for me):

    Wanfried agreement - Wicipedia
    https://cy.wikipedia.org/wiki/Arbennig:Am_y_Pwnc/Q1441
    treaty transferring territory between the United States and Soviet occupation zones of Germany after World War II. Karte Wanfrieder Abkommen.png

https://archive.fo/SVu8t

TASK DETAIL
https://phabricator.wikimedia.org/T144592

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo, Nemo_bis
Cc: thiemowmde, Stashbot, gerritbot, Deskana, Izno, Lydia_Pintscher, Aklapper, Lucie, Ricordisamoa, Nemo_bis, DarTar, MZMcBride, hoo, GoranSMilovanovic, QZanden, cmadeo, Wikidata-bugs, aude, jayvdb, Mbch331

Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
hoo added a comment.

In T144592#3039068, @thiemowmde wrote:
> If you click to show "duplicate" search results, you find that Google tries to index URLs like https://cy.wikipedia.org/wiki.phtml?title=Special:AboutTopic/Q2050, but it says it can't because of https://cy.wikipedia.org/robots.txt, though I cannot track down the rule. The problem here is not that it can't index these URLs; that is fine. The problem is: how does it even find these weird URLs?

I gave some of these to Google in order to experiment a bit. These should not be ranked highly and won't appear in any real-world searches.

I guess it will take a few more weeks until Google and other search engines start picking up the other placeholders :/
thiemowmde added a comment.

Yes, this is still the only article for now: https://www.google.com/search?q=site:cy.wikipedia.org+inurl:AboutTopic

If you click to show "duplicate" search results, you find that Google tries to index URLs like https://cy.wikipedia.org/wiki.phtml?title=Special:AboutTopic/Q2050, but it says it can't because of https://cy.wikipedia.org/robots.txt, though I cannot track down the rule. The problem here is not that it can't index these URLs; that is fine. The problem is: how does it even find these weird URLs?
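One way to track down which robots.txt rule blocks such a URL is to test each Disallow prefix against the URL's path and query string. The following is only a rough sketch: the function name `matching_disallow_rules` is hypothetical, the sample rules are invented for illustration and are not the real contents of https://cy.wikipedia.org/robots.txt, and the matching is deliberately simplified (no User-agent grouping, no Allow lines, no wildcards).

```python
from urllib.parse import urlsplit


def matching_disallow_rules(robots_txt: str, url: str) -> list[str]:
    """Return every Disallow prefix that the URL's path plus query starts with.

    Simplification: ignores User-agent grouping, Allow lines and wildcards.
    """
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    matches = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        prefix = value.strip()
        if field.strip().lower() == "disallow" and prefix:
            if target.startswith(prefix):
                matches.append(prefix)
    return matches


# Invented sample rules, NOT the real cy.wikipedia.org robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
Disallow: /wiki.phtml?
Disallow: /wiki/Special:
"""

url = "https://cy.wikipedia.org/wiki.phtml?title=Special:AboutTopic/Q2050"
print(matching_disallow_rules(SAMPLE_ROBOTS_TXT, url))
```

Running the same function against the live robots.txt text would list the candidate rules, although a real crawler may apply additional matching logic that this sketch does not model.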
Nemo_bis added a comment.

Sorry for the triple message... do I see correctly (https://archive.fo/MakpZ) that https://cy.wikipedia.org/wiki/Arbennig:AboutTopic/Q272 is currently the only URL actually indexed by Google?
Nemo_bis added a comment.

I note however that https://www.mediawiki.org/w/index.php?diff=2373589 contradicts the task description, since it says «all placeholders for Items that have an id up Q3000» (bold added).
Nemo_bis added a comment.

Thanks for updating the task description.
Stashbot added a comment.

Mentioned in SAL (#wikimedia-operations) [2017-02-07T14:12:14Z] Synchronized wmf-config/: Search index article placeholders on cywiki up to Q2794 (T144592) (duration: 00m 42s)
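The "up to Q2794" cutoff can be read as a comparison on the numeric part of the Item ID. The sketch below only illustrates that reading: the function name `is_placeholder_indexable` is hypothetical, it is not the configuration setting or extension code that was actually deployed, and it assumes the cutoff is inclusive.

```python
import re

ITEM_ID_PATTERN = re.compile(r"Q([1-9][0-9]*)")


def is_placeholder_indexable(item_id: str, max_indexed_id: str = "Q2794") -> bool:
    """Return True if item_id is at or below the cutoff (assumed inclusive)."""
    item = ITEM_ID_PATTERN.fullmatch(item_id)
    limit = ITEM_ID_PATTERN.fullmatch(max_indexed_id)
    if item is None or limit is None:
        raise ValueError(f"not a valid Item ID: {item_id!r} / {max_indexed_id!r}")
    # Compare numerically: as plain strings, "Q272" would sort after "Q2794".
    return int(item.group(1)) <= int(limit.group(1))


for candidate in ("Q272", "Q1441", "Q2794", "Q2795"):
    print(candidate, is_placeholder_indexable(candidate))
```

Under these assumptions, the placeholders mentioned in this thread (Q272, Q1441, Q2050) all fall below the cutoff.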
gerritbot added a comment.

Change 336225 merged by jenkins-bot: Search index article placeholders on cywiki up to Q2794

https://gerrit.wikimedia.org/r/336225
gerritbot added a comment.

Change 336225 had a related patch set uploaded (by Hoo man): Search index article placeholders up to Q2794

https://gerrit.wikimedia.org/r/336225
hoo added a comment.

Concrete query used:

    SELECT page_title
    FROM page
    INNER JOIN wb_entity_per_page ON epp_page_id = page_id
    INNER JOIN page_props AS pp_sl
        ON pp_sl.pp_page = page_id AND pp_sl.pp_propname = 'wb-sitelinks'
    INNER JOIN page_props AS pp_st
        ON pp_st.pp_page = page_id AND pp_st.pp_propname = 'wb-claims'
    WHERE pp_st.pp_value > 2
        AND pp_sl.pp_value > 3
        AND NOT EXISTS(
            SELECT 1 FROM wb_items_per_site
            WHERE ips_site_id = 'cywiki' AND ips_item_id = epp_entity_id
        )
    ORDER BY epp_entity_id ASC
    LIMIT 1000;

Results (indexable user page): https://cy.wikipedia.org/wiki/Defnyddiwr:Hoo_man/T144592-placeholders

Note: The placeholders themselves are not indexable yet.
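To see what the query above selects, it can be run against a tiny in-memory database. This is a minimal sketch, not the production setup: it uses SQLite instead of the real database, the toy rows and the helper names `build_toy_db` and `select_candidate_titles` are invented, `pp_value` is declared as an integer so the numeric comparisons behave as intended, and the second `page_props` join is assumed to read `pp_st.pp_page = page_id`, mirroring the first.

```python
import sqlite3

QUERY = """
SELECT page_title
FROM page
INNER JOIN wb_entity_per_page ON epp_page_id = page_id
INNER JOIN page_props AS pp_sl
    ON pp_sl.pp_page = page_id AND pp_sl.pp_propname = 'wb-sitelinks'
INNER JOIN page_props AS pp_st
    ON pp_st.pp_page = page_id AND pp_st.pp_propname = 'wb-claims'
WHERE pp_st.pp_value > 2
  AND pp_sl.pp_value > 3
  AND NOT EXISTS (
      SELECT 1 FROM wb_items_per_site
      WHERE ips_site_id = 'cywiki' AND ips_item_id = epp_entity_id
  )
ORDER BY epp_entity_id ASC
LIMIT 1000;
"""

# Invented toy data: (item id, claim count, sitelink count, wikis with a sitelink)
TOY_ITEMS = [
    (5, 3, 4, ["enwiki"]),            # qualifies
    (1, 10, 50, ["enwiki", "cywiki"]),  # excluded: already has a cywiki article
    (4, 8, 3, ["dewiki"]),            # excluded: sitelinks not > 3
    (2, 5, 4, ["enwiki"]),            # qualifies
    (3, 2, 10, ["enwiki"]),           # excluded: claims not > 2
]


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified version of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE wb_entity_per_page (epp_entity_id INTEGER, epp_page_id INTEGER);
        CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value INTEGER);
        CREATE TABLE wb_items_per_site (ips_item_id INTEGER, ips_site_id TEXT);
    """)
    for item_id, claims, sitelinks, sites in TOY_ITEMS:
        page_id = 100 + item_id  # arbitrary page IDs, distinct from item IDs
        conn.execute("INSERT INTO page VALUES (?, ?)", (page_id, f"Q{item_id}"))
        conn.execute("INSERT INTO wb_entity_per_page VALUES (?, ?)", (item_id, page_id))
        conn.execute("INSERT INTO page_props VALUES (?, 'wb-claims', ?)", (page_id, claims))
        conn.execute("INSERT INTO page_props VALUES (?, 'wb-sitelinks', ?)", (page_id, sitelinks))
        for site in sites:
            conn.execute("INSERT INTO wb_items_per_site VALUES (?, ?)", (item_id, site))
    return conn


def select_candidate_titles(conn: sqlite3.Connection) -> list[str]:
    """Run the query and return the matching page titles in Item ID order."""
    return [row[0] for row in conn.execute(QUERY)]


print(select_candidate_titles(build_toy_db()))
```

In this toy data set only the Items with more than 2 claims, more than 3 sitelinks and no cywiki sitelink are returned, ordered by ascending Item ID.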