Lucas_Werkmeister_WMDE created this task. Lucas_Werkmeister_WMDE added projects: Wikidata, Wikidata Analytics, Wikidata Lexicographical data.
TASK DESCRIPTION

As a Wikidata product manager, I want our analytics to be accurate.
As a Wikidata power user, I want queries using `pp_sortkey` to return the correct data.

**Problem:**
Since April 2022, the page props `wb-claims`, `wbl-forms` and `wbl-senses` often have the `pp_sortkey` set to `NULL` in the `page_props` database table. This means that queries using the sort key, rather than the `pp_value` (to optimize the query by using a covering index <https://github.com/wikimedia/analytics-wmde-scripts/commit/979c7a0e3465b1dae187525b0fc85a9d4f0a29f5>), are returning the wrong result.

**Example:**

  mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname IN ('wbl-forms', 'wbl-senses') AND pp_sortkey IS NULL;
  +----------+
  | COUNT(*) |
  +----------+
  |  1150039 |
  +----------+
  1 row in set (0.687 sec)

  mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname IN ('wbl-forms', 'wbl-senses') AND pp_sortkey IS NOT NULL;
  +----------+
  | COUNT(*) |
  +----------+
  |  1234425 |
  +----------+
  1 row in set (0.729 sec)

  mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname = 'wb-claims' AND pp_sortkey IS NULL;
  +----------+
  | COUNT(*) |
  +----------+
  |   946316 |
  +----------+
  1 row in set (0.812 sec)

  mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname = 'wb-claims' AND pp_sortkey IS NOT NULL;
  +-----------+
  | COUNT(*)  |
  +-----------+
  | 106516430 |
  +-----------+
  1 row in set (1 min 40.897 sec)

This appears to have been caused by "Page properties should always be strings" <https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseLexeme/+/776016> (T305158 <https://phabricator.wikimedia.org/T305158>): `PagePropsTable::getPropertySortKeyValue()` returns `null` for strings, even if the strings are numeric.
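To make the effect concrete, here is a minimal Python sketch of the behaviour described above. It is a model for illustration only, not the MediaWiki code: the function names `sort_key_for` and `sum_ignoring_null` and the sample values are hypothetical, and it assumes an analytics query that aggregates with something like `SUM(pp_sortkey)`, which skips `NULL` rows.

```python
from typing import Iterable, Optional, Union

PropValue = Union[int, float, bool, str, None]


def sort_key_for(value: PropValue) -> Optional[float]:
    """Hypothetical model of the sort key derivation described in this task.

    Native numeric (and boolean) values yield a float sort key; strings
    yield None (NULL in the database), even if they look numeric.
    """
    if isinstance(value, (bool, int, float)):
        return float(value)
    return None


def sum_ignoring_null(keys: Iterable[Optional[float]]) -> float:
    """Mimic how an SQL SUM() skips NULL rows (assumed query shape)."""
    return sum(k for k in keys if k is not None)


# Made-up page prop values: two written as integers, two written as strings.
page_values = [2, "3", 5, "1"]

sort_keys = [sort_key_for(v) for v in page_values]
total_by_sortkey = sum_ignoring_null(sort_keys)    # only the integer rows count
total_by_value = sum(int(v) for v in page_values)  # every row counts

print(sort_keys)
print(total_by_sortkey, total_by_value)
```

Under these assumptions, every row whose value was written as a string silently drops out of the sort-key-based total, which would match the kind of undercount reported here.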
(The corresponding Wikibase change <https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/775901> was thankfully never merged.)

**Screenshots/mockups:**
This is likely responsible for the reported drop of senses and forms in Grafana <https://grafana.wikimedia.org/d/000000167/wikidata-datamodel?orgId=1&refresh=30m&from=now-2y&to=now&viewPanel=3>:

F41425857: image.png <https://phabricator.wikimedia.org/F41425857>
F41425859: image.png <https://phabricator.wikimedia.org/F41425859>

**BDD**
GIVEN a new
AND WHEN a lexeme is created or edited
AND its page props have been written
THEN the `pp_sortkey` for `pp_propname='wbl-senses'` is non-null
AND the `pp_sortkey` for `pp_propname='wbl-forms'` is non-null

**Acceptance criteria:**
- The `pp_sortkey` is populated for new edits
- We repopulate the `pp_sortkey` for all affected lexemes

**Open questions:**

TASK DETAIL
  https://phabricator.wikimedia.org/T350224

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lucas_Werkmeister_WMDE, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331