Lucas_Werkmeister_WMDE created this task.
Lucas_Werkmeister_WMDE added projects: Wikidata, Wikidata Analytics, Wikidata 
Lexicographical data.

TASK DESCRIPTION
  As a Wikidata product manager, I want our analytics to be accurate.
  As a Wikidata power user, I want queries using `pp_sortkey` to return the 
correct data.
  
  **Problem:**
  Since April 2022, the page props `wb-claims`, `wbl-forms` and `wbl-senses` 
often have `pp_sortkey` set to `NULL` in the `page_props` database table. 
As a result, queries that read the sort key rather than `pp_value` (an 
optimization that lets them use a covering index 
<https://github.com/wikimedia/analytics-wmde-scripts/commit/979c7a0e3465b1dae187525b0fc85a9d4f0a29f5>) 
return wrong results.
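
  To illustrate the effect, here is a minimal sketch in Python using an in-memory SQLite table as a stand-in for the real `page_props` table. The rows and the `total_forms` helper are made up for illustration. An aggregate over `pp_sortkey` silently skips rows where it is `NULL`, while the same aggregate over `pp_value` does not.

```python
import sqlite3

# In-memory stand-in for page_props; the real table lives in MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props ("
    " pp_page INTEGER, pp_propname TEXT, pp_value TEXT, pp_sortkey REAL)"
)

# Made-up rows: one lexeme with a sort key, two without.
rows = [
    (1, "wbl-forms", "3", 3.0),
    (2, "wbl-forms", "5", None),
    (3, "wbl-forms", "2", None),
]
conn.executemany("INSERT INTO page_props VALUES (?, ?, ?, ?)", rows)


def total_forms(column):
    """Sum the number of forms, reading it from the given column (hypothetical helper)."""
    query = (
        f"SELECT SUM(CAST({column} AS INTEGER)) FROM page_props"
        " WHERE pp_propname = 'wbl-forms'"
    )
    return conn.execute(query).fetchone()[0]


by_sortkey = total_forms("pp_sortkey")  # NULL sort keys are ignored by SUM
by_value = total_forms("pp_value")      # every row contributes

print("sum via pp_sortkey:", by_sortkey)
print("sum via pp_value:  ", by_value)
```

  In this toy data the sort-key total undercounts, which is the same shape of error as a drop in an analytics graph.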
  
  **Example:**
  
    mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname IN ('wbl-forms', 'wbl-senses') AND pp_sortkey IS NULL;
    +----------+
    | COUNT(*) |
    +----------+
    |  1150039 |
    +----------+
    1 row in set (0.687 sec)
    
    mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname IN ('wbl-forms', 'wbl-senses') AND pp_sortkey IS NOT NULL;
    +----------+
    | COUNT(*) |
    +----------+
    |  1234425 |
    +----------+
    1 row in set (0.729 sec)
    
    mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname = 'wb-claims' AND pp_sortkey IS NULL;
    +----------+
    | COUNT(*) |
    +----------+
    |   946316 |
    +----------+
    1 row in set (0.812 sec)
    
    mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> SELECT COUNT(*) FROM page_props WHERE pp_propname = 'wb-claims' AND pp_sortkey IS NOT NULL;
    +-----------+
    | COUNT(*)  |
    +-----------+
    | 106516430 |
    +-----------+
    1 row in set (1 min 40.897 sec)
  
  This appears to have been caused by the change "Page properties should 
always be strings" 
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseLexeme/+/776016> 
(T305158 <https://phabricator.wikimedia.org/T305158>): 
`PagePropsTable::getPropertySortKeyValue()` returns `null` for strings, even if 
the strings are numeric. (The corresponding Wikibase change 
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/775901> was 
thankfully never merged.)
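
  The behavior described above can be modeled with a short Python sketch. This is not the MediaWiki PHP code; the function name is borrowed for readability, and the assumption that integer, float and boolean values receive a float sort key is inferred from the fact that older rows have a non-null `pp_sortkey`.

```python
from typing import Optional, Union

PropValue = Union[int, float, bool, str]


def get_property_sort_key_value(value: PropValue) -> Optional[float]:
    """Model of the described behavior (assumption, not the real implementation).

    Numeric and boolean values get a float sort key; strings get None,
    even when the string looks like a number.
    """
    if isinstance(value, (int, float, bool)):
        return float(value)
    return None


# Before the change: the page prop is stored as an integer.
print(get_property_sort_key_value(5))
# After the change: the page prop is stored as a string.
print(get_property_sort_key_value("5"))
```

  Under this model, switching a page prop from an integer to its string form is enough to turn the sort key into `NULL`, without the value itself changing.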
  
  **Screenshots/mockups:**
  This is likely responsible for the reported drop of senses and forms in 
Grafana 
<https://grafana.wikimedia.org/d/000000167/wikidata-datamodel?orgId=1&refresh=30m&from=now-2y&to=now&viewPanel=3>:
  F41425857: image.png <https://phabricator.wikimedia.org/F41425857>
  F41425859: image.png <https://phabricator.wikimedia.org/F41425859>
  
  **BDD**
  GIVEN a new 
  AND 
  WHEN a lexeme is created or edited
  AND its page props have been written
  THEN the `pp_sortkey` for `pp_propname='wbl-senses'` is non-null
  AND the `pp_sortkey` for `pp_propname='wbl-forms'` is non-null
  
  **Acceptance criteria:**
  
  - The `pp_sortkey` is populated for new edits
  - We repopulate the `pp_sortkey` for all affected lexemes
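
  As a rough illustration of the second criterion, the sketch below backfills `pp_sortkey` from `pp_value` in an in-memory SQLite table and checks that no affected rows remain. The table contents and the `remaining_nulls` helper are hypothetical, and the actual repopulation may well be done differently (for instance through MediaWiki rather than a direct `UPDATE`).

```python
import sqlite3

# In-memory stand-in for page_props with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props ("
    " pp_page INTEGER, pp_propname TEXT, pp_value TEXT, pp_sortkey REAL)"
)
conn.executemany(
    "INSERT INTO page_props VALUES (?, ?, ?, ?)",
    [
        (1, "wbl-forms", "4", None),    # affected
        (1, "wbl-senses", "2", None),   # affected
        (2, "wbl-forms", "7", 7.0),     # already fine
        (3, "other-prop", "abc", None), # unrelated prop, must stay untouched
    ],
)


def remaining_nulls(connection):
    """Count lexeme page props that still lack a sort key (hypothetical helper)."""
    return connection.execute(
        "SELECT COUNT(*) FROM page_props"
        " WHERE pp_propname IN ('wbl-forms', 'wbl-senses')"
        " AND pp_sortkey IS NULL"
    ).fetchone()[0]


before = remaining_nulls(conn)

# Backfill only the affected props, deriving the sort key from the stored value.
conn.execute(
    "UPDATE page_props SET pp_sortkey = CAST(pp_value AS REAL)"
    " WHERE pp_propname IN ('wbl-forms', 'wbl-senses')"
    " AND pp_sortkey IS NULL"
)

after = remaining_nulls(conn)
print("affected rows before:", before, "after:", after)
```

  The same `COUNT(*) ... IS NULL` check, run against the real table, mirrors the queries in the example above and should trend to zero once the repopulation is complete.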
  
  **Open questions:**

TASK DETAIL
  https://phabricator.wikimedia.org/T350224

To: Lucas_Werkmeister_WMDE
Cc: Lucas_Werkmeister_WMDE, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331