AndrewTavis_WMDE created this task.
AndrewTavis_WMDE added projects: Wikidata, Wikidata Analytics (Kanban).
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION

Purpose
-------

In T362849: [Analytics] Segments of Wikidata's data over time <https://phabricator.wikimedia.org/T362849> we need to calculate historical segments of Wikidata's items based on their relation to sitelinks. Purpose from that ticket:

> As Wikidata Product Managers, we would like to understand how different segments of Wikidata's data developed over time, so we can inform our projections.

This task would encompass the historical data that's needed to achieve this.

Scope
-----

From T362849 <https://phabricator.wikimedia.org/T362849>:

> How did the number of Items of the following types develop over time?
>
> A) Items that contain a sitelink to one of the Wikimedia projects (e.g. about a notable person)
> B) Items that are needed to build A (used in A Items for example in a statement or reference; e.g. the non-notable father of that notable person)
> C) All other Items

- In order to do this, T363451: Add job to create Wikidata partition to wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T363451> was made to recreate the Wikidata partition of wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history>
- Once this task is complete, work can then begin to use this partition to generate all data from when Wikidata was created to the most recent weekly data generated by the DAG created in T362849 <https://phabricator.wikimedia.org/T362849>

Desired Output
--------------

- Weekly stats of the number of Items in category A, B and C

Acceptance criteria:

[ ] Weekly historical breakdowns of populations A, B and C
  - These would be in the Data Lake and the published datasets

---

**Information below this point is filled out by the Wikidata Analytics team.**

General Planning
----------------

Information is filled out by the analytics product manager.

Assignee Planning
-----------------

Information is filled out by the assignee of this task.
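As a rough illustration of the A, B and C segments quoted in the Scope section, the following is a minimal sketch in plain Python rather than the actual Data Lake job. The function name `segment_items` and the two input mappings are hypothetical. It assumes that each weekly snapshot has already been reduced to a sitelink count per Item and to the set of Items each Item uses in its statements or references. It also assumes that B covers only direct usage by an A Item, and that an Item with a sitelink stays in A even when another A Item uses it.

```python
from collections import Counter
from typing import Dict, Set


def segment_items(sitelink_counts: Dict[str, int],
                  used_items: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign every Item in one snapshot to segment A, B or C.

    sitelink_counts: hypothetical mapping of Item ID -> number of sitelinks.
    used_items: hypothetical mapping of Item ID -> Item IDs used in its
        statements or references.
    """
    all_items = set(sitelink_counts)

    # A: Items with at least one sitelink to a Wikimedia project.
    segment_a = {qid for qid, count in sitelink_counts.items() if count > 0}

    # B: Items used directly by an A Item that are not themselves in A
    # (assumption: no transitive closure is taken).
    used_by_a: Set[str] = set()
    for qid in segment_a:
        used_by_a |= used_items.get(qid, set())
    segment_b = (used_by_a & all_items) - segment_a

    # C: everything else.
    segments = {}
    for qid in all_items:
        if qid in segment_a:
            segments[qid] = "A"
        elif qid in segment_b:
            segments[qid] = "B"
        else:
            segments[qid] = "C"
    return segments


# Toy snapshot: Q1 and Q4 have sitelinks, Q2 is used by Q1, Q3 is unrelated.
snapshot_sitelinks = {"Q1": 3, "Q2": 0, "Q3": 0, "Q4": 1}
snapshot_used = {"Q1": {"Q2", "Q4"}, "Q3": {"Q2"}}

segments = segment_items(snapshot_sitelinks, snapshot_used)
counts = Counter(segments.values())
print(dict(sorted(counts.items())))  # {'A': 2, 'B': 1, 'C': 1}
```

Running this once per weekly snapshot would give the per-segment counts listed under Desired Output. Whether B should also include Items used only indirectly is an open question that this sketch does not settle.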
Estimation
----------

Estimate:
Actual:

Sub Tasks
---------

Full breakdown of the steps to complete this task:

[ ] Step

Data to be used
---------------

See Analytics/Data_Lake <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake> for the breakdown of the data lake databases and tables.

The following tables will be referenced in this task:

- wmf.mediawiki_wikitext_history <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/Mediawiki_wikitext_history>

Notes and Questions
-------------------

Things that came up during the completion of this task, questions to be answered and follow up tasks:

- Note

TASK DETAIL
  https://phabricator.wikimedia.org/T363583

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE
Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331