[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-12 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a project: Epic. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE, me, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-12 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from In progress to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. @Manuel and @Lydia_Pintscher, just shared a folder with the two CSVs on Wolke. Let me know if there's anything else needed, and I will

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @MarcoSwart 👋 Thanks for the communication here :) I guess I'm a bit confused by how the other one would be used. You're roughly talking about: | word_that_is_missing_from_a_wiktionary | number_of_wiktionaries_that_do_have_it | | MOST_MI

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @Manuel, my assumption was that you could help any non-analytics PMs or go through the results with them as you have the needed access. Using Google for PII is not something we're supposed to do if it can be avoided, but I have no experience with

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-11 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Talked further with WMF about this just now. One basic question for the end users: would it make it more convenient for you all if the exported datasets were per Wiktionary? There are two options here, with missing entries being used as an example: 1

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I can also prepare a notebook with quick functions to load and explore the data, if that would make the option I suggested a bit easier. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. > Would it be possible to send us a spreadsheet (and schedule it for deletion after 90 days)? I'd prefer to transfer via the servers if possible given the comment here <https://phabricator.wikimedia.org/T358311#9820450> from WMF Engineer

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Base queries for all of this are ready :) Let me know on the above and I'll finalize them. Re how to send the files: my suggestion would be that I put them into my `stat1010` and then @Manuel can migrate them to his. From there I'll delete my

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Checking on the numbers here really quick: the request is for the top `1000` user agents and then a sample of `1000` user agents, but the total is `1221`. Would an ordered list of all of them make more sense as we're talking a sample of 82%? There r

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Status is open as T364045 <https://phabricator.wikimedia.org/T364045> has been resolved :) TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavi

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-10 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benja

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Unstalled as the plan for the data export has been approved in T365699 <https://phabricator.wikimedia.org/T365699> :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, D

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Unstalled as the table has been created :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel

[Wikidata-bugs] [Maniphest] T343019: [EPIC] Segments of Wikidata's data over time [up to milestone 3]

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the status of subtask T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T343019 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Items that contain a sitelink to one of the Wikimedia projects over time

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @MarcoSwart, sorry for changing the status without explanation. Was in a meeting and we were moving things around, but obviously context should have been added. This is stalled for now as we're waiting for WMF to advise us on the best way forwa

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note, work that will unblock this task is being done in T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history <https://phabricator.wikimedia.org/T364045>. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFEREN

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Lydia_Pintscher, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T366621: [Analytics] Analysis of REST API user agents for May 2024

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Quick note on this, in discussion, something to check as well would be those user agents that were present in May 2024, but were not active in April 2024 :) TASK DETAIL https://phabricator.wikimedia.org/T366621 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T332899: [EPIC] Migrate selected R-based Wikidata products

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the status of subtask T360296: [Analytics] Implement data process to identify missing Wiktionary entries from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T332899 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-06 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pampu

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. There's now a draft for the DAGs <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/725/diffs#96f15bf21ce9c18b6638c53402e35a2654aeeff6> open on GitLab. There's still lots to do as WMF wants to sync on suggestio

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-06-04 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks so much for the support here, @BTullis! I'll update the epic <https://phabricator.wikimedia.org/T356618> with this being done. So close to being finished with all this :) TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. wmde/analytics/hql/airflow_jobs/wiktionary_cognate <https://gitlab.wikimedia.org/repos/wmde/analytics/-/tree/main/hql/airflow_jobs/wiktionary_cognate?ref_type=heads> on GitLab now has all the needed queries for missing entries, most popular entri

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Table has been updated with the new data from the most recent DAG run. Lots more user agents - almost a 3x increase. Noting this for now as maybe grounds for further investigation later, but IPs are also increasing (just not by as much). Note that we

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: July 2024)

2024-06-03 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: June 2024)" to "[Analytics] Monthly repeating tasks (next: July 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg, karapayneWMDE

[Wikidata-bugs] [Maniphest] T351070: [EPIC] Clean up Wikidata Grafana cronjobs

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T351070 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: AndrewTavis_WMDE, Micha

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed subtask T351072: Remove the WDCM clone (stats1007) as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Lucas_Werkmeister_WMD

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Perfect, @Lucas_Werkmeister_WMDE! Glad to have this all cleared up :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Arian_Bozorg

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE closed this task as "Resolved". AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE added a comment. Sounds good to me! :) Thanks for the help here, @Lucas_Werkmeister_WMDE and @BTullis! TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T321666: Wiktionary Cognate Dashboard is not accessible [timeboxed 0.5 days]

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Bicolino34 👋 Thanks for reaching out :) We are still working on tasks related to this dashboard - at least bringing back some of the data processes. TASK DETAIL https://phabricator.wikimedia.org/T321666 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Moving this to verification given the work in T364965 <https://phabricator.wikimedia.org/T364965>. Thanks for all of this, @Lucas_Werkmeister_WMDE! Maybe we can resolve this and leave T364965 <https://phabricator.wikimedia.org/T364965> until `

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-29 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. None of the files listed in your comment above <https://phabricator.wikimedia.org/T364965#9838579> look like things we should worry about, @Lucas_Werkmeister_WMDE. Similarly that there's a different commit for this, as to my knowledge `stat10

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-05-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I've been asking around about the data source and connecting the tables and have yet to get concrete answers. Based on general assumptions of the names of the tables/columns though, the path forward for getting missing entries for a Wiktionary will

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-05-28 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WM

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, D

[Wikidata-bugs] [Maniphest] T360296: [Analytics] Implement data process to identify missing Wiktionary entries

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360296 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, ECohen_WMDE, Aklapper, Pamputt, AndrewTavis_WMDE, JeanFred

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thanks for taking care of this, @Lucas_Werkmeister_WMDE! We'll be able to close both this and T351072 <https://phabricator.wikimedia.org/T351072> after Tuesday next week if/when the Puppet change is deployed :) TASK DET

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. @BTullis, checking in on this as your help in T358311 <https://phabricator.wikimedia.org/T358311> reminded me as it's all related to the same user. Would you be able to remove the `statistics/manifests/wmde/wdcm.pp` file and any related processes

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Thank you, @BTullis! Ya I wasn't happy with the solution either. Appreciate your willingness to help! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. I'm realizing also that I don't have admin rights and thus can't move files to your directory. I'll copy these files over to my directory, download them and send you a link to a zipped directory on Google Drive once we have the abov

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel, checking further as it's still not clear what you'd like. The double except is confusing. I'll only transfer files from `stat1005`, and could you answer the following questions: 1. Do you want **data files** (.csv, .tsv, etc)

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hi @Manuel - sending along a summary of what I'll be getting for you: == stat1004 == Jul 25 2020 Analytics Jun 23 2020 Experiments Jul 25 2020 wdUsagePerPage == stat1005 == All non data

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok then! So the check of the files above is complete as shown by its status. General summaries of each stat machine and HDFS are provided under the subsections above. `stat1005` has some files that @Manuel may find interesting given that they'r

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. So basically removing the wdcm.pp related file on GitHub and its Puppet workflows will close both tasks :) TASK DETAIL https://phabricator.wikimedia.org/T351072 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ah looking at this, I'm realizing I restated myself as the work that's left in T364965: stat1007 to stat1011 migration pipeline output check <https://phabricator.wikimedia.org/T364965> is a duplicate of what we want to do here :) TAS

[Wikidata-bugs] [Maniphest] T351072: Remove the WDCM clone (stats1007)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @Arian_Bozorg 👋 Yes, we do still need to check this out. I was thinking that @Lucas_Werkmeister_WMDE and I could discuss this when we chat about what else is needed in T364965: stat1007 to stat1011 migration pipeline output check <ht

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T365457 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Manuel, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T365457: Bring in all Purdue Porgram PRs and upload Mismatch Finder mismatches

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Making this task as a means of noting that there is still work to be done to close out the Purdue Data Mine program

[Wikidata-bugs] [Maniphest] T356618: [EPIC] Check of legacy wmde analytics infrastructure

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356618 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Michael, karapayneWMDE, Aklapper, Manuel, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-21 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. ⚠️ Currently WIP ⚠️ === Going through the files sent by @JAllemandou above <https://phabricator.wikimedia.org/T358311#9648470>. This message will be saved as I go so that I don't lose my progress 😊 If I do find some

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-17 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Note that MR#700 <https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/700> has been opened that has the work for this :) TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm (timeboxed 0,5 days)

2024-05-16 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: brouberol, JAllemandou, MoritzMuehlenhoff, Manuel, Aklapper, AndrewTavis_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Confirming that data's still coming in as well. @BTullis, what should we do about statistics/manifests/wmde/wdcm.pp <https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/wmde/wdcm.pp>? Remove the file? An

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Quick note that the word used by @BTullis was `disabled` instead of `removed` for the stat1007 timers, so apologies if this caused some confusion. I figure not, but just wanted to be clear :) @BTullis, would you be able to check the journal for them and

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE changed the task status from "Open" to "Stalled". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benja

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T364965: stat1007 to stat1011 migration pipeline output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "stat1007 migration output check" to "stat1007 to stat1011 migration pipeline output check". TASK DETAIL https://phabricator.wikimedia.org/T364965 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/e

[Wikidata-bugs] [Maniphest] T364965: stat1007 migration output check

2024-05-15 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata Analytics (Kanban), Wikidata, Wikidata Dev Team. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Context --- Recently WMF has been migrating from legacy stat servers that are being

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Sheet updated with the numbers for April. Higher number of user agents, but lower IPs (but then IPs still much higher than Feb). TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T342559: [Analytics] Monthly repeating tasks (next: June 2024)

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "[Analytics] Monthly repeating tasks (next: May 2024)" to "[Analytics] Monthly repeating tasks (next: June 2024)". AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T342559 EMAIL

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T361203 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Manuel, Aklapper, AndrewTavis_WMDE, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T358311: Check home/HDFS leftovers of goransm

2024-05-14 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Hey @brouberol 👋 Just getting back from two weeks off today :) I'll check into this and get back to you all! Thanks for the ping! TASK DETAIL https://phabricator.wikimedia.org/T358311 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelink segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "Generate historical weekly segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelink segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL

[Wikidata-bugs] [Maniphest] T363583: Generate historical weekly segments of Wikidata item sitelinks segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE renamed this task from "Generate weekly historical segments of Wikidata item sitelinks segmentations" to "Generate historical weekly segments of Wikidata item sitelinks segmentations". TASK DETAIL https://phabricator.wikimedia.org/T363583 EMAIL

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T363583: Generate weekly historical segments of Wikidata item sitelinks segmentations

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE created this task. AndrewTavis_WMDE added projects: Wikidata, Wikidata Analytics (Kanban). Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Purpose --- In T362849: [Analytics] Segments of Wikidata's data over time &

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See T362849_wd_item_sitelink_segments.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/wikidata/2024/T362849_wd_item_sitelink_segments/T362849_wd_item_sitelink_segments.ipynb?ref_type=heads> for the work to derive the se

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: JAllemandou, mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Ok, so the new numbers after the change in scope for the max `2024-04-15` snapshot are: items_with_sitelinks: 32,231,861 items_items_with_sitelinks_link_to: 2,980,388 all_other_items: 72,910,679 For documentation, the numbers for the

[Wikidata-bugs] [Maniphest] T361203: [Analytics] Add the published datasets directories as a target for the REST API Airflow jobs

2024-04-26 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Moved this to `In progress` as I'm adding the job to export everything to the published datasets folder to the DAG as I work on the same for T362849 <https://phabricator.wikimedia.org/T362849>. TASK DETAIL https://phabricator.wikimedia.org/T36

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-25 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. See {https://phabricator.wikimedia.org/T363451} for the task about bringing back the partition (hopefully via another job). I added a bit about whether we want to maybe turn this job on when WMDE needs historical data. Let me know what you all think on that

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Another note on this is: if we don't expect to be needing a Wikidata partition of `wmf.mediawiki_wikitext_history` for other tasks, then we could work directly from the XML dump for the data backdate. We wouldn't be able to leverage PySpark for th

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a subscriber: JAllemandou. AndrewTavis_WMDE added a comment. Thanks for all of the information, @mpopov! I talked this over in my bi-weekly with @JAllemandou, and would like to bring some further context to this particular situation :) The go-to table for this

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T362849: [Analytics] Segments of Wikidata's data over time

2024-04-23 Thread AndrewTavis_WMDE
AndrewTavis_WMDE claimed this task. AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362849 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: mpopov, AndrewTavis_WMDE, Manuel, Aklapper

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Summary on your end sounds great, @Ifrahkhanyaree_WMDE! 😊 Let me know if sending along some empty new item revisions from 2024 would be helpful :) TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE added a comment. Notebook with the work that was done for this is: wmde/analytics/tasks/product_platform/2024/T360761_empty_wikidata_items/T360761_empty_wikidata_items.ipynb <https://gitlab.wikimedia.org/repos/wmde/analytics/-/blob/main/tasks/product_platform/2

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T360761 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AndrewTavis_WMDE Cc: Aklapper, Ifrahkhanyaree_WMDE, Manuel, Danny_Benjafield_WMDE, S8321414

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-22 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from Needs product input to Product verification on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. Further insights on this, and moving it to `Product verification` at this point :) I've now changed the query to a span of bytes

[Wikidata-bugs] [Maniphest] T360761: [Analytics] Analysis of empty new Wikidata Items

2024-04-19 Thread AndrewTavis_WMDE
AndrewTavis_WMDE moved this task from In progress to Needs product input on the Wikidata Analytics (Kanban) board. AndrewTavis_WMDE added a comment. The thread on Mattermost <https://mattermost.wikimedia.de/swe/pl/gsr9b485x7geby79t4sg151j7c> for discussing this has a lot of comments
