[Wikidata-bugs] [Maniphest] T283575: Wikidata Analytics: codebase modularization

2021-08-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Deploying {WMDEData} with `renv::install()` across the WMF Analytics Clients (stat1004, stat1005, stat1006, stat1007, stat1008). TASK DETAIL https://phabricator.wikimedia.org/T283575 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T283568: [Epic] Wikidata Analytics Core Codebase Maintenance

2021-08-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. {WMDEData} <https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/tree/master/_lib/WMDEData> is finally submitted. - tests on stat1005 Analytics Client: DONE. - forthcoming changes in the codebase in relation to: - T283570

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-20 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel Here is a refinement of T288611#7293369 <https://phabricator.wikimedia.org/T288611#7293369>: **Sitelinks Statistics** 1. In **whole Wikidata**, we currently find `26,368,626` items (out of `91,437,737` items with `P31 insta

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-08-20 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved".

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel The datasets described in T288611#7283258 <https://phabricator.wikimedia.org/T288611#7283258> are now updated with correct data and found in this public directory <https://analytics.wikimedia.org/published/datasets/wmde-analytics-en

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel 1. In **whole Wikidata**, we currently find `78,505,497` (out of `94,158,141`) items with at least one External Id: that is about 83% of all Wikidata items, implying that **17%** of items are without External Ids. 2. In **Astronomical Objects
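
The percentages quoted in this comment can be re-derived from the two counts it reports. A minimal sketch in Python; only the two counts come from the comment, the variable names are illustrative:

```python
# Counts quoted in the comment (whole Wikidata).
items_with_external_id = 78_505_497
all_items = 94_158_141

# Share of items with at least one External Id, and its complement.
share_with = items_with_external_id / all_items
share_without = 1 - share_with

print(f"with External Id:    {share_with:.1%}")
print(f"without External Id: {share_without:.1%}")
```

Rounded to whole percentages this gives the 83% and 17% reported above.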

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel **IMPORTANT.** Probably all numbers - except those reported for whole Wikidata - will have to be corrected here. I have been using WDQS to obtain the instances of all sub-classes of Astronomical Objects and Scholarly Articles until now

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel From our 1:1 > Number and % of items in WD with (no) external identifier [split by core, astronomical, citation] - ETL phase completed, datasets obtained; - re-composition in R, in RAM analysis now.

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-19 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel Here are a few more things, general statistics on whole Wikidata, to consider: - we consider `590,404` classes in total; - `307,646` classes (52%) do not have a single item with a sitelink; - here are (a) a chart with the top 50 classes

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel > Do we know why there are so many astronomical objects with sitelinks? (e.g. what projects do they predominantly connect to?) The following table should be able to help answer your question. F34601012: astrFrame.csv <

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel From our 1:1 TUE 17. August 2021: > Number and % of items in WD with (no) sitelinks [split by core, astronomical, citation] **"Core" Wikidata (i.e. Wikidata - (Astronomical Objects + Scholarly Articles))** - numb

[Wikidata-bugs] [Maniphest] T283570: Impose the tidyverse style for R code wherever possible

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - tidyverse style **almost perfectly** applied across the Wikidata Languages Landscape code.

[Wikidata-bugs] [Maniphest] T283570: Impose the tidyverse style for R code wherever possible

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Production: Wikidata Languages Landscape: - namespaces implemented across the codebase.

[Wikidata-bugs] [Maniphest] T283571: Automation of large Wikidata Analytics updates

2021-08-18 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Wikidata Languages Landscape is now - refactored into 5 modules + orchestra script, similar to WDCM: repo <https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/tree/master/_engines/_wdLanguagesLandscape> - and has a WDCM-like im

[Wikidata-bugs] [Maniphest] T286257: Insights from the Wikidata Languages Landscape project

2021-08-17 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Esc3300 > It could be interesting to check labels that are unique to a language. Please check-out our Wikidata Languages Landscape <https://wikidata-analytics.wmcloud.org/app/WD_LanguagesLandscape> system and let me know if it pro

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel - The data are published here <https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/wd_classes_sitelinks/> (tar.gz -> .csv files) - better than in Google Drive; - **

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-08-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE @Tobi_WMDE_SW Do we need anything additional here?

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel The general case (whole Wikidata) is solved, result: - a table - rows: Wikidata classes - columns: Wikimedia projects - cells: number of items in a particular class w. sitelinks towards a particular project - additional columns
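
The table layout described in this comment (rows: classes, columns: projects, cells: item counts) can be sketched as a small pivot. This is an illustration under assumptions, not the production ETL: the input records, the item and class identifiers, and the `sitelink_table` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one (item, class, project) record per sitelink.
records = [
    ("Q1", "Q5", "enwiki"),
    ("Q1", "Q5", "dewiki"),
    ("Q2", "Q5", "enwiki"),
    ("Q3", "Q523", "enwiki"),
]

def sitelink_table(records):
    """Count distinct items per (class, project) and return a class x project table."""
    items = defaultdict(set)
    for item, cls, project in records:
        items[(cls, project)].add(item)
    classes = sorted({c for c, _ in items})
    projects = sorted({p for _, p in items})
    # Missing (class, project) combinations become explicit zeros.
    return {c: {p: len(items.get((c, p), ())) for p in projects} for c in classes}

table = sitelink_table(records)
for cls, row in table.items():
    print(cls, row)
```

Counting distinct items through a set keeps an item from being counted twice if the same sitelink record appears more than once in the input.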

[Wikidata-bugs] [Maniphest] T288611: Number of links to other Wikimedia projects

2021-08-11 Thread GoranSMilovanovic
GoranSMilovanovic updated the task description.

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

2021-08-10 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel - A new dataset is produced, encompassing the following fields: - **class**: a Wikidata class - **num_items**: number of items in the class (via instanceOf, subclassOf, or partOf) - **avg_score**: the average ORES score in this class (A

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-08-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Here's the ETL code <https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention>. I will add modeling and power law estimation as soon as I complete all additional steps as suggested.

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-08-03 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @MGerlach First of all, thank you very much for the insights that you have provided. **On Power Laws and Lindy:** > One possible path out of this is to slightly change the question. Instead of asking whether the data is perfectly described

[Wikidata-bugs] [Maniphest] T287667: Provide Wikidata user behavior data to categorize users by hand

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel The `tagsFrame_Sample_ANON.csv` dataset is now shared via Google Drive: - anonymized user names match the `fullRevisionFrame_ANON.csv` dataset; - the only change with respect to T287667#7251793 <https://phabricator.wikimedia.org/T287

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @awight First of all, I might have failed to mention that the outcome variable (i.e. what we are predicting) is **"stay"**, not "leave". My bad. > I'm unsure whether "positive" here means the classifier ident

[Wikidata-bugs] [Maniphest] T287667: Provide Wikidata user behavior data to categorize users by hand

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel An additional, potentially interesting dataset is: F34573678: revisionTagFrequency.csv <https://phabricator.wikimedia.org/F34573678> It lists the revision tags used in June 2021 + their respective usage frequencies.

[Wikidata-bugs] [Maniphest] T287667: Provide Wikidata user behavior data to categorize users by hand

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel - all data are based on the `2021-06` (latest available) snapshot of the wmf.mediawiki_history table in the WMF Data Lake <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history>; - all data are derived from Wi

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich @awight @Lydia_Pintscher @Manuel @Tobi_WMDE_SW Probably of interest to all of you, because we have a quite interesting - and potentially very useful - outcome here. As a side kick to this ticket, I have trained a Random Forest

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-08-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE All campaign user registrations, independently of the page visited prior to registration, are tracked by our analytics code. In effect, the user registrations that we find in the campaign public data directory are absolutely

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich **Do we really find a Lindy effect in the Wikidata account age distribution?** **Assumption.** As demonstrated in Eliazar, Iddo (November 2017). "Lindy's Law". Physica A: Statistical Mechanics and Its Applications. 486: 7

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-07-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich @awight Finally, as for > ... user behavior on talk pages F34570923: 07_RevisionTalkNamespacesVSLeftWikidata.png <https://phabricator.wikimedia.org/F34570923> but please take into consideration that the dist

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE Test commenced approx. 20:30 CET today, and this is all we have: year month day hour campaign userid 1: 2021 7 30 7 WMDE_2021_wikipost_1_11 3771433 username 1: Test 2

[Wikidata-bugs] [Maniphest] T286257: Insights from the Wikidata Languages Landscape project

2021-07-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel We did not touch upon this one in our 1:1. Do we need anything else here? Please let me know. Thanks!

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-07-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich @awight In reference to T282563#7186386 <https://phabricator.wikimedia.org/T282563#7186386> and T282563#7226336 <https://phabricator.wikimedia.org/T282563#7226336>: - I have used a fresh dataset, relying on the `2021-06`

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-07-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Re-work on a fresh dataset (the `2021-06` snapshot of the `wmf.mediawiki_history` table) is underway; - Reporting: until tonight (hopefully); - @Jan_Dittrich I will be getting in touch via e-mail about the research/paper part later during the day

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE Finally, another search for campaign registered users in reference to T284826#7245950 <https://phabricator.wikimedia.org/T284826#7245950> and T284826#7245960 <https://phabricator.wikimedia.org/T284826#7245960>: nothing

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE In the meantime: there are no campaign registrations containing anything similar to `WMDE_2021_wikipost`. The following query - pretty much standard in all our recent campaigns - returns an empty result set: SELECT year

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > I've just created the following account Test Hanna Klein (WMDE) and I've already edited on Wikipedia with that account. It will take some time before the databases register that. Please, do the following in

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE - Nothing found while additionally controlling for URL encoding of special characters; - one final stretch: looking for anything under the `uri_path`: `/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia` that has any of the

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE Nothing found `RLIKE https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia#Schritt1` **at all** since the beginning of the campaign. - Now running one additional, final check; then - re-running full data

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > ... I think that some page views should appear in your analysis. What might be the reason for this deviation? - running a "soft" approach (i.e. regex w. RLIKE `Schritt1`) now - reporting back as soon as I have the

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE I have re-run our data acquisition procedures looking exactly for the following URLs (without the campaign tags attached, of course): https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia#Schritt1_

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

2021-07-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Next step: ORES per class in Human vs Bots Statistics.

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-27 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > ... have those really not been clicked at all or might there be another reason for not appearing? I am on it now.

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-22 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: max_klemm. GoranSMilovanovic added a comment. @max_klemm @Tobi_WMDE_SW @Hanna_Klein_WMDE @Merle_von_Wittich_WMDE @Manuel Following an e-mail exchange with Max, the following links are now added to the campaign tracking/analytics script: For

[Wikidata-bugs] [Maniphest] T286257: Insights from the Wikidata Languages Landscape project

2021-07-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel Here is a concise report that relies on UNESCO Language Status <http://www.unesco.org/languages-atlas/>: F34547890: Wikidata_LanguageStatusReport.nb.html <https://phabricator.wikimedia.org/F34547890> The analyses presented

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel For our 1:1 this morning, an updated report, and as discussed in our previous 1:1: - section 2.5 ORES quality in Human (Q5), - section 2.7 The distribution of ORES scores in the remaining Wikidata classes (Wikidata - (Astronomical Object

[Wikidata-bugs] [Maniphest] T286277: [Curious Facts] Provide a mode with equiprobability among constraints

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. An alternative option is to - have the dashboard re-organized so - that controls would be included for users - to select among different types of anomalies, suitably described in non-technical terms. Also, I was thinking about implementing a search

[Wikidata-bugs] [Maniphest] T286277: [Curious Facts] Provide a mode with equiprobability among constraints

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added a project: User-GoranSMilovanovic. GoranSMilovanovic added subscribers: Manuel, Tobi_WMDE_SW.

[Wikidata-bugs] [Maniphest] T286277: [Curious Facts] Provide a mode with equiprobability among constraints

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc @Lydia_Pintscher The current sampling of anomalies that would be presented to a user of Curious Facts is random and proportional to the size of the respective anomaly set. I have also read that comment on the project's talk page but
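
The difference between the current sampling (proportional to anomaly set size) and the equiprobable mode requested in this task can be sketched as follows. This is an illustration only, not the Curious Facts code: the constraint names, the set contents, and both helper functions are hypothetical.

```python
import random

# Hypothetical anomaly sets keyed by constraint type; contents are made up.
anomaly_sets = {
    "single value": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "format": ["b1", "b2"],
    "type": ["c1"],
}

def sample_proportional(sets, rng):
    """Pick a constraint type with probability proportional to its set size."""
    names = list(sets)
    weights = [len(sets[n]) for n in names]
    name = rng.choices(names, weights=weights, k=1)[0]
    return name, rng.choice(sets[name])

def sample_equiprobable(sets, rng):
    """Pick a constraint type uniformly, regardless of its set size."""
    name = rng.choice(list(sets))
    return name, rng.choice(sets[name])

rng = random.Random(42)
print(sample_proportional(anomaly_sets, rng))
print(sample_equiprobable(anomaly_sets, rng))
```

Under proportional sampling the largest set dominates what a user sees; under the equiprobable mode every constraint type is shown equally often, at the cost of repeating facts from small sets more frequently.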

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". GoranSMilovanovic added a comment. @amy_rc Ok. Closing the ticket as resolved.

[Wikidata-bugs] [Maniphest] T261906: Qurator: The Wikidata Curious Facts Project

2021-07-07 Thread GoranSMilovanovic
GoranSMilovanovic closed subtask T277564: [Curious Facts] take separators into account for single value constraints as "Resolved".

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc Fixed; please take a look at Qurator Curious Facts <https://wikidata-analytics.wmcloud.org/app/WikidataAnalytics>.

[Wikidata-bugs] [Maniphest] T286242: [Curious Facts] link to displayed image (if any)

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. GoranSMilovanovic added a project: User-GoranSMilovanovic.

[Wikidata-bugs] [Maniphest] T286242: [Curious Facts] link to displayed image (if any)

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added subscribers: LucasWerkmeister, Manuel, Tobi_WMDE_SW, GoranSMilovanovic. GoranSMilovanovic added a comment. @LucasWerkmeister Thanks for catching this. I will take a look at the API that the Curious Facts project currently uses to fetch Wikimedia Commons images

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. **Next steps**: - interactive PySpark (Jupyter/Analytics Cluster) approach: - generate M3 (single value constraint violations) solutions from the HDFS dump; - write out a direct test against WDQS; - sample the "suspects" - it

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc I see. I have also tested this myself and found more similar cases. @Manuel @Tobi_WMDE_SW After numerous attempts to solve this problem, I must now declare that all general approaches have failed. This must be, I believe, a consequence of

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added a subscriber: Manuel. GoranSMilovanovic added a comment. @Hanna_Klein_WMDE @Merle_von_Wittich_WMDE @Manuel @Tobi_WMDE_SW The campaign analytics code is now running on an automatic daily schedule from stat1007's crontab. Please let me know until when

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

2021-07-06 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel Here is my current take on > ideas about possible next steps (towards a better understanding of the current distribution of the ORES quality scores across Wikidata’s classes) - Gather potential explanatory variables and mo

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-07-05 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc @Lydia_Pintscher Could someone please take a look at this ticket and let me know if we can finally resolve it? Thank you! It's here: Qurator Curious Facts <https://wikidata-analytics.wmcloud.org/app/CuriousFacts> : )

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

2021-07-05 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel Please take a look at the following report if you find some time before our 1:1 at 14:30 CET today: F34540210: Wikidata_ORES_Class_Distributions.nb.html <https://phabricator.wikimedia.org/F34540210> I will give you a walk-throug

[Wikidata-bugs] [Maniphest] T261906: Qurator: The Wikidata Curious Facts Project

2021-07-05 Thread GoranSMilovanovic
GoranSMilovanovic closed subtask T285752: [Curious Facts] A fact with "single value constraint" as "Resolved".

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE @Merle_von_Wittich_WMDE - Analytics code is updated to encompass the recently added pages and tags - Analytics will be updated daily in the public data directory <https://analytics.wikimedia.org/published/datasets/wmde-analyt

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE I have responded in the tracking doc too: just leave it as it is, and thank you very much!

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > Ok :) I will add them all now! Will let you know as soon as it's finished! Great, because it would be best to have all the tags in place now. Please let me know when the table is complete. I will then update the analyt

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE I almost forgot: > I will prepare 3 further newsletters from July 21st and will add more tags to the tracking doc. Please place the new pages to track from July 21 in a **separate table** in the tracking document <

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > Could you manage on 23rd of July f.ex.? As already mentioned in our 1:1, I will be available as of July 25. If you add the new pages/tags to the tracking document before July 15 I will be able to set up the analytics code bef

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE I have just realized that even if the double quotation marks are removed from the URLs some tags are still observed as having a quotation mark attached. For whatever reason this happens, I will simply clean that up in the analytics

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > i don't understand: is there a problem? The problem - which is really not a problem - is that `?WMDE_2021_wikipost_3_1` and `WMDE_2021_wikipost_3_1` would be treated as two different tags in the analytics code. I
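
The clean-up mentioned in this thread (a leading `?` and stray quotation marks making otherwise identical tags look different) could be handled by normalizing each tag before counting. A minimal sketch, assuming tags arrive as plain strings; `normalize_tag` is a hypothetical helper, not a function from the campaign analytics code.

```python
def normalize_tag(raw):
    """Strip surrounding whitespace, then any quotation marks and '?' at either end."""
    return raw.strip().strip('"?')

# Variants of the same campaign tag, as they might appear in the raw data.
variants = [
    "WMDE_2021_wikipost_3_1",
    "?WMDE_2021_wikipost_3_1",
    '"?WMDE_2021_wikipost_3_1"',
    'WMDE_2021_wikipost_3_1"',
]

# All variants collapse to a single canonical tag.
print({normalize_tag(v) for v in variants})
```

Normalizing at ingestion time means the tracking document does not have to be edited for the counts to line up.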

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE I found the following "10","?WMDE_2021_wikipost_3_1","/wiki/Wikipedia:WikiProjekt_Frauen/Frauen_in_Rot",3,2021-07-01,"2021_WMDE_Newsletter" "11","?WMDE_2021_wikipost

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE Is it possible to have all the pages that we need to track in this campaign - irrespective of the starting date - listed in the tracking document <https://docs.google.com/document/d/1xziEs3HyR48a_BRzCnSnytuuu6nbrr479WQMCsCofSk/edit

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-02 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE > I've also noticed the ?? - does it matter? Not really, I was just wondering if there is a reason for it. > At the moment I am preparing new tags for the 2nd and 3rd newsletter campaigns, which will be sent on 7

[Wikidata-bugs] [Maniphest] T284826: Tracking of on wiki registrations and edits during newsletter campaign

2021-07-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Hanna_Klein_WMDE - Analytics code tested; - Regular daily updates are now scheduled; - The data will be published in https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/NewEditors/campaigns/2021_WMDE_Newsletter_Campaign

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for joint sensemaking session about ORES quality score distributions across the Wikidata classes

2021-06-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel As we agreed in our 1:1 today: - prioritizing Exploratory Data Analysis/Hypothesis Generation over clustering; - let's first see what insights we can get before making any modeling assumptions.

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for joint sensemaking session about ORES quality score distributions across the Wikidata classes

2021-06-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel (1) classes x scores (wide data representation) → **we have this data representation now** (2) Let's make a choice of a clustering algorithm, candidate no.1: K-means in Apache Spark's MLlib.

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for joint sensemaking session about ORES quality score distributions across the Wikidata classes

2021-06-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Manuel > Maybe let's quickly talk about this in our 1:1? Of course. > What would you cluster by? Well, I guess in the beginning it would only be a matrix of (1) Wikidata classes x (2) the counts of ORES A, B, C, D, E score
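
The clustering input described in this comment, a matrix of Wikidata classes by counts of ORES A to E scores, can be sketched in a few lines. The (class, grade) pairs are made-up data, and the row normalization is an added assumption (so that large classes do not dominate purely by size), not something stated in the thread.

```python
from collections import Counter

GRADES = ["A", "B", "C", "D", "E"]

# Hypothetical (class, ORES grade) pairs, one per item.
pairs = [
    ("Q5", "A"), ("Q5", "C"), ("Q5", "C"),
    ("Q13442814", "E"), ("Q13442814", "D"),
]

def class_by_grade_matrix(pairs):
    """Return {class: [count of A, B, C, D, E]} in wide form."""
    counts = Counter(pairs)
    classes = sorted({c for c, _ in pairs})
    return {c: [counts[(c, g)] for g in GRADES] for c in classes}

def to_proportions(matrix):
    """Divide each row by its total so that rows sum to 1."""
    return {c: [n / sum(row) for n in row] for c, row in matrix.items()}

matrix = class_by_grade_matrix(pairs)
proportions = to_proportions(matrix)
print(matrix)
print(proportions)
```

Each row of `proportions` is then one observation (a class) with five features, which is the shape a K-means implementation expects.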

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-06-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich Following our 20210630 discussion: **Additional questions** - for those ~ 6% who are still with us: can we find any interesting patterns - the distribution of the length of their periods of inactivity - the distribution of

[Wikidata-bugs] [Maniphest] T285458: Generate inputs for joint sensemaking session about ORES quality score distributions across the Wikidata classes

2021-06-30 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Update 20210630 - join items x scores x classes: **done** - all items with missing ORES predictions were filtered out; - all duplicated set theoretic/mereological relations were singled out (e.g. if an item refers to a class via both `P31` and

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-29 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc The issue should be resolved now, please see Qurator Curious Facts <https://wikidata-analytics.wmcloud.org/app/CuriousFacts>. TASK DETAIL https://phabricator.wikimedia.org/T277564 EMAIL PREFERENCES https://phabricator.wikimedia.org/se

[Wikidata-bugs] [Maniphest] T221103: Have specific URLs for different parts of the dashboard

2021-06-25 Thread GoranSMilovanovic
GoranSMilovanovic removed GoranSMilovanovic as the assignee of this task. GoranSMilovanovic added a comment. Re-assigning. TASK DETAIL https://phabricator.wikimedia.org/T221103 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc

[Wikidata-bugs] [Maniphest] T221103: Have specific URLs for different parts of the dashboard

2021-06-25 Thread GoranSMilovanovic
GoranSMilovanovic claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T221103 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Manuel, Pigsonthewing, VIGNERON, Lydia_Pintscher, GoranSMilovanovic, Lea_Lacroix_WMDE

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc I think I've found the cause of things like T277564#7157895 <https://phabricator.wikimedia.org/T277564#7157895>. It definitely has to do with the following observation of yours: > ... we observed that the tool only considers val

[Wikidata-bugs] [Maniphest] T283571: Automation of large Wikidata Analytics updates

2021-06-20 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Wikidata Languages Landscape automation from stat1008 Analytics Client: done. TASK DETAIL https://phabricator.wikimedia.org/T283571 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Aklapper

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-19 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T284850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Manuel, RhinosF1, GoranSMilovanovic, Tobi_WMDE_SW, Lydia

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc @Lydia_Pintscher Could it be the case that mapping relation type <https://www.wikidata.org/wiki/Property:P4390> is treated as a separator - which overrides the single value constraint - and the Curious Facts system then recognizes that on

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc The part unclear to me is the following one: > ... we observed that the tool only considers values containing qualifiers. From the docs <https://www.wikidata.org/wiki/Help:Property_constraints_portal/Single_value>: > A qual

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc Could you please clarify T277564#7158984 <https://phabricator.wikimedia.org/T277564#7158984>? Thank you. TASK DETAIL https://phabricator.wikimedia.org/T277564 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailprefe

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-16 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @MisterSynergy Could you please check the wdcm_topItems.csv <https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/etl/wdcm_topItems.csv> dataset now and let me know if it looks alright? TASK DETAIL

[Wikidata-bugs] [Maniphest] T277551: [Curious Facts] improvements to issue descriptions

2021-06-15 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T277551 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Tobi_WMDE_SW, Scott_WUaS, WMDE-leszek, amy_rc, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] T282563: User Retention Wikidata: A model for "participating since" patterns in the 2021 Wikidata Community Survey

2021-06-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Jan_Dittrich **Please disregard all previous findings**. The following is based on: - the definition of editor inactivity in T282563#7124389 <https://phabricator.wikimedia.org/T282563#7124389>, - and the two important corrections

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-14 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @MisterSynergy - full manual update of the WDCM Sqoop procedure is now completed; - 876 partitions (`wiki_db`) are present in the Data Lake, which means that everything should be fine, - unless something changed in the per wiki
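
A partition check of the kind described here can be sketched as a simple set comparison between the wikis that were expected to be sqooped and the `wiki_db` partitions actually found. The function name and the toy lists are assumptions; how the two lists are obtained in the real procedure is not shown in this thread.

```python
def missing_partitions(expected_wikis, observed_partitions):
    """Return the expected wiki_db values that have no partition, sorted."""
    return sorted(set(expected_wikis) - set(observed_partitions))


# Toy lists standing in for the real project list and the partition listing.
expected = ["enwiki", "dewiki", "commonswiki", "frwiki"]
observed = ["enwiki", "commonswiki", "frwiki"]
missing = missing_partitions(expected, observed)
```

An empty result would correspond to the "everything should be fine" case; a non-empty one names the wikis to re-run.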

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - Sqoop Shard 4 running now (Commons): in comparison to what was observed from the WDCM Sqoop Clients Log in T284850#7152935 <https://phabricator.wikimedia.org/T284850#7152935>, I see no problem in relation to Commons anymore: the Commons database is

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Monitoring Sqoop procedures from Core MediaWiki databases <https://wikitech.wikimedia.org/wiki/MariaDB#Core_MediaWiki_databases> to `goransm.wdcm_clients_wb_entity_usage` in the DataLake: - Shard 1 (enwiki only): completed; - Shard 2: still r

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-13 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @MisterSynergy - running a manual update of the WDCM sqoop module now; - monitoring. TASK DETAIL https://phabricator.wikimedia.org/T284850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc > However, I ran into a situation where the data was being retrieved incorrectly. This has happened couple of times. For instance: Qurious Facts: Silver-Russell syndrome (Q2142496) has 2 values for property: OMIM ID (P492 <

[Wikidata-bugs] [Maniphest] T277551: [Curious Facts] improvements to issue descriptions

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc Issue descriptions modified to look exactly as suggested in T277551#7137941 <https://phabricator.wikimedia.org/T277551#7137941>, please test: https://wikidata-analytics.wmcloud.org/app/CuriousFacts TASK DETAIL

[Wikidata-bugs] [Maniphest] T277551: [Curious Facts] improvements to issue descriptions

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc - Full system update completed; - Issue descriptions are now fixed (local tests completed); - deploying soon; it will be ready for tests in an hour or so. TASK DETAIL https://phabricator.wikimedia.org/T277551 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @MisterSynergy Thank you. No worries, I will figure this out from the WDCM sqoop logs. Sooner or later. TASK DETAIL https://phabricator.wikimedia.org/T284850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. - At first sight, there were only 687 projects whose reuse data were sqooped by the WDCM_Sqoop_Clients.R <https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/blob/master/_engines/_wdcmModules/WDCM_Sqoop_Clients.R> run, and - that

[Wikidata-bugs] [Maniphest] T284850: Wikidata Concepts Monitor: usage numbers have shrunk considerably within a week

2021-06-12 Thread GoranSMilovanovic
GoranSMilovanovic added subscribers: Lydia_Pintscher, Tobi_WMDE_SW, GoranSMilovanovic. GoranSMilovanovic claimed this task. GoranSMilovanovic added a project: User-GoranSMilovanovic. GoranSMilovanovic triaged this task as "High" priority. GoranSMilovanovic added a comment. @Mis

[Wikidata-bugs] [Maniphest] T277564: [Curious Facts] take separators into account for single value constraints

2021-06-10 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @amy_rc That is rather strange. I am running a full system update now in relation to T277551 <https://phabricator.wikimedia.org/T277551>; let's wait for the new update and then check whether the problem persists. I will perform the tests and let
