GoranSMilovanovic added a comment.
- Deploying {WMDEData} with `renv::install()` across the WMF Analytics
Clients (stat1004, stat1005, stat1006, stat1007, stat1008).
TASK DETAIL
https://phabricator.wikimedia.org/T283575
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel
GoranSMilovanovic added a comment.
{WMDEData}
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/tree/master/_lib/WMDEData>
is finally submitted.
- tests on stat1005 Analytics Client: DONE.
- forthcoming changes in the codebase in relation to:
- T283570
GoranSMilovanovic added a comment.
@Manuel
Here is a refinement of T288611#7293369
<https://phabricator.wikimedia.org/T288611#7293369>:
**Sitelinks Statistics**
1. In **whole Wikidata**, we currently find `26,368,626` items (out of
`91,437,737` items with `P31 insta
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T284826
To: GoranSMilovanovic
Cc: kai.nissen, max_klemm, Manuel, Merle_von_Wittich_WMDE, To
GoranSMilovanovic added a comment.
@Manuel
The datasets described in T288611#7283258
<https://phabricator.wikimedia.org/T288611#7283258> are now updated with
correct data and found in this public directory
<https://analytics.wikimedia.org/published/datasets/wmde-analytics-en
GoranSMilovanovic added a comment.
@Manuel
1. In **whole Wikidata**, we currently find `78,505,497` (out of
`94,158,141`) items with at least one External Id: that would be about 83% of
all Wikidata items, implying that about **17%** of items are without External Ids.
2. In **Astronomical Objects
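The rounded shares quoted in item 1 can be re-derived from the two reported counts. A minimal sketch in Python; the variable names are illustrative and not taken from the analytics code:

```python
# Counts as reported in item 1 above (whole Wikidata).
items_with_external_id = 78_505_497
items_total = 94_158_141

# Share of items with at least one External Id, and its complement.
share_with = items_with_external_id / items_total
share_without = 1 - share_with

print(f"with External Id:    {share_with:.1%}")
print(f"without External Id: {share_without:.1%}")
```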
GoranSMilovanovic added a comment.
@Manuel
**IMPORTANT.** Probably all numbers - except those reported for whole
Wikidata - will have to be corrected here.
I have been using WDQS to obtain the instances of all sub-classes of
Astronomical Objects and Scholarly Articles until now
GoranSMilovanovic added a comment.
@Manuel From our 1:1
> Number and % of items in WD with (no) external identifier [split by core,
astronomical, citation]
- ETL phase completed, datasets obtained;
- re-composition in R, in RAM analysis now.
GoranSMilovanovic added a comment.
@Manuel
Here are a few more things, general statistics on whole Wikidata, to consider:
- we consider `590,404` classes in total;
- `307,646` classes (52%) do not have a single item with a sitelink;
- here are (a) a chart with the top 50 classes
GoranSMilovanovic added a comment.
@Manuel
> Do we know why there are so many astronomical objects with sitelinks? (e.g.
what projects do they predominantly connect to?)
The following table should be able to help answer your question.
F34601012: astrFrame.csv <
GoranSMilovanovic added a comment.
@Manuel
From our 1:1 TUE 17. August 2021:
> Number and % of items in WD with (no) sitelinks [split by core,
astronomical, citation]
**"Core" Wikidata (i.e. Wikidata - (Astronomical Objects + Scholarly
Articles))**
- numb
GoranSMilovanovic added a comment.
- tidyverse style **almost perfectly** applied across the Wikidata Languages
Landscape code.
TASK DETAIL
https://phabricator.wikimedia.org/T283570
GoranSMilovanovic added a comment.
- Production: Wikidata Languages Landscape:
- namespaces implemented across the codebase.
TASK DETAIL
https://phabricator.wikimedia.org/T283570
To: GoranSMilovanovic
GoranSMilovanovic added a comment.
Wikidata Languages Landscape is now
- refactored into 5 modules + orchestra script, similar to WDCM: repo
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/tree/master/_engines/_wdLanguagesLandscape>
- and has a WDCM-like im
GoranSMilovanovic added a comment.
@Esc3300
> It could be interesting to check labels that are unique to a language.
Please check-out our Wikidata Languages Landscape
<https://wikidata-analytics.wmcloud.org/app/WD_LanguagesLandscape> system and
let me know if it pro
GoranSMilovanovic added a comment.
@Manuel
- The data are published here
<https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wikidata/wd_classes_sitelinks/>
(tar.gz -> .csv files) - better than in Google Drive;
- **
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE @Tobi_WMDE_SW
Do we need anything additional here?
TASK DETAIL
https://phabricator.wikimedia.org/T284826
To: GoranSMilovanovic
Cc: kai.nissen
GoranSMilovanovic added a comment.
@Manuel
The general case (whole Wikidata) is solved, result:
- a table
- rows: Wikidata classes
- columns: Wikimedia projects
- cells: number of items in a particular class w. sitelinks towards a
particular project
- additional columns
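The table layout described above (rows: classes, columns: projects, cells: item counts) can be sketched as a simple pivot. This is only an illustration in Python, assuming one hypothetical `(class, project)` record per item/sitelink pair; the records and the function name are invented for the example and do not come from the actual ETL code:

```python
from collections import Counter, defaultdict

# Hypothetical input: one (class, project) pair per item that has a
# sitelink to that project. Real data would come from the ETL output.
records = [
    ("Q5", "enwiki"),
    ("Q5", "dewiki"),
    ("Q5", "enwiki"),
    ("Q16521", "enwiki"),
]

def class_by_project(pairs):
    """Count sitelinked items per (class, project) cell."""
    table = defaultdict(Counter)
    for wd_class, project in pairs:
        table[wd_class][project] += 1
    return table

table = class_by_project(records)
projects = sorted({project for _, project in records})

# Print the wide table: one row per class, one column per project.
print("class", *projects, sep="\t")
for wd_class in sorted(table):
    print(wd_class, *(table[wd_class][p] for p in projects), sep="\t")
```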
GoranSMilovanovic updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T288611
To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, GoranSMilovanovic, Manuel, Aklapper, Invadibot, maantietaja
GoranSMilovanovic added a comment.
@Manuel
- A new dataset is produced, encompassing the following fields:
- **class**: a Wikidata class
- **num_items**: number of items in the class (via instanceOf, subclassOf, or
partOf)
- **avg_score**: the average ORES score in this class (A
GoranSMilovanovic added a comment.
Here's the ETL code
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention>.
I will add modeling and power law estimation as soon as I complete all
additional steps as suggested.
GoranSMilovanovic added a comment.
@MGerlach
First of all, thank you very much for the insights that you have provided.
**On Power Laws and Lindy:**
> One possible path out of this is to slightly change the question. Instead
of asking whether the data is perfectly described
GoranSMilovanovic added a comment.
@Manuel
The `tagsFrame_Sample_ANON.csv` dataset is now shared via Google Drive:
- anonymized user names match the `fullRevisionFrame_ANON.csv` dataset;
- the only change in respect to T287667#7251793
<https://phabricator.wikimedia.org/T287
GoranSMilovanovic added a comment.
@awight
First of all, I may have failed to mention that the outcome variable (i.e.
what we are predicting) is **"stay"**, not "leave". My bad.
> I'm unsure whether "positive" here means the classifier ident
GoranSMilovanovic added a comment.
@Manuel An additional, potentially interesting dataset is:
F34573678: revisionTagFrequency.csv
<https://phabricator.wikimedia.org/F34573678>
It lists the revision tags used in June 2021 + their respective usage
frequencies.
GoranSMilovanovic added a comment.
@Manuel
- all data are based on the `2021-06` (latest available) snapshot of the
wmf.mediawiki_history table in the WMF Data Lake
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history>;
- all data are derived from Wi
GoranSMilovanovic added a comment.
@Jan_Dittrich @awight @Lydia_Pintscher @Manuel @Tobi_WMDE_SW
Probably of interest to all of you, because we have a quite interesting - and
potentially very useful - outcome here.
As a side project to this ticket, I have trained a Random Forest
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
All campaign user registrations, independently of the page visited prior to
registration, are tracked by our analytics code. In effect, the user
registrations that we find in the campaign public data directory are absolutely
GoranSMilovanovic added a comment.
@Jan_Dittrich **Do we really find a Lindy effect in the Wikidata account age
distribution?**
**Assumption.** As demonstrated in Eliazar, Iddo (November 2017). "Lindy's
Law". Physica A: Statistical Mechanics and Its Applications. 486: 7
GoranSMilovanovic added a comment.
@Jan_Dittrich @awight
Finally, as of
> ... user behavior on talk pages
F34570923: 07_RevisionTalkNamespacesVSLeftWikidata.png
<https://phabricator.wikimedia.org/F34570923>
but please take into consideration that the dist
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
Test commenced approx. 20:30 CET today, and this is all we have:
year month day hourcampaign userid
1: 2021 7 307 WMDE_2021_wikipost_1_11 3771433
username
1: Test 2
GoranSMilovanovic added a comment.
@Manuel We did not touch upon this one in our 1:1. Do we need anything else
here? Please let me know. Thanks!
TASK DETAIL
https://phabricator.wikimedia.org/T286257
GoranSMilovanovic added a comment.
@Jan_Dittrich @awight
In reference to T282563#7186386
<https://phabricator.wikimedia.org/T282563#7186386> and T282563#7226336
<https://phabricator.wikimedia.org/T282563#7226336>:
- I have used a fresh dataset, relying on the `2021-06`
GoranSMilovanovic added a comment.
- Re-work on a fresh dataset (the `2021-06` snapshot of the
`wmf.mediawiki_history` table) is underway;
- Reporting: by tonight (hopefully);
- @Jan_Dittrich I will be getting in touch via e-mail about the
research/paper part later during the day
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
Finally, another search for campaign registered users in reference to
T284826#7245950 <https://phabricator.wikimedia.org/T284826#7245950> and
T284826#7245960 <https://phabricator.wikimedia.org/T284826#7245960>: nothing
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
In the meantime: there are no campaign registrations containing anything
similar to `WMDE_2021_wikipost`.
The following query - pretty much standard in all our recent campaigns -
returns an empty result set:
SELECT year
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> I've just created the following account Test Hanna Klein (WMDE) and I've
already edited on Wikipedia with that account.
It will take some time before the databases register that.
Please, do the following in
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
- Nothing found while additionally controlling for URL encoding of special
characters;
- one final stretch: looking for anything under the `uri_path`:
`/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia` that has any of the
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE Nothing found `RLIKE
https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia#Schritt1`
**at all** since the beginning of the campaign.
- Now running one additional, final check; then
- re-running full data
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> ... I think that some page views should appear in your analysis. What might
be the reason for this deviation?
- running a "soft" approach (i.e. regex w. RLIKE `Schritt1`) now - reporting
back as soon as I have the
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
I have re-run our data acquisition procedures looking exactly for the
following URLs (without the campaign tags attached, of course):
https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia#Schritt1_
GoranSMilovanovic added a comment.
- Next step: ORES per class in Human vs Bots Statistics.
TASK DETAIL
https://phabricator.wikimedia.org/T285458
To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> ... have those really not been clicked at all or might there be another
reason for not appearing?
I am on it now.
TASK DETAIL
https://phabricator.wikimedia.org/T284826
GoranSMilovanovic added a subscriber: max_klemm.
GoranSMilovanovic added a comment.
@max_klemm @Tobi_WMDE_SW @Hanna_Klein_WMDE @Merle_von_Wittich_WMDE @Manuel
Following an e-mail exchange with Max, the following links are now added to
the campaign tracking/analytics script:
For
GoranSMilovanovic added a comment.
@Manuel
Here is a concise report that relies on UNESCO Language Status
<http://www.unesco.org/languages-atlas/>:
F34547890: Wikidata_LanguageStatusReport.nb.html
<https://phabricator.wikimedia.org/F34547890>
The analyses presented
GoranSMilovanovic added a comment.
@Manuel
For our 1:1 this morning, an updated report, and as discussed in our previous
1:1:
- section 2.5 ORES quality in Human (Q5),
- section 2.7 The distribution of ORES scores in the remaining Wikidata
classes (Wikidata - (Astronomical Object
GoranSMilovanovic added a comment.
An alternative option is to
- have the dashboard re-organized so
- that controls would be included for users
- to select among different types of anomalies, suitably described in
non-technical terms.
Also, I was thinking about implementing a search
GoranSMilovanovic claimed this task.
GoranSMilovanovic added a project: User-GoranSMilovanovic.
GoranSMilovanovic added subscribers: Manuel, Tobi_WMDE_SW.
TASK DETAIL
https://phabricator.wikimedia.org/T286277
GoranSMilovanovic added a comment.
@amy_rc @Lydia_Pintscher
The current sampling of anomalies that would be presented to a user of
Curious Facts is random and proportional to the size of the respective anomaly
set.
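One way to read "random and proportional to the size of the respective anomaly set" is weighted random selection. A minimal sketch in Python, assuming hypothetical set names and sizes (M3 is mentioned elsewhere in these threads; M1, M2 and all sizes are made up here) and not the actual Curious Facts code:

```python
import random

# Hypothetical anomaly sets and sizes, for illustration only.
anomaly_set_sizes = {"M1": 500, "M2": 300, "M3": 200}

def sample_anomaly_set(sizes, rng=random):
    """Pick one anomaly set with probability proportional to its size."""
    names = list(sizes)
    weights = [sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Seeded generator so the illustration is reproducible.
rng = random.Random(42)
draws = [sample_anomaly_set(anomaly_set_sizes, rng) for _ in range(10_000)]

# Empirical shares should approach size / total size for each set.
for name in anomaly_set_sizes:
    print(name, draws.count(name) / len(draws))
```

With this scheme larger anomaly sets are shown more often; a design that wanted every anomaly type to appear equally often would use uniform weights instead.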
I have also read that comment on the project's talk page but
GoranSMilovanovic closed this task as "Resolved".
GoranSMilovanovic added a comment.
@amy_rc Ok. Closing the ticket as resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T277564
GoranSMilovanovic closed subtask T277564: [Curious Facts] take separators into
account for single value constraints as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T261906
GoranSMilovanovic added a comment.
@amy_rc
Fixed; please take a look at Qurator Curious Facts
<https://wikidata-analytics.wmcloud.org/app/WikidataAnalytics>.
TASK DETAIL
https://phabricator.wikimedia.org/T277564
GoranSMilovanovic claimed this task.
GoranSMilovanovic added a project: User-GoranSMilovanovic.
TASK DETAIL
https://phabricator.wikimedia.org/T286242
To: GoranSMilovanovic
Cc: GoranSMilovanovic
GoranSMilovanovic added subscribers: LucasWerkmeister, Manuel, Tobi_WMDE_SW,
GoranSMilovanovic.
GoranSMilovanovic added a comment.
@LucasWerkmeister Thanks for catching this.
I will take a look at the API that the Curious Facts project currently uses
to fetch Wikimedia Commons images
GoranSMilovanovic added a comment.
**Next steps**:
- interactive PySpark (Jupyter/Analytics Cluster) approach:
- generate M3 (single value constraint violations) solutions from the hdfs
dump;
- write out a direct test against WDQS;
- sample the "suspects" - it
GoranSMilovanovic added a comment.
@amy_rc I see. I have also tested it myself and found more similar cases.
@Manuel @Tobi_WMDE_SW
After numerous attempts to solve this problem, I have to report that all
general approaches have failed.
This must be, I believe, a consequence of
GoranSMilovanovic added a subscriber: Manuel.
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE @Merle_von_Wittich_WMDE @Manuel @Tobi_WMDE_SW
The campaign analytics code is now running on an automatic daily schedule
from stat1007's crontab.
Please let me know until when
GoranSMilovanovic added a comment.
@Manuel
Here is my current take on
> ideas about possible next steps (towards a better understanding of the
current distribution of the ORES quality scores across Wikidata’s classes)
- Gather potential explanatory variables and mo
GoranSMilovanovic added a comment.
@amy_rc @Lydia_Pintscher
Could someone please take a look at this ticket and let me know if we can
finally resolve it?
Thank you!
It's here: Qurator Curious Facts
<https://wikidata-analytics.wmcloud.org/app/CuriousFacts> : )
GoranSMilovanovic added a comment.
@Manuel
Please take a look at the following report if you find some time before our
1:1 at 14:30 CET today:
F34540210: Wikidata_ORES_Class_Distributions.nb.html
<https://phabricator.wikimedia.org/F34540210>
I will give you a walk-throug
GoranSMilovanovic closed subtask T285752: [Curious Facts] A fact with
"single value constraint" as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T261906
To: GoranSMil
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE @Merle_von_Wittich_WMDE
- Analytics code is updated to encompass the recently added pages and tags
- Analytics will be updated daily in the public data directory
<https://analytics.wikimedia.org/published/datasets/wmde-analyt
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE I have responded in the tracking doc too: just leave it as
it is, and thank you very much!
TASK DETAIL
https://phabricator.wikimedia.org/T284826
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> Ok :) I will add them all now! Will let you know as soon as it's finished!
Great, because it would be best to have all the tags in place now. Please let
me know when the table is complete. I will then update the analyt
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
I almost forgot:
> I will prepare 3 further newsletters from July 21st and will add more tags
to the tracking doc.
Please place the new pages to track from July 21 in a **separate table** in
the tracking document
<
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> Could you manage on 23rd of July f.ex.?
As already mentioned in our 1:1, I will be available as of July 25.
If you add the new pages/tags to the tracking document before July 15, I will
be able to set up the analytics code bef
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
I have just realized that even if the double quotation marks are removed from
the URLs some tags are still observed as having a quotation mark attached. For
whatever reason this happens, I will simply clean that up in the analytics
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> i don't understand: is there a problem?
The problem - which is really not a problem - is that
`?WMDE_2021_wikipost_3_1` and `WMDE_2021_wikipost_3_1` would be treated as two
different tags in the analytics code.
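The kind of clean-up described here can be sketched as a small normalization step applied before tags are counted. This is an illustrative Python sketch, not the actual analytics code; the function name `normalize_tag` is hypothetical:

```python
def normalize_tag(raw_tag: str) -> str:
    """Strip whitespace, stray double quotes and leading '?' from a campaign tag."""
    return raw_tag.strip().strip('"').lstrip("?")

# Variants of the same tag as they might appear in the raw data.
tags = [
    "?WMDE_2021_wikipost_3_1",
    "WMDE_2021_wikipost_3_1",
    'WMDE_2021_wikipost_3_1"',
]

# After normalization all variants collapse into a single tag.
unique_tags = {normalize_tag(tag) for tag in tags}
print(unique_tags)
```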
I
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
I found the following
"10","?WMDE_2021_wikipost_3_1","/wiki/Wikipedia:WikiProjekt_Frauen/Frauen_in_Rot",3,2021-07-01,"2021_WMDE_Newsletter"
"11","?WMDE_2021_wikipost
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
Is it possible to have all the pages that we need to track in this campaign -
irrespective of the starting date - listed in the tracking document
<https://docs.google.com/document/d/1xziEs3HyR48a_BRzCnSnytuuu6nbrr479WQMCsCofSk/edit
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
> I've also noticed the ?? - does it matter?
Not really, I was just wondering if there is a reason to it.
> At the moment I am preparing new tags for the 2nd and 3rd newsletter
campaigns, which will be sent on 7
GoranSMilovanovic added a comment.
@Hanna_Klein_WMDE
- Analytics code tested;
- Regular daily updates are now scheduled;
- The data will be published in
https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/NewEditors/campaigns/2021_WMDE_Newsletter_Campaign
GoranSMilovanovic added a comment.
@Manuel As we agreed in our 1:1 today:
- prioritizing Exploratory Data Analysis/Hypothesis Generation over
clustering;
- let's first see what insights we can get before making any modeling
assumptions.
GoranSMilovanovic added a comment.
@Manuel
(1) classes x scores (wide data representation) → **we have this data
representation now**
(2) Let's make a choice of a clustering algorithm, candidate no.1: K-means in
Apache Spark's MLlib.
TASK DETAIL
https://phabricator.wik
GoranSMilovanovic added a comment.
@Manuel
> Maybe let's quickly talk about this in our 1:1?
Of course.
> What would you cluster by?
Well, I guess in the beginning it would only be a matrix of (1) Wikidata
classes x (2) the counts of ORES A, B, C, D, E score
GoranSMilovanovic added a comment.
@Jan_Dittrich Following our 20210630 discussion:
**Additional questions**
- for those ~ 6% who are still with us: can we find any interesting patterns
- the distribution of the length of their periods of inactivity
- the distribution of
GoranSMilovanovic added a comment.
Update 20210630
- join items x scores x classes: **done**
- all items with missing ORES predictions were filtered out;
- all duplicated set theoretic/mereological relations were singled out
(e.g. if an item refers to a class via both `P31` and
GoranSMilovanovic added a comment.
@amy_rc The issue should be resolved now, please see Qurator Curious Facts
<https://wikidata-analytics.wmcloud.org/app/CuriousFacts>.
TASK DETAIL
https://phabricator.wikimedia.org/T277564
GoranSMilovanovic removed GoranSMilovanovic as the assignee of this task.
GoranSMilovanovic added a comment.
Re-assigning.
TASK DETAIL
https://phabricator.wikimedia.org/T221103
To: GoranSMilovanovic
Cc
GoranSMilovanovic claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T221103
To: GoranSMilovanovic
Cc: Manuel, Pigsonthewing, VIGNERON, Lydia_Pintscher, GoranSMilovanovic,
Lea_Lacroix_WMDE
GoranSMilovanovic added a comment.
@amy_rc I think I've found the cause of things like T277564#7157895
<https://phabricator.wikimedia.org/T277564#7157895>. It definitely has to do
with the following observation of yours:
> ... we observed that the tool only considers val
GoranSMilovanovic added a comment.
- Wikidata Languages Landscape automation from stat1008 Analytics Client: done.
TASK DETAIL
https://phabricator.wikimedia.org/T283571
To: GoranSMilovanovic
Cc: Aklapper
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T284850
To: GoranSMilovanovic
Cc: Manuel, RhinosF1, GoranSMilovanovic, Tobi_WMDE_SW, Lydia
GoranSMilovanovic added a comment.
@amy_rc @Lydia_Pintscher
Could it be the case that mapping relation type
<https://www.wikidata.org/wiki/Property:P4390> is treated as a separator - which
overrides the single value constraint - and the Curious Facts system then
recognizes that on
GoranSMilovanovic added a comment.
@amy_rc The part unclear to me is the following one:
> ... we observed that the tool only considers values containing qualifiers.
From the docs
<https://www.wikidata.org/wiki/Help:Property_constraints_portal/Single_value>:
> A qual
GoranSMilovanovic added a comment.
@amy_rc Could you please clarify T277564#7158984
<https://phabricator.wikimedia.org/T277564#7158984>? Thank you.
TASK DETAIL
https://phabricator.wikimedia.org/T277564
GoranSMilovanovic added a comment.
@MisterSynergy Could you please check the wdcm_topItems.csv
<https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/wdcm/etl/wdcm_topItems.csv>
dataset now and let me know if it looks alright?
GoranSMilovanovic closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T277551
To: GoranSMilovanovic
Cc: Tobi_WMDE_SW, Scott_WUaS, WMDE-leszek, amy_rc, GoranSMilovanovic
GoranSMilovanovic added a comment.
@Jan_Dittrich
**Please disregard all previous findings**. The following is based on:
- the definition of editor inactivity in T282563#7124389
<https://phabricator.wikimedia.org/T282563#7124389>,
- and the two important corrections
GoranSMilovanovic added a comment.
@MisterSynergy
- full manual update of the WDCM Sqoop procedure is now completed;
- 876 partitions (`wiki_db`) are present in the Data Lake, which means that
everything should be fine,
- unless something changed in the per wiki
GoranSMilovanovic added a comment.
- Sqoop Shard 4 running now (Commons): in comparison to what was observed
from the WDCM Sqoop Clients Log in T284850#7152935
<https://phabricator.wikimedia.org/T284850#7152935>, I see no problem in
relation to Commons anymore: the Commons database is
GoranSMilovanovic added a comment.
Monitoring Sqoop procedures from Core MediaWiki databases
<https://wikitech.wikimedia.org/wiki/MariaDB#Core_MediaWiki_databases> to
`goransm.wdcm_clients_wb_entity_usage` in the DataLake:
- Shard 1 (enwiki only): completed;
- Shard 2: still r
GoranSMilovanovic added a comment.
@MisterSynergy
- running a manual update of the WDCM sqoop module now;
- monitoring.
TASK DETAIL
https://phabricator.wikimedia.org/T284850
To: GoranSMilovanovic
GoranSMilovanovic added a comment.
@amy_rc
> However, I ran into a situation where the data was being retrieved
incorrectly. This has happened couple of times. For instance: Qurious Facts:
Silver-Russell syndrome (Q2142496) has 2 values for property: OMIM ID (P492
<
GoranSMilovanovic added a comment.
@amy_rc Issue descriptions modified to look exactly as suggested in
T277551#7137941 <https://phabricator.wikimedia.org/T277551#7137941>, please
test:
https://wikidata-analytics.wmcloud.org/app/CuriousFacts
GoranSMilovanovic added a comment.
@amy_rc
- Full system update completed;
- Issue descriptions are now fixed (local tests completed);
- deploying soon; it will be ready for tests in an hour or so.
TASK DETAIL
https://phabricator.wikimedia.org/T277551
GoranSMilovanovic added a comment.
@MisterSynergy Thank you. No worries, I will figure this out from the WDCM
sqoop logs. Sooner or later.
TASK DETAIL
https://phabricator.wikimedia.org/T284850
GoranSMilovanovic added a comment.
- At first sight, there were only 687 projects whose reuse data were
sqooped by the WDCM_Sqoop_Clients.R
<https://github.com/wikimedia/analytics-wmde-WD-WikidataAnalytics/blob/master/_engines/_wdcmModules/WDCM_Sqoop_Clients.R>
run, and
- that
GoranSMilovanovic added subscribers: Lydia_Pintscher, Tobi_WMDE_SW,
GoranSMilovanovic.
GoranSMilovanovic claimed this task.
GoranSMilovanovic added a project: User-GoranSMilovanovic.
GoranSMilovanovic triaged this task as "High" priority.
GoranSMilovanovic added a comment.
@Mis
GoranSMilovanovic added a comment.
@amy_rc That is rather strange. I am running a full system update now in
relation to T277551 <https://phabricator.wikimedia.org/T277551>; let's wait for
the new update and then check out if the problem persists. I will perform the
tests and let