[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-10-28 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher I guess this task is completed now. However, we might need a new ticket in relation to this: - to re-factor most of the data engineering code to work in the analytics cluster - (it is now done in R on a single server by a process

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-27 Thread abian
abian added a comment. Thank you both! :-) I have several concerns about how users may use and understand this indicator; I'll list the main ones in case you find them helpful, of course without any intention of hindering your work or preventing us from having metrics that help us bette

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-26 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher A slightly adjusted version of the report: F30478577: Wikidata Quality Report.nb.html - no qualitative differences in the results/conclusions; - addition: taking care to eliminate all

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-25 Thread Halfak
Halfak added a comment. @abian, ORES directly models a measure of "completeness". However, it turns out that accuracy and consistency strongly correlate with these measures of "completeness", so it is also a //good and useful// proxy measure of "consistency" and "accuracy". I'd like to know when an

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Here is the final version of the Report, including the timeline of the latest revids made for A, B, C, D, and E class items: F30435077: Wikidata Quality Report.nb.html Please let me kno

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Here is a new version of the report with the Grading Scheme for Wikidata items included: F30434504: Wikidata Quality Report.nb.html

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-22 Thread abian
abian added a comment. Thanks for all the work! I have a question: what dimensions of data quality (completeness, accuracy, consistency...) are you guys considering when you speak of "quality" in this scope? The term "quality" is a buzzword used by people to name things that sometimes have n

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @RazShuty @WMDE-leszek Here's a prototype of a Wikidata Quality Report. F30430120: Wikidata Quality Report.nb.html NEXT STEPS: - Include a bit more info on ORES in the report i

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-17 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. Status: - working on analytics/visualizations now; - next steps: dashboard. TASK DETAIL https://phabricator.wikimedia.org/T195702 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: darth

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-04 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Halfak Thank you, Aaron.

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-09-04 Thread Halfak
Halfak added a comment. For clarity, making millions of calls to ORES is totally feasible. We have a utility for doing just this. @GoranSMilovanovic has been using the `ores score_revisions` utility. If you create a json file with a field called "rev_id" containing the most recent rev_id
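A minimal sketch of preparing the kind of input described above, assuming the utility expects one JSON object per line with a "rev_id" field. The helper name `to_json_lines` is hypothetical, the two revision IDs are the ones quoted elsewhere in this thread for Q36524 and Q54919, and the exact `ores score_revisions` invocation is deliberately not shown because the comment is cut off before it.

```python
import json


def to_json_lines(rev_ids):
    """Serialize revision IDs as newline-delimited JSON, one {"rev_id": ...} object per line."""
    # Assumption: the scoring utility reads one JSON object per line.
    return "\n".join(json.dumps({"rev_id": int(rev_id)}) for rev_id in rev_ids)


# Most recent revision IDs for two items, taken from the table posted in this thread.
latest_rev_ids = [924799644, 929383859]

payload = to_json_lines(latest_rev_ids)
print(payload)
```

The printed text could be redirected to a file and handed to the scoring utility; batching all revision IDs into one file this way avoids issuing a separate request per item from the calling script.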

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-05-23 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher @RazShuty @Halfak Ok, here's what I've got:
  item    revision  timestamp             usage
1 Q36524  924799644 2019-04-26 06:29:25.0 6791020
2 Q54919  929383859 2019-04-30 21:14:14.0 4376000
3 Q423048 919180363 2019-

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2019-03-26 Thread Harej
Harej added a comment. @Lydia_Pintscher Does this depend on work from Scoring Platform?

[Wikidata-bugs] [Maniphest] [Commented On] T195702: track quality of all/top 10000 Wikidata items over time

2018-05-28 Thread Esc3300
Esc3300 added a comment. Sounds like an interesting idea. It might be easier to do a static set and measure how that evolves. Not sure if it will add much to the general discussions; these are generally mixed with many marginally important factors. Occasionally, we get the fallout from that at Wik