Re: [Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Obviously, a main aspect of the data presented in the todo stats is "referenced statements" (even though the chart labels there are wrong). Whether or not this query maps directly to wikidata-todo/stats is not the key issue. Clearly, measuring data quality requires that the arity of statement-to-reference relationships is quantified. Right? This assumption is based on Wikipedia's policy of maintaining a neutral point of view (NPOV). Unfortunately, every unreferenced statement carries a "bias" that makes the data theoretically worthless, even though the statement may in fact be "correct".

On 8 Dec 2015 1:52 pm, "Addshore" wrote:
> Addshore added a comment.
>
> Okay, I'm struggling to see which part of the todo stats this is covering
>
> TASK DETAIL
> https://phabricator.wikimedia.org/T117234
>
> EMAIL PREFERENCES
> https://phabricator.wikimedia.org/settings/panel/emailpreferences/
>
> To: Christopher, Addshore
> Cc: Wikidata-bugs, Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, aude, Mbch331

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
Re: [Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure
Since P143 is primarily a "reference type" property, it is normally used where the reference node is the subject (with a few exceptions, apparently). The query only evaluates the arity of the reference nodes as objects, so the results for P143 are expected.

On 8 Dec 2015 1:09 pm, "Addshore" wrote:
> Addshore added a comment.
>
> I am still confused. Running this for https://phabricator.wikimedia.org/P143 gives the following:
>
> nrefs  count
> 0      920
> 1      8
>
> TASK DETAIL
> https://phabricator.wikimedia.org/T117234
>
> To: Christopher, Addshore
> Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper, Wikidata-bugs, aude, Mbch331
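To make the nrefs/count tabulation in this message concrete, here is a minimal sketch of how such a histogram could be computed. It is not the query discussed in the thread, which is not shown here. It assumes the statement-to-reference links are available as (statement, reference) pairs in which the reference node is the object; the function name and the toy identifiers are hypothetical.

```python
from collections import Counter


def reference_arity_histogram(statements, derived_from):
    """Map nrefs -> number of statements with exactly nrefs distinct references.

    statements:   iterable of statement identifiers (hypothetical toy IDs here)
    derived_from: iterable of (statement, reference) pairs, i.e. links in which
                  the reference node appears as the object
    """
    refs_per_statement = {s: set() for s in statements}
    for statement, reference in derived_from:
        # Ignore links whose subject is not one of the statements being counted.
        if statement in refs_per_statement:
            refs_per_statement[statement].add(reference)
    return Counter(len(refs) for refs in refs_per_statement.values())


# Made-up toy data, not real Wikidata content.
statements = ["s1", "s2", "s3", "s4"]
derived_from = [("s1", "r1"), ("s1", "r2"), ("s3", "r1"), ("s3", "r1")]

histogram = reference_arity_histogram(statements, derived_from)
print("nrefs count")
for nrefs in sorted(histogram):
    print(nrefs, histogram[nrefs])
```

Statements with no link at all land in the nrefs = 0 row. Under the explanation given in this message, that is where most statements of a property like P143 would be expected to fall, since such a property mostly appears inside reference nodes rather than on statements that themselves carry references.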
Re: [Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)
It is possible that a Hadoop architecture could provide the performance and scalability needed for robust statistical analysis of the Wikidata RDF datasets. It is also possible that Jena has better integration tools for Hadoop than Blazegraph does. See https://jena.apache.org/documentation/hadoop/

However, I do not see a direct relationship between T115242 and performance, other than that the reasoning behind filtering these "boring" objects is based on the perceived negative performance impact of allowing them to be queried from a publicly accessible endpoint. The intent of T115242 is to provide these objects in a dataset to a "nonpublic" query interface for metrics evaluation only.

The question that should be asked is whether Blazegraph and the WDQS platform are robust enough for intensive statistical analysis and, if not, why not and what can be done to improve them.

On 26 Oct 2015 10:00, "JanZerebecki" wrote:
> JanZerebecki added a comment.
>
> @Christopher can as he created https://phabricator.wikimedia.org/T115242.
>
> TASK DETAIL
> https://phabricator.wikimedia.org/T116547
>
> To: JanZerebecki
> Cc: Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper, Ricordisamoa, Wikidata-bugs, aude
Re: [Wikidata-bugs] [Maniphest] [Commented On] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata
Any preferred repo path on Gerrit? Suggestions: (wikidata/dashboard), (wikidata/analytics/dashboard), or (analytics/wikidata/dashboard).

On 27 August 2015 at 12:50, Lydia_Pintscher no-re...@phabricator.wikimedia.org wrote:
> Lydia_Pintscher added a comment.
>
> Gerrit please.
>
> TASK DETAIL
> https://phabricator.wikimedia.org/T108732
>
> To: Lydia_Pintscher
> Cc: Abraham, Christopher, Lydia_Pintscher, Ironholds, JanZerebecki, Deskana, Aklapper, Wikidata-bugs, aude, Malyacko