Re: [Wikidata-bugs] [Maniphest] [Commented On] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Obviously, a main aspect of the data presented in the todo stats is
"referenced statements" (even though the chart labels there are wrong).
Whether or not this query maps directly to todo is actually not the key
issue. Clearly, measuring data quality requires that the arity of
statement-to-reference relationships be quantified. Right?
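To make "quantifying the arity" concrete, here is a minimal Python sketch. It assumes the statement-to-reference links have already been extracted as (statement, reference) pairs, for instance from `prov:wasDerivedFrom` triples. It tallies how many statements carry 0, 1, 2, ... references, which is the same shape as the `nrefs count` table quoted in this thread. The function name and the toy IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def reference_arity_histogram(
    statements: Iterable[str],
    statement_refs: Iterable[Tuple[str, str]],
) -> Dict[int, int]:
    """Map nrefs -> number of statements carrying exactly that many references.

    `statements` lists every statement ID so unreferenced ones (nrefs = 0)
    are counted too; `statement_refs` holds (statement, reference) pairs.
    """
    # Count distinct reference nodes per statement (duplicate pairs ignored).
    refs_per_statement = Counter(s for s, _ in set(statement_refs))
    # A Counter returns 0 for statements with no pairs, giving the nrefs = 0 bucket.
    histogram = Counter(refs_per_statement[s] for s in set(statements))
    return dict(sorted(histogram.items()))


# Toy data with hypothetical IDs.
statements = ["s1", "s2", "s3", "s4"]
pairs = [("s1", "r1"), ("s1", "r2"), ("s2", "r1")]
print(reference_arity_histogram(statements, pairs))  # {0: 2, 1: 1, 2: 1}
```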

This assumption is based on Wikipedia's policy of maintaining an NPOV. And,
unfortunately, all unreferenced statements contain a "bias" that makes the
data theoretically worthless, even though they may in fact be "correct".
On 8 Dec 2015 1:52 pm, "Addshore" wrote:

> Addshore added a comment.
>
> Okay, I'm struggling to see which part of the todo stats this is covering.
>
>
> TASK DETAIL
>   https://phabricator.wikimedia.org/T117234
>
> EMAIL PREFERENCES
>   https://phabricator.wikimedia.org/settings/panel/emailpreferences/
>
> To: Christopher, Addshore
> Cc: Wikidata-bugs, Lydia_Pintscher, StudiesWorld, Addshore, Christopher,
> Aklapper, aude, Mbch331
>
>
>
> ___
> Wikidata-bugs mailing list
> Wikidata-bugs@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
>


Re: [Wikidata-bugs] [Maniphest] [Updated] T117234: Reproduce wikidata-todo/stats data using analytics infrastructure

2015-12-08 Thread Christopher Johnson
Since P143 is primarily a "reference type" property, it is expected to be
used where the reference node is the subject (with a few exceptions,
apparently). The query only evaluates the arity of reference nodes as
objects, so the results for P143 are expected.
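A toy Python illustration of this point, assuming Wikidata's RDF conventions: a statement node points to its reference nodes via `prov:wasDerivedFrom`, a value inside a reference uses the `pr:` prefix, and a statement's main value uses `ps:`. The triples and function names are invented for illustration and are not the query Addshore ran. Counting references only for statements whose main property is P143 (reference nodes as objects) sees almost nothing, because P143 normally sits on the reference node itself, as a predicate of the subject.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples (subject, predicate, object) using Wikidata-style prefixes:
# ps: = main value of a statement, pr: = value inside a reference node,
# prov:wasDerivedFrom = link from a statement node to one of its reference nodes.
TRIPLES: List[Triple] = [
    ("wds:s1", "ps:P31", "wd:Q5"),
    ("wds:s1", "prov:wasDerivedFrom", "wdref:r1"),
    ("wdref:r1", "pr:P143", "wd:Q328"),  # P143 with a reference node as subject
    ("wds:s2", "ps:P31", "wd:Q5"),
    ("wds:s2", "prov:wasDerivedFrom", "wdref:r2"),
    ("wdref:r2", "pr:P143", "wd:Q328"),  # P143 with a reference node as subject
    ("wds:s3", "ps:P143", "wd:Q328"),  # rare case: P143 as a statement's main property
]


def p143_usage(triples: List[Triple]) -> Dict[str, int]:
    """Count P143 as a reference property (pr:) vs. as a main statement property (ps:)."""
    return {
        "reference": sum(1 for _, p, _ in triples if p == "pr:P143"),
        "statement": sum(1 for _, p, _ in triples if p == "ps:P143"),
    }


def nrefs_for_property(triples: List[Triple], prop: str) -> Dict[int, int]:
    """Histogram nrefs -> statements, for statements whose main property is `prop`.

    Reference nodes are only seen as *objects* of prov:wasDerivedFrom; their
    own pr: predicates are never inspected.
    """
    statements = {s for s, p, _ in triples if p == f"ps:{prop}"}
    refs = Counter(s for s, p, _ in triples
                   if p == "prov:wasDerivedFrom" and s in statements)
    return dict(sorted(Counter(refs[s] for s in statements).items()))


print(p143_usage(TRIPLES))                  # {'reference': 2, 'statement': 1}
print(nrefs_for_property(TRIPLES, "P143"))  # {0: 1}
print(nrefs_for_property(TRIPLES, "P31"))   # {1: 2}
```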
On 8 Dec 2015 1:09 pm, "Addshore" wrote:

> Addshore added a comment.
>
> I am still confused. Running this for P143 gives the following:
>
>   nrefs  count
>       0    920
>       1      8
>
>
> TASK DETAIL
>   https://phabricator.wikimedia.org/T117234
>
> To: Christopher, Addshore
> Cc: Lydia_Pintscher, StudiesWorld, Addshore, Christopher, Aklapper,
> Wikidata-bugs, aude, Mbch331
>


Re: [Wikidata-bugs] [Maniphest] [Commented On] T116547: try computing certains wikidata stats via hadoop (e.g. spark) instead of query.w.o (blazegraph)

2015-10-26 Thread Christopher Johnson
It is possible that a Hadoop architecture could provide the performance
and scalability needed for robust statistical analysis of the Wikidata RDF
datasets.

It is also possible that Jena has better integration tools for Hadoop
than Blazegraph does.

See https://jena.apache.org/documentation/hadoop/
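This may illustrate why a statistic like reference arity could suit a Hadoop-style architecture. It decomposes into a map step over individual dump lines and a reduce step keyed by statement, so no triple store has to answer a heavy query. What follows is a minimal pure-Python sketch. It assumes N-Triples input and the PROV-O IRI for `prov:wasDerivedFrom`. `map_line`, `shuffle` and `reduce_counts` are hypothetical names that only imitate the map/shuffle/reduce phases; they are not Hadoop, Spark or Jena APIs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Full IRI of prov:wasDerivedFrom as it would appear in N-Triples.
PROV_DERIVED = "<http://www.w3.org/ns/prov#wasDerivedFrom>"


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (statement, 1) for each statement -> reference triple."""
    parts = line.split(None, 2)  # subject, predicate, rest of the line
    if len(parts) == 3 and parts[1] == PROV_DERIVED:
        yield parts[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key, as the framework's shuffle phase would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[int, int]:
    """Reduce phase: sum per statement, then histogram nrefs -> statements.

    Unreferenced statements emit nothing in the map phase, so the nrefs = 0
    bucket would need a separate count of all statements (not shown here).
    """
    histogram = Counter(sum(values) for values in groups.values())
    return dict(sorted(histogram.items()))


# Hypothetical N-Triples lines (example.org IRIs are placeholders).
LINES = [
    "<http://example.org/s1> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/r1> .",
    "<http://example.org/s1> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/r2> .",
    "<http://example.org/s2> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://example.org/r1> .",
    '<http://example.org/s2> <http://example.org/other> "x" .',
]
mapped = (pair for line in LINES for pair in map_line(line))
print(reduce_counts(shuffle(mapped)))  # {1: 1, 2: 1}
```

On a real cluster, the map and reduce functions would run in parallel over splits of the dump, with the framework doing the grouping.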

However, I do not see a direct relationship between T115242 and performance,
other than that the reasoning behind filtering out these "boring" objects is
based on the perceived negative performance impact of allowing them to be
queried from a publicly accessible endpoint.

The intent of T115242 is to make these objects available as a dataset in a
"nonpublic" query interface used for metrics evaluation only.

The question that should be asked is whether Blazegraph and the WDQS
platform are robust enough for intensive statistical analysis and, if not,
why not, and what can be done to improve them.
On 26 Oct 2015 10:00, "JanZerebecki" wrote:

> JanZerebecki added a comment.
>
> @Christopher can as he created https://phabricator.wikimedia.org/T115242.
>
>
> TASK DETAIL
>   https://phabricator.wikimedia.org/T116547
>
> To: JanZerebecki
> Cc: Addshore, Christopher, JanZerebecki, Lydia_Pintscher, Aklapper,
> Ricordisamoa, Wikidata-bugs, aude
>


Re: [Wikidata-bugs] [Maniphest] [Commented On] T108732: [Task] Train Wikidata people on how to add data/metrics to a Shiny dashboard for Wikidata

2015-08-27 Thread Christopher Johnson
Any preferred repo path on Gerrit?

Suggestions: (wikidata/dashboard) or (wikidata/analytics/dashboard) or
(analytics/wikidata/dashboard)

On 27 August 2015 at 12:50, Lydia_Pintscher 
no-re...@phabricator.wikimedia.org wrote:

 Lydia_Pintscher added a comment.

 Gerrit please.


 TASK DETAIL
   https://phabricator.wikimedia.org/T108732

 To: Lydia_Pintscher
 Cc: Abraham, Christopher, Lydia_Pintscher, Ironholds, JanZerebecki,
 Deskana, Aklapper, Wikidata-bugs, aude, Malyacko


