[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread AndrewSu
AndrewSu added a comment.

In T143819#3350566, @Nuria wrote:
To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders

Understood. As I said, we are willing to help in any way we can; it seems like a great objective. My main point is that if we come up with a metric, we should document it outside this ticket once we have some agreement.


Got it, definitely will do that! Thanks!

TASK DETAIL
https://phabricator.wikimedia.org/T143819

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewSu
Cc: Nuria, Lydia_Pintscher, mkroetzsch, leila, debt, thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, Gehel, FloNight, Xmlizer, Izno, JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


2017-06-14 Thread AndrewSu
AndrewSu added a comment.
We could, however (with some work), capture usage of a certain property, item, or property-item combination in the original query. Would that be useful?




Property usage: I think there is a small-ish subset of properties that are very closely tied to a single data provider (e.g., Disease Ontology ID (P699)), where property usage would be informative to that data provider. But since usage of a single property could (usually?) span many different data providers, I think this will not be sufficient for most data providers.



Item usage: This partially addresses the "item-level metrics" in my last post, but it depends on how it's counted. Again, suppose I'm interested in metrics on Alzheimer's disease. If you mean counting the number of explicit mentions of Q11081 in a SPARQL query (e.g., "How many symptoms does Alzheimer's disease have?"), that's a good start. But that misses cases where the item is returned as a result without being explicitly mentioned (e.g., "What diseases have a symptom of memory loss?").



Property-item usage: I don't see exactly how this would work, but I think the same caveats as for item usage apply.
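The explicit-mention counting discussed above could be sketched as follows, assuming the logged query text is available as plain strings and uses the standard query.wikidata.org prefixes (`wd:`, `wdt:`, `p:`, `ps:`, `pq:`, `pr:`); full IRIs and other prefix spellings are not handled. `count_mentions` and the regular expressions are hypothetical names for illustration, not an existing tool.

```python
import re
from collections import Counter

# Prefixed Wikidata IDs such as wdt:P2293 or wd:Q11081.
# Assumption: queries use the standard WDQS prefixes; full IRIs are ignored.
ENTITY_RE = re.compile(r"\b(?:wdt|wd|ps|pq|pr|p):([PQ]\d+)\b")

# A direct property-item pair in a triple pattern, e.g. "wdt:P2293 wd:Q11081".
PAIR_RE = re.compile(r"\bwdt:(P\d+)\s+wd:(Q\d+)\b")


def count_mentions(queries):
    """Count, per property, item, and property-item pair, how many queries mention it."""
    props, items, pairs = Counter(), Counter(), Counter()
    for query in queries:
        ids = set(ENTITY_RE.findall(query))  # each ID counted once per query
        props.update(i for i in ids if i.startswith("P"))
        items.update(i for i in ids if i.startswith("Q"))
        pairs.update(set(PAIR_RE.findall(query)))
    return props, items, pairs


if __name__ == "__main__":
    queries = [
        # Genes with a genetic association (P2293) to Alzheimer's disease (Q11081).
        "SELECT ?gene WHERE { ?gene wdt:P2293 wd:Q11081 . }",
        # Diseases associated with reelin (Q414043): may return Q11081
        # without ever mentioning it.
        "SELECT ?disease WHERE { wd:Q414043 wdt:P2293 ?disease . }",
    ]
    props, items, pairs = count_mentions(queries)
    print(props["P2293"], items["Q11081"], pairs[("P2293", "Q11081")])  # prints "2 1 1"
```

The second sample query illustrates the caveat for item usage: reelin's associated diseases can include Alzheimer's disease, yet Q11081 is credited only for the first query, because only that one mentions it.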


Note also that I don't think any of these metrics get at the "statement-level metrics" I described above. Those will arguably be the more common case, too.

As one very vague idea for a possible implementation that does account for outputs and intermediates: perhaps we could set up a temporary database with all items/statements from a given data provider removed, rerun a set of SPARQL queries, and compare the results. If the results differed, you could empirically say that the data provider was important for that query. Obviously, complexity and scale are issues here...
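The ablation idea above could be prototyped on a toy scale as below. This is a sketch under heavy assumptions: statements are plain tuples tagged with the item cited via "stated in" (P248), queries are Python functions standing in for SPARQL, and `provider_dependent_queries`, `genes_associated_with`, `gene_x`, and `source_y` are hypothetical. A real version would need a separate copy of the triple store per provider, which is where the complexity and scale concerns come in.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

# (subject, property, value, source): "source" stands in for the item cited
# via "stated in" (P248) in the statement's reference.
Statement = Tuple[str, str, str, str]
Query = Callable[[Set[Statement]], FrozenSet[str]]


def genes_associated_with(disease: str) -> Query:
    """Toy stand-in for: SELECT ?gene WHERE { ?gene wdt:P2293 wd:<disease> . }"""
    def run(store: Set[Statement]) -> FrozenSet[str]:
        return frozenset(s for s, p, v, _ in store if p == "P2293" and v == disease)
    return run


def provider_dependent_queries(store: Set[Statement], queries: List[Query], provider: str) -> List[int]:
    """Indices of queries whose results change once the provider's statements are removed."""
    ablated = {st for st in store if st[3] != provider}
    return [i for i, query in enumerate(queries) if query(store) != query(ablated)]


# Hypothetical data: the first statement mirrors the reelin example in this
# thread; gene_x and source_y are placeholders.
STORE: Set[Statement] = {
    ("Q414043", "P2293", "Q11081", "Q22330995"),
    ("gene_x", "P2293", "Q11081", "source_y"),
}
QUERIES: List[Query] = [genes_associated_with("Q11081"), genes_associated_with("Q18742")]

if __name__ == "__main__":
    print(provider_dependent_queries(STORE, QUERIES, "Q22330995"))  # prints [0]
```

This only detects queries whose final results change; a provider whose statements were used as intermediates but were redundant with another source would not be credited.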

For overall context, data providers continually have to justify their existence to funders (e.g., the NIH), usually in terms of how important they are to a community of third parties (e.g., research scientists). Currently they do that through restrictive licenses, so they can point to the number of licensees they have. If we want to convince them to contribute to Wikidata, they immediately lose the licensee-count metric, because there is no requirement to license. To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders. I just want to explain this perspective in case it wasn't clear...


2017-06-13 Thread AndrewSu
AndrewSu added a comment.
My initial thought is that there will be two types of metrics.  First, we want to look at statement-level metrics.  For all the statements that our team has loaded into Wikidata, we have been referencing specific resources that assert that statement.  For example, see the human gene reelin (Q414043). This gene has a genetic association (P2293) with the disease Alzheimer's disease (Q11081), as stated in (P248) a database called Phenocarta (Q22330995).  We would like to provide the Phenocarta team statistics on how often Phenocarta-referenced statements are used in SPARQL queries.  Those statements might be part of the output of the SPARQL query, or they might simply be structural intermediates.
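Once it is known which statements a query used, the statement-level metric itself is a simple aggregation. The sketch below assumes a hypothetical per-query record of touched statement IDs (outputs or intermediates), which plain query-log text would not provide directly, plus a map from statement ID to the sources cited via "stated in" (P248). `statement_usage_by_source`, the `stmt-*` IDs, and `source_y` are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def statement_usage_by_source(
    touched_per_query: Iterable[Set[str]],
    statement_sources: Dict[str, Set[str]],
) -> Counter:
    """For each cited source, count the queries that touched at least one statement citing it."""
    usage: Counter = Counter()
    for touched in touched_per_query:
        sources: Set[str] = set()
        for statement_id in touched:
            sources |= statement_sources.get(statement_id, set())
        usage.update(sources)  # a source is counted at most once per query
    return usage


# Placeholder data: Q22330995 is Phenocarta; stmt-* and source_y are hypothetical.
STATEMENT_SOURCES = {
    "stmt-1": {"Q22330995"},
    "stmt-2": {"Q22330995", "source_y"},
    "stmt-3": set(),
}
TOUCHED = [{"stmt-1"}, {"stmt-1", "stmt-2"}, {"stmt-3"}]

if __name__ == "__main__":
    print(statement_usage_by_source(TOUCHED, STATEMENT_SOURCES))
```

Counting each source at most once per query keeps a single query that touches many Phenocarta-referenced statements from inflating the figure reported to that provider.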

Second, we might also want to look at item-level metrics.   See for example visual agnosia (Q18742).  This item is mapped to the Disease Ontology (Q5282129) through the Disease Ontology ID (P699) (and one intermediate item for the specific release of the ontology).  Again, we would want to provide the Disease Ontology team metrics on how often DO-linked items were utilized (either directly or indirectly) in SPARQL queries.  (Note also that ontologies that are referenced as external identifiers in Wikidata items will very often also be referenced in support of instance of (P31) or subclass of (P279) statements, which may fall under the previous category.)
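The item-level variant can be sketched the same way, assuming (hypothetically) that for each logged query we know the set of items it mentioned or returned. The ontology-linked item set could come from a query like the one stored in `DO_LINKED_ITEMS_QUERY`, which lists items carrying a Disease Ontology ID (P699); it is not executed here, and the linked set in the example is a sample, not a query result. `item_level_usage` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# SPARQL that lists items with a Disease Ontology ID (P699); not executed here.
DO_LINKED_ITEMS_QUERY = "SELECT ?item ?doid WHERE { ?item wdt:P699 ?doid . }"


def item_level_usage(items_per_query: Iterable[Set[str]], linked_items: Set[str]) -> Tuple[int, Counter]:
    """Return (number of queries touching any linked item, per-item query counts)."""
    queries_hit = 0
    per_item: Counter = Counter()
    for items in items_per_query:
        hit = items & linked_items
        if hit:
            queries_hit += 1
            per_item.update(hit)
    return queries_hit, per_item


if __name__ == "__main__":
    # Illustrative sample: Q18742 (visual agnosia) is DO-linked per this thread.
    linked = {"Q18742", "Q11081"}
    logged = [{"Q18742"}, {"Q414043"}, {"Q18742", "Q11081"}]
    print(item_level_usage(logged, linked))
```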

Computing one or both of these metrics would, in my mind, be a good first step, though I'm guessing further iteration would be needed once we examine the results. Hope this is helpful...


2017-06-13 Thread AndrewSu
AndrewSu added a comment.
Just want to add a note that if someone on the WMF side was interested in building the infrastructure to compute these usage metrics, the "Gene Wiki" team would be very willing collaborators in evaluating and refining the metrics. We have been working hard loading biomedical data into Wikidata. We've convinced several resources to convert to CC0, but we're also talking with many data providers who have reservations (many of which might be addressed by usage statistics). Based on these interactions, I think we have a pretty good perspective on what metrics would be valuable to this cross section of data providers.


2016-09-16 Thread AndrewSu
AndrewSu added a comment.
Thank you @leila for the guidance on the process and next steps -- very helpful! @I9606 and I will touch base to see how we want to proceed/prioritize from our end...


2016-09-10 Thread AndrewSu
AndrewSu added a comment.
@mkroetzsch Thank you for the info.  We look forward to coordinating more when/if you see fit in the future.

Since our project is not dependent on Markus' work, and since I don't believe that our work will negatively impact Markus' project, I propose we treat our request here as a completely separate initiative. So unless anyone has an objection to our plan, we await information on next steps. Again, we are ready to submit signed NDAs as soon as we receive instructions.


2016-09-09 Thread AndrewSu
AndrewSu added a comment.
Would it be possible for our team to get access to these log files so that we can perform our analyses, which are related to, but distinct from, the ones that @mkroetzsch is doing? We are happy to coordinate with Markus so that there is no duplication of effort, but I suspect that our analyses are much more specific to the biomedical research community than their general-purpose ones.

For context, our team (led by @I9606 and myself) has been spearheading the loading of biomedical data into Wikidata through https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology. We are at a critical point with several potential data providers in convincing them to upload their data, and showing that we can provide summarized usage reports for their funding agencies is a key blocker.

If this sounds reasonable, please let us know how we should submit our signed NDA forms. (Email? Attach to this ticket? Mail?)