AndrewSu added a comment.

We could, however (with some work), capture usage of a certain property, item, or property-item combination in the original query. Would that be useful?

  • Property usage: I think there is some small-ish subset of properties that are very closely tied to a single data provider (e.g., Disease Ontology ID (P699)) where property usage would be informative to that data provider. But since usage of a single property could (usually?) span many different data providers, I think this will not be sufficient for most data providers.
  • Item usage: This partially addresses the "item-level metrics" in my last post, but it depends on how it's counted. Again, suppose I'm interested in metrics on Alzheimer's disease. If you mean counting the number of explicit mentions of Q11081 in a SPARQL query (e.g., "How many symptoms does Alzheimer's disease have?"), that's a good start. But that misses cases where the item is returned as a result without being explicitly mentioned (e.g., "What diseases have a symptom of memory loss?").
  • Property-item usage: I don't clearly see how this would work, but I think the same caveats as for Item usage apply.
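
To make the "explicit mentions" idea concrete, here is a rough sketch of counting item and property mentions in logged query text. It is only an illustration under assumptions: it uses a plain regular expression rather than a SPARQL parser, it assumes queries use the usual wd:/wdt: style prefixes rather than full IRIs, and the example queries, the use of P780 as the symptoms property, and the placeholder item Q999999 are mine, not taken from any real log.

```python
import re
from collections import Counter

# Matches prefixed Wikidata identifiers such as wd:Q11081 or wdt:P780.
# Assumption: queries use these prefixes rather than full IRIs.
ENTITY_RE = re.compile(r"\b(?:wd|wdt|p|ps|pq):([QP]\d+)\b")


def count_mentions(queries):
    """Count in how many queries each item/property ID is explicitly mentioned."""
    counts = Counter()
    for query in queries:
        # Use a set so an ID repeated within one query is counted once.
        counts.update(set(ENTITY_RE.findall(query)))
    return counts


# Hypothetical example queries, not taken from a real query log.
queries = [
    # "How many symptoms does Alzheimer's disease have?"
    "SELECT (COUNT(?s) AS ?n) WHERE { wd:Q11081 wdt:P780 ?s }",
    # "What diseases have a given symptom?" (Q999999 is a placeholder item)
    "SELECT ?d WHERE { ?d wdt:P780 wd:Q999999 }",
]

mentions = count_mentions(queries)
```

Here Q11081 is counted for the first query only. The second query could well return Q11081 as a result, but that never shows up in a count based on the query text, which is the gap described above.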

Note also that I don't think any of these metrics get at the "statement-level metrics" I described above. Statement-level usage will arguably be the more common case, too.

As one very vague idea for a possible implementation that does account for outputs and intermediate results, perhaps we could set up a temporary database with all items/statements from a given data provider removed, rerun a set of SPARQL queries against it, and then compare the results with those from the full database. If the results differ, then you could empirically say that the data provider was important for that query. Obviously complexity/scale are issues here...
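
A toy sketch of that compare-with-and-without idea, just to pin down what I mean. Everything here is hypothetical: the statements, the provider names, and the "query" (a plain Python function standing in for a SPARQL query against a real endpoint). It also sidesteps the scale problem entirely.

```python
# Each statement is (subject, property, value, provider); all values are made up.
STATEMENTS = [
    ("disease_A", "symptom", "memory_loss", "provider_X"),
    ("disease_B", "symptom", "memory_loss", "provider_Y"),
    ("disease_B", "symptom", "fever", "provider_Y"),
]


def diseases_with_symptom(statements, symptom):
    """Toy stand-in for a SPARQL query: which subjects have this symptom?"""
    return {s for (s, p, o, _prov) in statements if p == "symptom" and o == symptom}


def provider_affects(statements, provider, query):
    """Run the query with and without the provider's statements and compare."""
    full = query(statements)
    ablated = query([st for st in statements if st[3] != provider])
    return full != ablated


def memory_query(statements):
    return diseases_with_symptom(statements, "memory_loss")


def fever_query(statements):
    return diseases_with_symptom(statements, "fever")


x_memory = provider_affects(STATEMENTS, "provider_X", memory_query)
x_fever = provider_affects(STATEMENTS, "provider_X", fever_query)
```

In this toy data, removing provider_X changes the memory-loss results but not the fever results, so provider_X would be credited for the first query only. Note that a plain before/after comparison gives no credit when two providers supply the same statement, so it likely undercounts redundant contributions.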

For overall context, data providers continually have to justify their existence to funders (e.g., the NIH), usually in terms of how important they are to a community of third parties (e.g., research scientists). Currently they do that through restrictive licenses, so they can point to the number of licensees they have. If we convince them to contribute to Wikidata, they immediately lose the licensee-count metric because there is no requirement to license. To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders. Just want to explain this perspective in case it wasn't clear...


TASK DETAIL
https://phabricator.wikimedia.org/T143819
