[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-06-11 Thread Smalyshev
Smalyshev added a comment.
@Esc3300 Which users? WDQS does not track users, only queries. The log does contain query IP but the data processing will remove it, as well as any other PII.

TASK DETAIL
https://phabricator.wikimedia.org/T143819

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Esc3300, JAllemandou, mpopov, mforns, PokestarFan, Nuria, Lydia_Pintscher, mkroetzsch, leila, debt, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb

Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-06-11 Thread Esc3300
Esc3300 added a comment.
Shouldn't users opt-in to this?


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-05 Thread JAllemandou
JAllemandou added a comment.
@Nuria, @Smalyshev: Given that all wikidata-query tagged rows belong in misc, which is super small, I have no objection to running jobs either hourly or daily.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-04 Thread Smalyshev
Smalyshev added a comment.
I made a more formal full description of which data I'd like to be in the public dataset, so people don't have to read through all the comments here: https://www.wikidata.org/wiki/User:Smalyshev_(WMF)/Publishing_query_data

Please review and comment if you see anything missing or wrong.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-21 Thread mforns
mforns added a comment.
@Nuria @Smalyshev

So probably if we round timestamp and remove sessionId your proposal for dataset #1 is safe to keep long term (cc @mforns for anything I might be missing)

I think it depends highly on how drastically we sanitize the potentially identifying fields (user agent and client IP) and the fields that can indicate user activity/features (query, location).
Intuitively it seems to me that we can keep this data in a private store indefinitely if sanitized. But having those 4 sensitive fields in the same data set will make it difficult to publicize, even if sanitized. I don't know how frequent WDQS queries are, but I imagine they are several orders of magnitude less frequent than pageviews. Thus the buckets of this data set are likely to be sparse and small, which increases the threat to user privacy.

If we wanted to make this public, I'd go for removing the geographic location field entirely, and probably for daily or monthly resolution instead of hourly (depending on bucket size).
Also, splitting the data set in several unrelatable thematic data sets could help: queries by country, queries by user agent, session queries, etc.
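
A minimal sketch of the sanitization steps discussed above (dropping the identifying fields, coarsening the timestamp, and suppressing small buckets). The field names, the sample records, and the `min_size` threshold are assumptions for illustration only; the actual WDQS log schema is not described in this thread, and this sketch does not sanitize the query text itself.

```python
from collections import Counter
from datetime import datetime

# Hypothetical field names; the real WDQS log schema is not given in this thread.
SENSITIVE_FIELDS = ("client_ip", "user_agent", "session_id", "geo_location")


def sanitize(record, resolution="day"):
    """Return a copy of the record without sensitive fields and with a coarse timestamp."""
    clean = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    ts = datetime.fromisoformat(record["timestamp"])
    if resolution == "hour":
        ts = ts.replace(minute=0, second=0, microsecond=0)
    else:  # daily resolution
        ts = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    clean["timestamp"] = ts.isoformat()
    return clean


def drop_small_buckets(records, min_size=2):
    """Keep only records whose (timestamp, query) bucket holds at least min_size rows."""
    counts = Counter((r["timestamp"], r["query"]) for r in records)
    return [r for r in records if counts[(r["timestamp"], r["query"])] >= min_size]


# Made-up sample rows using documentation-only IP addresses.
raw_log = [
    {"timestamp": "2017-12-21T10:15:42", "client_ip": "192.0.2.1",
     "user_agent": "ExampleBot/1.0", "session_id": "a1", "geo_location": "XX",
     "query": "SELECT ?d WHERE { ?d wdt:P699 ?id }"},
    {"timestamp": "2017-12-21T18:03:10", "client_ip": "192.0.2.2",
     "user_agent": "ExampleBrowser/2.0", "session_id": "b2", "geo_location": "YY",
     "query": "SELECT ?d WHERE { ?d wdt:P699 ?id }"},
    {"timestamp": "2017-12-21T11:00:00", "client_ip": "192.0.2.3",
     "user_agent": "ExampleBrowser/2.0", "session_id": "c3", "geo_location": "ZZ",
     "query": "SELECT ?g WHERE { ?g wdt:P2293 wd:Q11081 }"},
]

sanitized = [sanitize(r) for r in raw_log]
published = drop_small_buckets(sanitized, min_size=2)
```

With daily resolution the first two rows fall into the same bucket and survive, while the third row sits alone in its bucket and is suppressed. A real threshold would have to be chosen from the observed bucket sizes.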

Sorry if I'm too pessimistic, I'm not familiar with the kind of information that WDQS queries can give away about users.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-19 Thread Nuria
Nuria added a comment.
@Smalyshev We like to default to public if possible, the more eyes on the data the more useful it can be.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-19 Thread Smalyshev
Smalyshev added a comment.
Thinking about it, I don't think we would ever need more than hourly resolution for anything related to queries (we can get hit stats from the usual stats places, I assume). I also thought about dataset #1 as more short-lived. But I am not that insistent on the session ID thing; maybe dropping it is fine too, and then we could make it public.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread AndrewSu
AndrewSu added a comment.

In T143819#3350566, @Nuria wrote:
To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders

Understood, as I said we are willing to help in any way we can, seems like a great objective. My main point is that if we come up with a metric we should document it outside this ticket once we have some agreement.


Got it, definitely will do that!  Thanks!


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread Nuria
Nuria added a comment.
To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders

Understood, as I said we are willing to help in any way we can, seems like a great objective. My main point is that if we come up with a metric we should document it outside this ticket once we have some agreement.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread AndrewSu
AndrewSu added a comment.
We could, however (with some work) capture usage of certain property, or item, or property-item combination, in the original query. Would that be useful?




Property usage: I think there is some small-ish subset of properties that are very closely tied to a single data provider (e.g.,  Disease Ontology ID (P699)) where property usage would be informative to that data provider.  But since usage of a single property could (usually?) span many different data providers, I think this will not be sufficient for most data providers.



Item usage: This partially addresses the "item-level metrics" in my last post, but it depends on how it's counted.  Again, suppose I'm interested in metrics on Alzheimer's disease.  If you mean counting the number of explicit mentions of Q11081 in a SPARQL query (e.g., "How many symptoms does Alzheimer's disease have?"), that's a good start.  But that misses out on cases where the item is returned as a result but not explicitly mentioned (e.g., "What diseases have a symptom of memory loss?").



Property-item usage: Not seeing clearly exactly how this might work, but I think the same caveats as Item usage apply.
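
To make the "explicit mention" counting concrete, here is a rough sketch that extracts item and property IDs from the text of a query with a regular expression. It is only an illustration, not an existing WDQS tool: the function names and sample queries are made up, P780 is used on the assumption that it is the symptoms property, and a regex can also pick up IDs inside string literals or comments, so a real SPARQL parser would be more reliable.

```python
import re
from collections import Counter

# Matches Wikidata item (Q...) and property (P...) identifiers as whole tokens.
ENTITY_RE = re.compile(r"\b([PQ][1-9]\d*)\b")


def entity_mentions(query):
    """Return the set of item and property IDs written explicitly in a query string."""
    return set(ENTITY_RE.findall(query))


def count_mentions(queries):
    """Count, per ID, how many queries mention it at least once."""
    counts = Counter()
    for query in queries:
        counts.update(entity_mentions(query))
    return counts


queries = [
    # "How many symptoms does Alzheimer's disease have?" names Q11081 explicitly.
    "SELECT (COUNT(?s) AS ?n) WHERE { wd:Q11081 wdt:P780 ?s }",
    # "What diseases have a symptom of memory loss?" may return Q11081 without naming it.
    'SELECT ?d WHERE { ?d wdt:P780 ?s . ?s rdfs:label "memory loss"@en }',
]

counts = count_mentions(queries)
```

The second query illustrates the caveat above: it contributes to the count for the property but not for Q11081, even though Alzheimer's disease could appear in its results.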


Note also that I don't think any of these metrics get at the "statement-level metrics" I described above.  These arguably will be the more common case too.

As one very vague idea for a possible implementation that does account for outputs and intermediates, perhaps we could set up a temporary database that removes all items/statements from a given data provider, rerun a set of SPARQL queries, and then compare the results.  If the results differ, you could empirically say that the data provider was important for that query.  Obviously complexity/scale are issues here...
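
A toy sketch of that comparison idea, using an in-memory list of statements instead of a real database. Everything here is an assumption for illustration: `run_query` stands in for a SPARQL engine, statements are simplified to (subject, property, value, source) tuples, and the Disease Ontology value is a placeholder rather than a real identifier.

```python
def run_query(statements, predicate, obj):
    """Toy stand-in for a SPARQL query: subjects that have (predicate, obj)."""
    return {s for (s, p, o, _source) in statements if p == predicate and o == obj}


def provider_matters(statements, provider, predicate, obj):
    """Compare results with and without the statements sourced from one provider."""
    full = run_query(statements, predicate, obj)
    reduced = [st for st in statements if st[3] != provider]
    return full != run_query(reduced, predicate, obj)


statements = [
    # reelin (Q414043) has a genetic association (P2293) with Alzheimer's disease (Q11081)
    ("Q414043", "P2293", "Q11081", "Phenocarta"),
    # visual agnosia (Q18742) with a placeholder Disease Ontology ID (P699) value
    ("Q18742", "P699", "DOID:placeholder", "Disease Ontology"),
]

# Which genes are associated with Alzheimer's disease?
phenocarta_needed = provider_matters(statements, "Phenocarta", "P2293", "Q11081")
do_needed = provider_matters(statements, "Disease Ontology", "P2293", "Q11081")
```

Here the result set changes only when the Phenocarta-sourced statement is removed, so only that provider would be credited for this query. Doing the same against the full data set would mean rebuilding or filtering the store per provider, which is the complexity/scale issue mentioned above.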

For overall context, data providers are continually having to justify their existence to funders (e.g. the NIH), usually in terms of how important they are to a community of third-parties (e.g. research scientists).  Currently they do that through restrictive licenses, so they can point to the number of licensees they have.  If we want to convince them to contribute to Wikidata, they immediately lose the licensee count metric because there is no requirement to license.  To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders.  Just want to explain this perspective in case it wasn't clear...


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread Nuria
Nuria added a comment.
@Smalyshev @AndrewSu please take a look at other metric definitions we have. Once you decide on a metric definition, please be so kind as to document it in meta: https://meta.wikimedia.org/wiki/Research:Standard_metrics#Newly_registered_user

This helps a lot to quantify what things mean when you see them on a dashboard.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Smalyshev
Smalyshev added a comment.
Those statements might be part of the output of the SPARQL query, or they might simply be structural intermediates.

We don't currently have tools to capture statistics about the output of a query, let alone intermediates. We could, however (with some work), capture usage of a certain property, or item, or property-item combination, in the original query. Would that be useful?


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread AndrewSu
AndrewSu added a comment.
My initial thought is that there will be two types of metrics.  First, we want to look at statement-level metrics.  For all the statements that our team has loaded into Wikidata, we have been referencing specific resources that assert that statement.  For example, see the human gene reelin (Q414043). This gene has a genetic association (P2293) with the disease Alzheimer's disease (Q11081), as stated in (P248) a database called Phenocarta (Q22330995).  We would like to provide the Phenocarta team statistics on how often Phenocarta-referenced statements are used in SPARQL queries.  Those statements might be part of the output of the SPARQL query, or they might simply be structural intermediates.

Second, we might also want to look at item-level metrics.   See for example visual agnosia (Q18742).  This item is mapped to the Disease Ontology (Q5282129) through the Disease Ontology ID (P699) (and one intermediate item for the specific release of the ontology).  Again, we would want to provide the Disease Ontology team metrics on how often DO-linked items were utilized (either directly or indirectly) in SPARQL queries.  (Note also that ontologies that are referenced as external identifiers in Wikidata items will very often also be referenced in support of instance of (P31) or subclass of (P279) statements, which may fall under the previous category.)

Computing one or both of these metrics in my mind would be good first steps, though I'm guessing there would need to be further iteration once we examine the results.  Hope this is helpful...


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Smalyshev
Smalyshev added a comment.
It may be hard to capture query results, given that we don't have any mechanism of tracking them now. We do have logs for queries themselves, so that's what I would start with...

@AndrewSu if you have any suggestions about the metrics that would be very helpful. Please add them here.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Nuria
Nuria added a comment.
As far as I understand, you need to publish not only queries to the service but also query results (is this correct, @Smalyshev?); analyzing those will produce the metric counts @AndrewSu and @leila are interested in. This requires a schema definition of what a query result is (I imagine), so it seems that there is some work to do on the Wikidata end before being able to produce counts.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Nuria
Nuria added a comment.
If @Smalyshev thinks this would be a good idea and can develop the instrumentation for the metrics and own the metric definition (together with "gene wiki"), we can help on the project as needed; it seems to me that things like these could be computed with the infrastructure we have in place.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread AndrewSu
AndrewSu added a comment.
Just want to add a note that if someone on the WMF side was interested in building the infrastructure to compute these usage metrics, the "Gene Wiki" team would be very willing collaborators in evaluating and refining the metrics.  We have been working hard loading biomedical data into Wikidata.  We've convinced several resources to convert to CC0, but we're also talking with many data providers who have reservations (many of which might be addressed by usage statistics).  Based on these interactions, I think we have a pretty good perspective on what metrics would be valuable to this cross section of data providers.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-12 Thread Smalyshev
Smalyshev added a comment.
Hmm not sure how to implement this yet, as we do not track which items were in query results (might be possible from GUI, though expensive, and probably not possible from API) but may be possible to analyze e.g. property usage in queries. Anybody in Analytics interested in helping with this?


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-16 Thread AndrewSu
AndrewSu added a comment.
Thank you @leila for the guidance on the process and next steps -- very helpful!  @I9606 and I will touch base to see how we want to proceed/prioritize from our end...


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-16 Thread leila
leila added a comment.
@AndrewSu Lydia and I had some off-list discussions and we thought it's a good idea that I leave a bit more information for you here:


Please don't spend days on the proposal if you decide to submit it. This is supposed to be a 1-2 page proposal that will help us understand what the problem is, why it's important and how you want to solve it (methodology). Some lit review would be great, but at a high level.



Dario and myself are operating at capacity in terms of forming new collaborations at the moment (we have a few more in the pipeline and we already have some in place). This being said, there are at least two other people in my team who may want to initiate this collaboration. Also, if the proposal ends up being aligned with something I will work on this year, I may drop something else to make it happen. This is to say that there is uncertainty on our end until we read the proposal, and there are some resource constraints.


I hope this extra information helps you with your decision. If you have any questions, please ping me.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-15 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.
From my side the team around @I9606 and @AndrewSu has useful things to contribute on this topic and it'd be great if their request can be granted.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-13 Thread leila
leila added a comment.
@AndrewSu please read https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations to learn about how we start formal collaborations (which is a prerequisite for accessing the data). If you are interested, please attach a proposal to this Phabricator task, ping me, and I'll make sure the Research team will review your request in the coming weeks and get back to you.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-10 Thread AndrewSu
AndrewSu added a comment.
@mkroetzsch Thank you for the info.  We look forward to coordinating more when/if you see fit in the future.

Since our project is not dependent on Markus' work, and since I don't believe that our work will negatively impact Markus' project, I propose we treat our request here as a completely separate initiative.  So unless anyone has an objection to our plan, we await information on next steps.  Again, we are ready to submit signed NDAs as soon as we receive instructions.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-10 Thread mkroetzsch
mkroetzsch added a comment.
@AndrewSu As I just replied to Benjamin Good in this matter, it is a bit too early for this, since we only have the basic technical access as of very recently. We have not had a chance to extract any community shareable data sets yet, and it is clear that it will require some time to get clearance for such data even after we believe it is ready.

In the long run, I would find some collaboration very interesting, but we need to lay the foundations for this first, which will likely take a few more months.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread AndrewSu
AndrewSu added a comment.
Would it be possible for our team to get access to these log files so that we can perform our analyses that are related to, but distinct from, the ones that @mkroetzsch is doing?  We are happy to coordinate with Markus so that there is no duplication of effort.  But, I suspect that our analyses are much more specific to the biomedical research community than their general purpose ones.

For context, our team (led by @I9606 and myself) has been spearheading the loading of biomedical data into wikidata through https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology.  We are at a critical point with several potential data providers in convincing them to upload their data, and showing we can provide summarized usage reports for their funding agencies is a key blocker.

If this sounds reasonable, please let us know how we submit our signed NDA agreement forms.  (Email? Attach to this ticket? Mail?)


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread leila
leila added a comment.
@I9606 that specific project proposal was initiated in May 2016. The access to data was granted only in September 2016. Timelines will be updated once we know more. :)TASK DETAILhttps://phabricator.wikimedia.org/T143819EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: leilaCc: mkroetzsch, leila, debt, thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, mschwarzer, MelodyKramer, Avner, Gehel, D3r1ck01, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331___


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread I9606
I9606 added a comment.
Assuming that we can gain access to the output of that work and that it allows us to explore subject-matter specific aspects of the data, then yes, it sounds like it would be a great foundation for what we want to do.

I notice that this project started in May this year and that it has not added a timeline section yet. When is it expected to complete? If it is not making active progress, perhaps we could join forces with them immediately rather than waiting around for it to finish?


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread leila
leila added a comment.
@I9606 I imagine that what you are interested in will be one of the early outputs of the research documented at https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries . If that is the case, we should wait for the results of that research to start coming out.


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-08 Thread I9606
I9606 added a comment.
OK. Do we just sign and mail that in, or is there a specific contact person we should be in touch with?


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-08 Thread debt
debt added a comment.
Hi @I9606 - we have an NDA process that your student would need to go through before we can go much further with this being done in a volunteer capacity.

The link for the main page is here: https://meta.wikimedia.org/wiki/Non-disclosure_agreements and your student would need to start here: https://wikitech.wikimedia.org/wiki/File:Volunteer_Non-disclosure_Agreement_Template.pdf


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-07 Thread I9606
I9606 added a comment.
Hi folks. It sounds like there is a reasonably clear pattern for access. I have a student who could execute this project starting Sept. 19 if the barriers were cleared. Is there anything I can provide to move this along? Thanks!


[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-05 Thread Smalyshev
Smalyshev added a comment.
We have the logs, but they are not publicly accessible. See https://meta.wikimedia.org/wiki/Discovery/Data_access_guidelines#Request_logs for access guidelines.