[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-06-11 Thread Esc3300
Esc3300 added a comment. Shouldn't users opt-in to this?

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-06-11 Thread Smalyshev
Smalyshev added a comment. @Esc3300 Which users? WDQS does not track users, only queries. The log does contain query IP but the data processing will remove it, as well as any other PII.

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-19 Thread Smalyshev
Smalyshev added a comment. Thinking about it, I don't think we ever would need more than hourly resolution for anything related to queries (we can get hit stats from the usual stats places I assume). I also thought about dataset #1 as more short-lived. But I am not that insistent on session ID thin

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-19 Thread Nuria
Nuria added a comment. @Smalyshev We like to default to public if possible, the more eyes on the data the more useful it can be.

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-12-21 Thread mforns
mforns added a comment. @Nuria @Smalyshev So probably if we round timestamp and remove sessionId your proposal for dataset #1 is safe to keep long term (cc @mforns for anything I might be missing) I think it depends highly on how drastically we sanitize the potentially identifying fields (user a
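
The sanitization discussed in this part of the thread (rounding timestamps to hourly resolution, dropping the session ID and the query IP) could look roughly like the following. This is a minimal sketch only: the field names `timestamp`, `ip`, `session_id` and `query` are hypothetical, since the actual log schema is not shown in the thread, and the real processing may differ.

```python
from datetime import datetime

# Hypothetical field names; the real WDQS log schema is not given in this thread.
IDENTIFYING_FIELDS = {"ip", "session_id"}


def sanitize(record: dict) -> dict:
    """Return a copy of the record without identifying fields,
    with the timestamp rounded down to the start of its hour."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    ts = datetime.fromisoformat(record["timestamp"])
    clean["timestamp"] = ts.replace(minute=0, second=0, microsecond=0).isoformat()
    return clean


# Made-up example record, for illustration only.
example = {
    "timestamp": "2017-12-21T14:37:52",
    "ip": "192.0.2.10",
    "session_id": "abc123",
    "query": "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10",
}
sanitized = sanitize(example)
```

Dropping fields by name only covers identifiers stored in dedicated columns; anything identifying inside the query text itself would need separate handling.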

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-04 Thread Smalyshev
Smalyshev added a comment. I made a more formal full description of which data I'd like to be in the public dataset, so people don't have to read through all the comments here: https://www.wikidata.org/wiki/User:Smalyshev_(WMF)/Publishing_query_data Please review and comment if you see anything mi

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2018-01-05 Thread JAllemandou
JAllemandou added a comment. @Nuria, @Smalyshev: Given all wikidata-query tagged rows belong in misc, which is super small, I have no objection to running jobs either hourly or daily.

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-05 Thread Smalyshev
Smalyshev added a comment. We have the logs, but they are not publicly accessible. See https://meta.wikimedia.org/wiki/Discovery/Data_access_guidelines#Request_logs for access guidelines.

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-07 Thread I9606
I9606 added a comment. Hi folks. It sounds like there is a reasonably clear pattern for access. I have a student that could execute this project starting Sept. 19 if the barriers were cleared. Anything I can provide to move this along? Thanks!

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-08 Thread debt
debt added a comment. Hi @I9606 - we have an NDA process that your student would need to go through before we can go too much further with this being done in a volunteer capacity. The link is here for the main page: https://meta.wikimedia.org/wiki/Non-disclosure_agreements and your student would ne

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-08 Thread I9606
I9606 added a comment. OK. Do we just sign and mail that in or is there a specific contact person we should be in touch with?

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread leila
leila added a comment. @I9606 I imagine that what you are interested in will be one of the early outputs of the research documented at https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries . If that is the case, we should wait for the result of that research to gradually start com

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread I9606
I9606 added a comment. Assuming that we can gain access to the output of that work and that it allows us to explore subject-matter specific aspects of the data, then yes, it sounds like it would be a great foundation for what we want to do. I notice that this project started in May this year and t

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread leila
leila added a comment. @I9606 that specific project proposal was initiated in May 2016. The access to data was granted only in September 2016. Timelines will be updated once we know more. :)

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread AndrewSu
AndrewSu added a comment. Would it be possible for our team to get access to these log files so that we can perform our analyses that are related to, but distinct from, the ones that @mkroetzsch is doing? We are happy to coordinate with Markus so that there is no duplication of effort. But, I sus

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-09 Thread mkroetzsch
mkroetzsch added a comment. @AndrewSu As I just replied to Benjamin Good in this matter, it is a bit too early for this, since we only have the basic technical access as of very recently. We have not had a chance to extract any community shareable data sets yet, and it is clear that it will require

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-10 Thread AndrewSu
AndrewSu added a comment. @mkroetzsch Thank you for the info. We look forward to coordinating more when/if you see fit in the future. Since our project is not dependent on Markus' work, and since I don't believe that our work will negatively impact Markus' project, I propose we treat our request

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-13 Thread leila
leila added a comment. @AndrewSu please read https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations to learn about how we start formal collaborations (which is a prerequisite for accessing the data). If you are interested, please attach a proposal to this Phabricator task, ping me

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-15 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. From my side the team around @I9606 and @AndrewSu has useful things to contribute on this topic and it'd be great if their request can be granted.

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-16 Thread leila
leila added a comment. @AndrewSu Lydia and I had some off-list discussions and we thought it's a good idea that I leave a bit more information for you here: Please don't spend days on the proposal if you decide to submit it. This is supposed to be a 1-2 page proposal that will help us understand

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2016-09-16 Thread AndrewSu
AndrewSu added a comment. Thank you @leila for the guidance on the process and next steps -- very helpful! @I9606 and I will touch base to see how we want to proceed/prioritize from our end...

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-12 Thread Smalyshev
Smalyshev added a comment. Hmm not sure how to implement this yet, as we do not track which items were in query results (might be possible from GUI, though expensive, and probably not possible from API) but may be possible to analyze e.g. property usage in queries. Anybody in Analytics interested i

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread AndrewSu
AndrewSu added a comment. Just want to add a note that if someone on the WMF side was interested in building the infrastructure to compute these usage metrics, the "Gene Wiki" team would be very willing collaborators in evaluating and refining the metrics. We have been working hard loading biomedi

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Nuria
Nuria added a comment. If @Smalyshev thinks this would be a good idea and can develop the instrumentation for the metrics and own the metric definition (together with "gene wiki") we can help on the project as needed, seems to me that things like these could be computed with the infrastructure we h

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Nuria
Nuria added a comment. As far as I understand you need to publish not only queries to the service but also query results (is this correct @Smalyshev?) analyzing those will produce the metric counts @AndrewSu and @leila are interested in. This requires a schema definition of what a query result is (i

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Smalyshev
Smalyshev added a comment. It may be hard to capture query results, given that we don't have any mechanism of tracking them now. We do have logs for queries themselves, so that's what I would start with... @AndrewSu if you have any suggestions about the metrics that would be very helpful. Please a

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread AndrewSu
AndrewSu added a comment. My initial thought is that there will be two types of metrics. First, we want to look at statement-level metrics. For all the statements that our team has loaded into Wikidata, we have been referencing specific resources that assert that statement. For example, see the

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-13 Thread Smalyshev
Smalyshev added a comment. Those statements might be part of the output of the SPARQL query, or they might simply be structural intermediates. We don't currently have tools to capture the statistics about output of the query, let alone intermediaries. We could, however (with some work) capture usa

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread Nuria
Nuria added a comment. @Smalyshev @AndrewSu please take a look at other metric definitions we have. Once you decide on a metric definition please be so kind as to document it in beta: https://meta.wikimedia.org/wiki/Research:Standard_metrics#Newly_registered_user This helps a lot to quantify what

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread AndrewSu
AndrewSu added a comment. We could, however (with some work) capture usage of certain property, or item, or property-item combination, in the original query. Would that be useful? Property usage: I think there is some small-ish subset of properties that are very closely tied to a single data pr
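
As a rough illustration of the idea quoted above (capturing usage of a property or item in the original query text, rather than in query results), one could count entity IDs per query along these lines. This is only a sketch under stated assumptions: it looks for prefixed names such as `wdt:P31` or `wd:Q5` with a regular expression instead of parsing SPARQL, so it misses full-URI references and may count IDs that appear inside string literals or comments. The function name and the sample queries are made up for the example and are not real log entries.

```python
import re
from collections import Counter

# Matches a prefixed Wikidata ID such as wdt:P31, p:P31, ps:P31 or wd:Q5
# and captures only the ID part (P31, Q5). Full URIs are not handled.
ENTITY_RE = re.compile(r"\b[A-Za-z]+:([PQ][1-9]\d*)\b")


def count_entities(queries):
    """Count in how many queries each property or item ID appears."""
    counts = Counter()
    for query in queries:
        # Use a set so each ID is counted at most once per query.
        counts.update(set(ENTITY_RE.findall(query)))
    return counts


# Made-up sample queries, for illustration only.
sample_queries = [
    "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10",
    "SELECT ?g WHERE { ?g wdt:P31 wd:Q7187 ; wdt:P351 ?id }",
]
counts = count_entities(sample_queries)
```

Counting each ID once per query gives "number of queries that mention X"; dropping the `set(...)` would instead give total mentions, which is a different metric.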

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread Nuria
Nuria added a comment. To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders Understood, as I said we are willing to help in any way we can, seems like a great objective. My main point is that if we come up with a metric

[Wikidata-bugs] [Maniphest] [Commented On] T143819: Data request for logs from SparQL interface at query.wikidata.org

2017-06-14 Thread AndrewSu
AndrewSu added a comment. In T143819#3350566, @Nuria wrote: To incentivize them to contribute, we have to give them even better metrics of community usage/impact that they can give to funders Understood, as I said we are willing to help in any way we can, seems like a great objective. My main poi