[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-18 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T248308 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: GoranSMilovanovic Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, Simon_V

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a project: Patch-For-Review.

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a comment. Change 617863 had a related patch set uploaded (by GoranSMilovanovic; owner: GoranSMilovanovic): [analytics/wmde/WD/WD_HumanEdits@master] T248308 https://gerrit.wikimedia.org/r/617863

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a comment. Change 617863 **merged** by GoranSMilovanovic: [analytics/wmde/WD/WD_HumanEdits@master] T248308 https://gerrit.wikimedia.org/r/617863

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher We forgot to mention this task in our recent 1:1. In the meantime, I've tested a 10% sample of daily queries, and the statistics of the smaller, previously used 1% sample turn out to be quite representative. However, if tabulati

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher You're welcome. > We should get this list once a quarter or so to find new uses of our data It is perfectly doable. Let's discuss this on Monday and decide precisely which data and statistics we want reported regularly.

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-24 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. In T248308#6324161 , @GoranSMilovanovic wrote: > @Lydia_Pintscher > > Let's see if there is anything interesting here: > > F31943519: ref_user_agent_sample.csv

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Superfine. Enjoy your holidays!

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment. @GoranSMilovanovic I have indeed done some analysis using the Apache Jena parser to extract an algebraic representation of queries. It is not yet at the level of completion I would like, though. I'll be on holidays from tonight until August 15th; let's discuss when I come back?

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Awesome! You did a nice EDA here + you've analyzed both `event.wdqs_external_sparql_query` and `event.wdqs_internal_sparql_query` - while I've focused only on the `external` source in my previous analyses... So, we do need ML to be able to

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment. @GoranSMilovanovic I finally published a wiki page with most of the results I found: https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Traffic_Analysis Sorry for the delay ...

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher There is absolutely no correlation between (a) how often a particular `user_agent` value appears and (b) the mean or median WDQS processing time for that `user_agent`'s SPARQL queries. We can search for particular `user_age
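
A minimal sketch of how such a check could be run, assuming each sampled query is reduced to a `(user_agent, query_time_ms)` pair. The field name `query_time_ms`, the helper names, and the plain Pearson coefficient are illustrative assumptions; the thread does not say which statistic or tooling was actually used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Plain Pearson correlation; returns NaN if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def agent_stats(rows):
    """Map each user agent to (query count, mean time, median time)."""
    times = defaultdict(list)
    for agent, query_time_ms in rows:  # query_time_ms: hypothetical field name
        times[agent].append(query_time_ms)
    return {a: (len(ts), mean(ts), median(ts)) for a, ts in times.items()}


def frequency_vs_time(rows):
    """Correlate per-agent frequency with per-agent mean and median time."""
    stats = list(agent_stats(rows).values())
    counts = [s[0] for s in stats]
    means = [s[1] for s in stats]
    medians = [s[2] for s in stats]
    return pearson(counts, means), pearson(counts, medians)
```

Calling `frequency_vs_time(rows)` on a list of such pairs returns two coefficients; values near zero for both would be consistent with the observation above. With heavily skewed agent frequencies, a rank-based coefficient may be a more robust choice than Pearson.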

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @Lydia_Pintscher Let's see if there is anything interesting here: F31943519: ref_user_agent_sample.csv Data: - it is produced from a sample of SPARQL queries from `event.wdqs_external_sparql_q

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Got it, thanks.

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread JAllemandou
JAllemandou added a comment. SELECT http.request_headers['user-agent'], user_agent_map, count(1) as c FROM event.wdqs_external_sparql_query WHERE year = 2020 and month = 5 and day = 1 GROUP BY http.request_headers['user-agent'], user_agent_m

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou However... 0: jdbc:hive2://an-coord1001.eqiad.wmnet:1000> select user_agent_map from event.wdqs_external_sparql_query where year = 2020 and month = 5 and day = 1 limit 10; going to print operations logs printed operations logs

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment. @JAllemandou Please see T248308#6080150 . I also see that `event.wdqs_external_sparql_query` encompasses the `user_agent_map` so yes I will go for it and not for `wmf.webrequest`. > I have done some wo

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-14 Thread JAllemandou
JAllemandou added a comment. > First step: analyze the frequency distribution of the user_agent field (string) from wmf.webrequest where queries are SPARQL. I suggest you use events instead of webrequest: `event.wdqs_internal_sparql_query` and `event.wdqs_external_sparql_query`. I

[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-14 Thread GoranSMilovanovic
GoranSMilovanovic reopened this task as "Open". GoranSMilovanovic added a comment. - Re-opening the task to address the question of automated vs. non-automated SPARQL queries observed at the WDQS end-point. - Reference: WMDE in-house email and Google Meet discussions with @darthmon_wmde and