dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION

As part of the offline evaluation of the WDQS graph split (scholarly articles vs. the rest) we want to extract multiple sets of SPARQL queries (initially drafted from https://docs.google.com/document/d/1QsV96LtpK5lDD2N2jy-6vaF_0d_Yf_HLb8uFARFMxJ8).

1/ From the query logs, identify and extract sets of queries emitted from a set of known sources:
- Listeria
- Mix-n-match
- Pywikibot
- wd/wb-integrator

2/ From queries written in Wikidata wiki pages:
- https://observablehq.com/@pac02/hello-sparql-queries-dataset?collection=@pac02/wikidata-tools
- https://huggingface.co/datasets/htriedman/wiki-sparql/viewer/htriedman--wiki-sparql/test

3/ A set of queries from the query logs, ideally representative of the following characteristics:
- Query size
- Query time
- Status code (HTTP return status)

The output is expected to be a Hive table with 2 columns:
- query: the SPARQL query in plain text
- provenance: a code identifying the provenance (source) of the query

Note: query logs are available in `events.wdqs_external_sparql_query`.

TASK DETAIL
https://phabricator.wikimedia.org/T349512

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
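
For illustration only, here is a minimal Python sketch of how steps 1/ and 3/ in the task description might produce the two-column (query, provenance) output. It is not the approach chosen for this task. Everything specific in it is an assumption: the row field names (`query`, `query_time_ms`, `status_code`), the idea of matching known sources on the user agent, the marker strings, the provenance codes, and the bucket edges are all hypothetical and would need to be checked against the real schema of `events.wdqs_external_sparql_query`.

```python
import random
from collections import defaultdict

# Hypothetical user-agent markers mapped to hypothetical provenance codes.
# The task does not say how each known source is identified in the logs.
KNOWN_SOURCES = {
    "listeria": "listeria",
    "mix-n-match": "mix-n-match",
    "pywikibot": "pywikibot",
    "wikidataintegrator": "wd-wb-integrator",
    "wikibaseintegrator": "wd-wb-integrator",
}

# Hypothetical bucket edges for query size (characters) and query time (ms).
SIZE_EDGES = [200, 1000, 5000]
TIME_EDGES_MS = [100, 1000, 10000]


def provenance_from_user_agent(user_agent):
    """Return a provenance code for a known source, or None if nothing matches."""
    ua = (user_agent or "").lower()
    for marker, code in KNOWN_SOURCES.items():
        if marker in ua:
            return code
    return None


def bucket(value, edges):
    """Return the index of the first edge that value is below, else len(edges)."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def stratified_sample(rows, per_stratum, seed=0):
    """Sample up to per_stratum rows from each (size, time, status) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        key = (
            bucket(len(row["query"]), SIZE_EDGES),
            bucket(row["query_time_ms"], TIME_EDGES_MS),
            row["status_code"],
        )
        strata[key].append(row)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def to_output(rows, provenance):
    """Shape rows as (query, provenance) pairs, mirroring the 2-column table."""
    return [(row["query"], provenance) for row in rows]
```

Sampling per stratum rather than uniformly keeps rare combinations (for example, slow queries that return an error status) from being drowned out by the many short, fast, successful ones. The seed makes the sample reproducible.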