[Wikidata-bugs] [Maniphest] T214378: Check simple format constraints (no grouping) in PHP instead of SPARQL
dcausse added a comment.

I am not sure this is feasible, but manually flagging a list of safe and very popular regexes <https://w.wiki/APPB> might help reduce the number of requests to shellbox.

TASK DETAIL https://phabricator.wikimedia.org/T214378
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: dcausse
Cc: dcausse, akosiaris, Michael, sbassett, RazShuty, JBennett, Ladsgroup, Aklapper, Lucas_Werkmeister_WMDE, Ullasoff, Danny_Benjafield_WMDE, S8321414, Cleo_Lemoisson, Astuthiodit_1, karapayneWMDE, Invadibot, Devnull, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Eihel, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, Esc3300, LawExplorer, _jensen, rosalieper, Agabi10, Scott_WUaS, Wong128hk, Luke081515, abian, Wikidata-bugs, aude, Bawolff, Lydia_Pintscher, Grunny, Mbch331, Jay8g, Krenair
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
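One way to picture the suggestion in T214378 is a hand-reviewed set of known-safe, frequently used patterns that are evaluated locally, with everything else still delegated to the remote service. The sketch below is only an illustration in Python, not the PHP of the actual checker. `SAFE_PATTERNS`, `check_format` and `remote_check` are hypothetical names, the patterns are placeholders rather than entries from the linked list, and Python's `re` module is not guaranteed to agree with PCRE on every pattern.

```python
import re
from typing import Callable

# Hypothetical, hand-reviewed allow-list of patterns considered safe to run locally.
# These entries are placeholders, not taken from the list linked in the comment.
SAFE_PATTERNS = frozenset({
    r"[1-9]\d*",
    r"\d{4}-\d{2}-\d{2}",
})


def check_format(pattern: str, value: str,
                 remote_check: Callable[[str, str], bool]) -> bool:
    """Return True if value matches pattern in full.

    Patterns on the allow-list are evaluated locally; anything else is
    delegated to remote_check (a stand-in for the shellbox request).
    """
    if pattern in SAFE_PATTERNS:
        return re.fullmatch(pattern, value) is not None
    return remote_check(pattern, value)
```

Only patterns that were reviewed by hand ever run locally, so an unreviewed and potentially expensive regex still goes through the sandboxed path.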
[Wikidata-bugs] [Maniphest] T367510: Request permission to create 4 kafka topics in kafka-main (WDQS graph split)
dcausse renamed this task from "Request permission to create 4 kafka topics in kafka-main" to "Request permission to create 4 kafka topics in kafka-main (WDQS graph split)".
TASK DETAIL https://phabricator.wikimedia.org/T367510
[Wikidata-bugs] [Maniphest] T367510: Request permission to create 4 kafka topics in kafka-main
dcausse created this task.
dcausse added projects: Wikidata, Wikidata-Query-Service, serviceops, Data-Platform-SRE.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: wmde-wikidata-tech.

TASK DESCRIPTION
As part of the work to split the WDQS graph we will need to populate 4 new topics:
- eqiad.rdf-streaming-updater.mutation-main
- codfw.rdf-streaming-updater.mutation-main
- eqiad.rdf-streaming-updater.mutation-scholarly
- codfw.rdf-streaming-updater.mutation-scholarly

The expected size of the main and scholarly topics added together should not exceed the size of `eqiad.rdf-streaming-updater.mutation`, which is around 17Gb (51Gb <https://grafana-rw.wikimedia.org/d/00234/kafka-by-topic?orgId=1&refresh=5m&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=main-eqiad&var-kafka_broker=All&var-topic=eqiad.rdf-streaming-updater.mutation&var-topic=codfw.rdf-streaming-updater.mutation&from=now-7d&to=now> including replication). The same applies to the rate of messages. Because of topic mirroring, this means that an additional 100Gb per cluster is required (+100Gb on main-eqiad and +100Gb on main-codfw).

These topics must have the following characteristics:
- a single partition
- retention of 4 weeks

AC:
[ ] get sign off from #serviceops <https://phabricator.wikimedia.org/tag/serviceops/>
[ ] topics are created in both clusters with proper settings

TASK DETAIL https://phabricator.wikimedia.org/T367510
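The sizing in the T367510 description can be retraced with a few lines of arithmetic. The sketch below uses only the figures quoted in the task; the replication factor is inferred from the 17Gb and 51Gb figures rather than stated in the task, and expressing the retention in milliseconds assumes it would be set through Kafka's topic-level `retention.ms` setting.

```python
# Figures quoted in the task description (sizes in Gb, as written there).
EXISTING_TOPIC_GB = 17             # eqiad.rdf-streaming-updater.mutation, before replication
EXISTING_TOPIC_REPLICATED_GB = 51  # the same topic including replication

# Assumption: the replication factor is inferred from the two figures above.
replication_factor = EXISTING_TOPIC_REPLICATED_GB // EXISTING_TOPIC_GB

# Upper bound for the main + scholarly pair written under one datacenter prefix.
pair_replicated_gb = EXISTING_TOPIC_GB * replication_factor

# With mirroring, each cluster stores both the eqiad.* pair and the codfw.* pair.
per_cluster_gb = 2 * pair_replicated_gb

# Retention of 4 weeks, expressed in milliseconds.
RETENTION_WEEKS = 4
retention_ms = RETENTION_WEEKS * 7 * 24 * 60 * 60 * 1000

print(f"per cluster: ~{per_cluster_gb}Gb, retention: {retention_ms} ms")
```

This gives roughly 102Gb per cluster, which matches the "+100Gb" rounded figure in the description, and a retention of 2,419,200,000 ms.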
[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation
dcausse added a comment.

I did some testing and, sadly, when a wdqs node makes a query to https://query.wikidata.org it hits varnish again.

From wdqs1020 to https://query.wikidata.org (`echo 'SELECT ?test_dcausse { ?test_dcausse ?p ?o . } LIMIT 1' | curl -f -s --data-urlencode query@- https://query.wikidata.org/sparql?format=json`):

    "x-request-id": "b34bb930-ef85-4b23-956e-7dcb11f0f7ec",
    "content-length": "99",
    "x-forwarded-proto": "http",
    "x-client-port": "40256",
    "x-bigdata-max-query-millis": "6",
    "x-wmf-nocookies": "1",
    "x-client-ip": "2620:0:861:10a:10:64:131:24",
    "x-varnish": "800949377",
    "x-forwarded-for": "2620:0:861:10a:10:64:131:24\\, 10.64.0.79\\, 2620:0:861:10a:10:64:131:24",
    "x-requestctl": "",
    "x-cdis": "pass",
    "accept": "*/*",
    "x-real-ip": "2620:0:861:10a:10:64:131:24",
    "via-nginx": "1",
    "x-bigdata-read-only": "yes",
    "host": "query.wikidata.org",
    "content-type": "application/x-www-form-urlencoded",
    "connection": "close",
    "x-envoy-expected-rq-timeout-ms": "65000",
    "x-connection-properties": "H2=1; SSR=0; SSL=TLSv1.3; C=TLS_AES_256_GCM_SHA384; EC=UNKNOWN;",
    "user-agent": "curl/7.74.0"

This is very similar to what we get when querying from outside the network:

    "x-request-id": "3380f86f-99bc-4f0a-ac74-48e60317836d",
    "content-length": "85",
    "x-forwarded-proto": "http",
    "x-client-port": "55334",
    "x-bigdata-max-query-millis": "6",
    "x-wmf-nocookies": "1",
    "x-client-ip": "redacted",
    "x-varnish": "512603614",
    "x-forwarded-for": "redacted\\, 10.136.1.11\\, 2620:0:861:10e:10:64:135:23",
    "x-requestctl": "",
    "x-cdis": "pass",
    "accept": "*/*",
    "x-real-ip": "2620:0:861:10e:10:64:135:23",
    "via-nginx": "1",
    "x-bigdata-read-only": "yes",
    "host": "query.wikidata.org",
    "content-type": "application/x-www-form-urlencoded",
    "connection": "close",
    "x-envoy-expected-rq-timeout-ms": "65000",
    "x-connection-properties": "H2=1; SSR=0; SSL=TLSv1.3; C=TLS_AES_256_GCM_SHA384; EC=UNKNOWN;",
    "user-agent": "curl/7.81.0"

If we query lvs directly via wdqs.discovery.wmnet, we might get what we need (`echo 'SELECT ?lvs_eqiad_test_dcausse {?lvs_eqiad_test_dcausse ?p ?o .} LIMIT 1' | curl -v -f -s --data-urlencode query@- https://wdqs.discovery.wmnet/sparql?format=json`):

    "x-real-ip": "2620:0:861:10a:10:64:131:24",
    "x-request-id": "ef9b0e66-3b6f-48ae-a36f-cb1e67f93950",
    "content-length": "110",
    "x-forwarded-proto": "http",
    "x-bigdata-read-only": "yes",
    "host": "wdqs.discovery.wmnet",
    "x-bigdata-max-query-millis": "6",
    "content-type": "application/x-www-form-urlencoded",
    "connection": "close",
    "x-envoy-expected-rq-timeout-ms": "65000",
    "x-forwarded-for": "2620:0:861:10a:10:64:131:24",
    "user-agent": "curl/7.74.0",
    "accept": "*/*"

Hitting lvs might require a mapping like `https://query-main.wikidata.org` -> `https://wdqs-main.discovery.wmnet`, which I believe could be possible using `ServiceRegistry#addAlias( "https://wdqs-main.discovery.wmnet/sparql", "https://query-main.wikidata.org/sparql")`. This could be done by adapting the syntax of the allow-list to enable setting aliases: `service_url[,list of aliases]`, e.g. `https://wdqs-main.discovery.wmnet/sparql,https://query-main.wikidata.org/sparql`. `WikibaseContextListener#loadAllowlist` could be adapted to support this syntax and call `addAlias()` on the service registry. Additionally, we probably want to exclude `*.wmnet` hosts found in the allow-list from `org.wikidata.query.rdf.blazegraph.ProxiedHttpConnectionFactory`.

The drawback is that hitting lvs from within the same lvs will hit localhost. This is not a problem in the context of the graph split, because the lvs endpoint should be different, but a malformed query federating the same lvs might starve if the server is busy. I'm not sure whether we have to worry about this: a query federating itself does not make much sense.
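The proposed allow-list syntax `service_url[,list of aliases]` is simple enough to sketch. The snippet below is an illustration in Python, not the Java of `WikibaseContextListener#loadAllowlist`; `parse_allowlist_line` and `is_internal_host` are hypothetical helper names, and it assumes that URLs in the allow-list never contain a literal comma.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_allowlist_line(line: str) -> Tuple[str, List[str]]:
    """Split 'service_url[,alias1,alias2,...]' into (service_url, aliases)."""
    parts = [p.strip() for p in line.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty allow-list entry")
    return parts[0], parts[1:]


def is_internal_host(url: str) -> bool:
    """True for *.wmnet hosts, which the comment suggests keeping out of the proxy."""
    host = urlparse(url).hostname or ""
    return host.endswith(".wmnet")


entry = "https://wdqs-main.discovery.wmnet/sparql,https://query-main.wikidata.org/sparql"
service_url, aliases = parse_allowlist_line(entry)
# Each alias would then be registered against service_url, and internal
# service URLs would bypass the proxied connection factory.
print(service_url, aliases, is_internal_host(service_url))
```

An entry without aliases parses to an empty alias list, so existing allow-list lines would remain valid under this syntax.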
TASK DETAIL https://phabricator.wikimedia.org/T361950
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T365692
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
dcausse added a comment.

I triggered a reindex of all the lexemes using https://gitlab.wikimedia.org/repos/search-platform/cirrus-rerender; it might take about 3 hours to complete.

TASK DETAIL https://phabricator.wikimedia.org/T365692
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse added a comment.

The system should now index lexemes properly. We still have to reindex all the lexemes to fix the ones created/edited before the fix was applied.

TASK DETAIL https://phabricator.wikimedia.org/T365692
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T365692
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse added a comment.

The search fields specific to Lexemes are currently ignored, which causes this NOTICE and also prevents lexemes from being searchable (especially the new ones). The schemas should be adapted to support these fields, and the lexemes will have to be re-indexed.

TASK DETAIL https://phabricator.wikimedia.org/T365692
[Wikidata-bugs] [Maniphest] T365684: Particular lexeme (L1326823) not indexed so search with the Wikidata API returns nothing
dcausse closed this task as a duplicate of T365692: PHP Notice: Undefined index: lexeme_language / lexical_category.
TASK DETAIL https://phabricator.wikimedia.org/T365684
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse merged a task: T365684: Particular lexeme (L1326823) not indexed so search with the Wikidata API returns nothing.
TASK DETAIL https://phabricator.wikimedia.org/T365692
[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category
dcausse claimed this task.
dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.
TASK DETAIL https://phabricator.wikimedia.org/T365692
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T362508
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse added a comment.

1. Run hdfs-rsync directly from the blazegraph hosts
   - requires installing its dependencies
   - requires opening holes between blazegraph and the hadoop cluster
2. Schedule hdfs-rsync on a stat machine, copying the ttl dumps from hdfs to `/srv/analytics-search/wikibase_processed_dumps/wikidata/$SNAPSHOT`
   - cons: consumes some space on a stat machine
3. Run hdfs-rsync on demand to copy the ttl dump from hdfs to `/srv/analytics-search/wikibase_processed_dumps/temp` and clean up this folder once done
   - cons: slows the process down a bit

I was planning on doing option 3. Any objections to this approach?

TASK DETAIL https://phabricator.wikimedia.org/T349069
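Option 3 from the T349069 comment above (copy on demand into a temp folder, then clean it up) maps naturally onto a context manager that removes the folder even when the later transfer fails. This is a minimal sketch under stated assumptions: `staged_dump` is a hypothetical name, and `copy_dump` is a placeholder callable standing in for the actual hdfs-rsync invocation, whose command line is deliberately not reproduced here.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def staged_dump(temp_dir: Path, copy_dump: Callable[[Path], None]) -> Iterator[Path]:
    """Copy the dump into temp_dir, yield it, and always delete it afterwards.

    copy_dump is a stand-in for whatever performs the HDFS -> local copy.
    """
    temp_dir.mkdir(parents=True, exist_ok=True)
    try:
        copy_dump(temp_dir)
        yield temp_dir
    finally:
        # Runs on success and on failure, so the staging space is always released.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

A caller would wrap the transfer step in `with staged_dump(path, copy_fn) as dump_dir:` so that cleanup does not depend on the transfer succeeding.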
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse added a comment.

Another approach could be to use the `/mnt/hdfs` mountpoint? I have been told that it might not be stable enough but perhaps it's OK for doing a copy?

TASK DETAIL https://phabricator.wikimedia.org/T349069
[Wikidata-bugs] [Maniphest] T355298: Investigate the impact of the WDQS graph split on constraints checks
dcausse added a comment.

Looking at the constraints, I believe that 4 may use sparql:
- FormatChecker.php
- TypeChecker.php
- UniqueValueChecker.php
- ValueTypeChecker.php

FormatChecker switched to using shellbox, so I think it can be ignored. TypeChecker & ValueTypeChecker use Sparql to inspect the class hierarchy, which may or may not be affected by the split. UniqueValueChecker, on the other hand, is most certainly affected by the split.

TASK DETAIL https://phabricator.wikimedia.org/T355298
[Wikidata-bugs] [Maniphest] T364077: Adapt the wdqs data-transfer cookbook to operate with federated subgraphs
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
The current data-transfer cookbook assumes that a single graph is served from all wdqs nodes; this will no longer be the case once the graph is split.

Most of the script should operate similarly, but there are a few important configuration bits that might need to vary. The main one is the kafka topic the updater consumes from, which will vary depending on what subgraph the machine is serving. This information might be available from puppet and could possibly be exposed via some config file readable by the cookbook.

The cookbook should also make sure not to transfer the data of subgraph A into a machine configured to serve subgraph B.

We should also explore and document a procedure to switch a machine that serves subgraph A to serve subgraph B.

AC:
- The wdqs data-transfer cookbook can operate in a federated subgraphs setup

TASK DETAIL https://phabricator.wikimedia.org/T364077
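The two requirements in T364077 (pick the kafka topic from the subgraph, and refuse transfers between hosts serving different subgraphs) can be sketched as a small guard. Everything below is hypothetical: the `Host` type, the function names and the host names are illustrative, the subgraph value is assumed to come from a puppet-managed config file, and the topic naming merely follows the pattern of the topics requested in T367510.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    datacenter: str  # e.g. "eqiad" or "codfw"
    subgraph: str    # e.g. "main" or "scholarly"; assumed to be read from config


def topic_for(host: Host) -> str:
    """Topic the updater on this host would consume from (naming is an assumption)."""
    return f"{host.datacenter}.rdf-streaming-updater.mutation-{host.subgraph}"


def check_transfer(source: Host, target: Host) -> str:
    """Refuse to copy data between hosts that serve different subgraphs."""
    if source.subgraph != target.subgraph:
        raise ValueError(
            f"{source.name} serves '{source.subgraph}' but "
            f"{target.name} is configured for '{target.subgraph}'"
        )
    return topic_for(target)
```

Running the check before any data moves means a misconfigured target is rejected up front rather than discovered after a long transfer.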
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse added a comment.

@BTullis @bking I plan to use a cookbook to transfer some data out of hdfs to blazegraph machines. A naive approach I thought about was to use a temp folder somewhere in `/srv` of a stat100x machine and then re-use the transferpy <https://gerrit.wikimedia.org/r/operations/software/transferpy> python module. The current dumps are about 200G. Do you think that this option is viable? Can we use a folder in `/srv` as a temp folder for such transfers? This data is only useful for the transfer and should be deleted by the cookbook when it ends.

TASK DETAIL https://phabricator.wikimedia.org/T349069
[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T362508
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T349069
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse claimed this task.
dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board.
TASK DETAIL https://phabricator.wikimedia.org/T349069
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers
dcausse added a project: Discovery-Search (Current work).
TASK DETAIL https://phabricator.wikimedia.org/T349069
[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw
dcausse moved this task from Ready for Dev -- SWE to Needs review on the Discovery-Search (Current work) board.
dcausse claimed this task.
TASK DETAIL https://phabricator.wikimedia.org/T362508
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter
dcausse claimed this task.
dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
TASK DETAIL https://phabricator.wikimedia.org/T362060
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T362977: WDQS updater missed some updates
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T362977
[Wikidata-bugs] [Maniphest] T362977: WDQS updater missed some updates
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
Reported at https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#Stale_values_in_SparQL_query_result
- Q968274 revision 2131311442 at `2024-04-17T13:18:54`
- Q4314307 revision 2130626175 at `2024-04-16T13:20:18`
- Q4349600 revision 2130628297 at `2024-04-16T13:23:52`
- Q51670636 revision 2131311281 at `2024-04-17T13:18:30`

None of these are found in the `event.mediawiki_revision_create` hive table. I can't find them in the `eqiad.mediawiki.revision-create` topic either.
- For Q968274 at 2131311442: the revision-create event failed to be emitted with `Unable to deliver all events: 503: Service Unavailable`: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.04.17?id=MjA17I4BWjhRzdxne8ai.

I can't find traces of the other three, but searching for "Unable to deliver all events: 503: Service Unavailable" in logstash I can see huge spikes of failures (sometimes more than 20k in one hour):
F47519706: image.png <https://phabricator.wikimedia.org/F47519706>

It is possible that mediawiki or event-gate failed to properly submit these revision-create events.

Related tasks
- T249745 <https://phabricator.wikimedia.org/T249745>: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable"
- T120242 <https://phabricator.wikimedia.org/T120242>: Eventually-Consistent MediaWiki state change events | MediaWiki events as source of truth

TASK DETAIL https://phabricator.wikimedia.org/T362977
[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T362508 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, bking, dcausse, Aklapper, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The updater is misbehaving in codfw, apparently processing too many `reconciliations` which triggers a //slow// update mode and thus is not able to keep up with the update rate and causes maxlag to throttle bot edits in wikidata. TASK DETAIL https://phabricator.wikimedia.org/T362508 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T361935 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Daniel_Mietchen, dr0ptp4kt, pfischer, dcausse, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362074: WDQS wikibase:around sometimes ignore exact matches
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION (originally reported https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#WDQS_wikibase:around_issue) It might happen that in some circumstances `wikibase:around` ignores exact matches. For instance `Q5637175` has a point equal to `Point(-2.5307 53.0268)` but when searching for this exact same location the query service is unable to find it: SELECT DISTINCT ?item ?itemLabel ?location ?dist WHERE { SERVICE wikibase:around { ?item wdt:P625 ?location. bd:serviceParam wikibase:center "Point(-2.5307 53.0268)"^^geo:wktLiteral. bd:serviceParam wikibase:radius "0.1". bd:serviceParam wikibase:distance ?dist. } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } Varying the searched point a bit (e.g. `Point(-2.5307 53.02681)`) the point is found. It is unclear why this happens; it might be some bug (edge case) in how the search space is approximated with a surrounding box? TASK DETAIL https://phabricator.wikimedia.org/T362074 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
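As a sanity check on the T362074 report, the distance between the stored point and the slightly shifted search center can be worked out by hand. The sketch below is not how the query service computes distances; it assumes a spherical Earth with a 6371 km mean radius and assumes `wikibase:radius` is expressed in kilometers. The function name `haversine_km` is made up for this illustration. It only shows that both search centers lie well inside the 0.1 radius of the stored point, so a purely distance-based match should return the item in both cases.

```python
import math

# Assumption: mean spherical Earth radius, in kilometers.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# WKT order is "Point(longitude latitude)".
stored = (-2.5307, 53.0268)    # location of Q5637175 quoted in the task
shifted = (-2.5307, 53.02681)  # varied search center quoted in the task

radius_km = 0.1
exact_distance = haversine_km(*stored, *stored)
shifted_distance = haversine_km(*stored, *shifted)

print(f"exact center:   {exact_distance:.6f} km, inside radius: {exact_distance <= radius_km}")
print(f"shifted center: {shifted_distance:.6f} km, inside radius: {shifted_distance <= radius_km}")
```

The shifted center is only about a metre away from the stored point, which is consistent with the task's suspicion that the miss comes from an edge case in the search-space approximation rather than from the distance itself.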
[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS
dcausse added a subtask: T362060: Generalize ScholarlyArticleSplitter. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Daniel_Mietchen, Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T362060 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The spark job ScholarlyArticleSplitter should be generalized to support the general case with //n// subgraphs, a wider variety of rules and stubs. AC: - support subgraph definitions as proposed in T361935 <https://phabricator.wikimedia.org/T361935> - support stubs WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs <https://www.wikidata.org/wiki/User:DCausse_(WMF)/WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs> TASK DETAIL https://phabricator.wikimedia.org/T362060 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries
dcausse moved this task from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. Two scholia queries were rewritten: - https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Property_paths - https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Number_of_articles_with_CiTO-annotated_citations_by_year The page also contains some documentation about how to approach such rewrites. I'm boldly moving this ticket to our Needs Reporting (prior to being closed) column as I believe further exploration of how to rewrite scholia queries to support the split could perhaps be better handled in https://github.com/WDscholia/scholia. But please feel free to re-open this ticket if you believe it has some value. TASK DETAIL https://phabricator.wikimedia.org/T349911 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Jane023, dr0ptp4kt, Fnielsen, Daniel_Mietchen, EgonWillighagen, dcausse, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T361950 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS
dcausse added a subtask: T361950: Ensure that WDQS query throttling does not interfere with federation. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation
dcausse renamed this task from "Ensure that WDQS query throttling do not interfere with federation" to "Ensure that WDQS query throttling does not interfere with federation". TASK DETAIL https://phabricator.wikimedia.org/T361950 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling do not interfere with federation
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION When we exposed the 3 experimental endpoints to test the first version of the graph split we disabled query throttling to avoid impacting the various analyses we had to run to evaluate the impact of the split. We then realized, while analyzing what happens when federated queries are running, that this throttling mechanism might have a negative impact by having wdqs nodes throttle each other. This ticket is about finding a plan to ensure that query throttling does not interfere with federation. A simple approach would be that the wdqs machine receiving the traffic is responsible for throttling the client, while subsequent queries made internally as part of federation are un-throttled. Nodes serving federated results to other nodes should still remain protected by the frontend node answering the client. To achieve this we need to detect when a query is emitted from another query service and craft a header at the nginx level to inform the throttling servlet that it should not be activated. Such a header exists, but sadly the throttling filter re-uses the existing `X-BIGDATA-READ-ONLY` header, which has another purpose and so cannot be re-used in our context (it would be too dangerous). One approach could be to use a new header `X-Disable-Throttling` dedicated to this purpose; the nginx settings would have to be adapted to set `X-Disable-Throttling` when the query is emitted from another blazegraph node. Unfortunately this might start to throttle local requests made directly on the blazegraph port (updates), which would then be prone to throttling and would have to be adapted to set this header (streaming-updater-consumer, data import scripts). 
Another approach is to adapt the throttling servlet and change how it's configured, adding a new config `disable-throttling-if-header` such that a request with: - `X-BIGDATA-READ-ONLY: 1` and `X-Disable-Throttling: true` would disable throttling - `X-BIGDATA-READ-ONLY: 1` only would enable throttling - a request without any of these headers would not enable throttling AC: - decide on the approach - blazegraph does not throttle itself when running federated queries TASK DETAIL https://phabricator.wikimedia.org/T361950 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
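The header combinations listed for the second approach in T361950 amount to a small decision table. A minimal sketch of that table follows, in Python rather than the servlet's own language; the function name `should_throttle` is hypothetical and this is not the actual throttling filter. The task does not say what happens when `X-Disable-Throttling` is sent without `X-BIGDATA-READ-ONLY`; the sketch assumes throttling stays off in that case, since it is only enabled by the read-only header.

```python
def should_throttle(headers):
    """Return True when the request should go through throttling.

    Hypothetical helper mirroring the three cases listed in the task:
      - X-BIGDATA-READ-ONLY: 1 and X-Disable-Throttling: true -> no throttling
      - X-BIGDATA-READ-ONLY: 1 only                            -> throttling
      - none of these headers                                  -> no throttling
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip().lower()
                  for name, value in headers.items()}
    read_only = normalized.get("x-bigdata-read-only") == "1"
    throttling_disabled = normalized.get("x-disable-throttling") == "true"
    # Assumption: X-Disable-Throttling alone leaves throttling off, because
    # only the read-only header turns it on in the first place.
    return read_only and not throttling_disabled


# External client query arriving through the frontend.
print(should_throttle({"X-BIGDATA-READ-ONLY": "1"}))
# Federated sub-query flagged by nginx as coming from another node.
print(should_throttle({"X-BIGDATA-READ-ONLY": "1", "X-Disable-Throttling": "true"}))
# Local request made directly on the blazegraph port (e.g. updates).
print(should_throttle({}))
```

Under this reading, local update traffic needs no changes, which is the advantage the second approach has over setting the new header on every internal client.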
[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T361935 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, pfischer, dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS
dcausse added a subtask: T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T361935 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, pfischer, dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION In order to support updating the subgraphs defined in Wikidata:SPARQL_query_service/WDQS_graph_split <https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split> the streaming updater must be adapted to produce the right mutations for a given subgraph. General Idea Right after fetching the entity content from `Special:EntityData` and before generating the diff, a new component will be added to apply the set of rules defining the subgraph and will populate a stream per subgraph: - `rdf-streaming-updater.mutations` will remain and will contain all the mutations to update the existing setup - `rdf-streaming-updater.mutations-main` will be added and will contain mutations related to the `main` subgraph - `rdf-streaming-updater.mutations-scholarly` will be added and will contain mutations related to the `scholarly` subgraph The consumer component on the other end of the stream might require very few (no?) modifications and might just work by configuring the kafka topic it should consume. The ticket describes the general case to support an arbitrary number of subgraphs. There could certainly be a simpler way to handle this for the special case of 2 subgraphs, but it might be error prone and less future proof, so we should attempt to solve the general case. If difficulties are found while implementing it we can always reconsider and give up on the general case. 
Stubs - //Stubs// are artificial triples (explained in WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs <https://www.wikidata.org/wiki/User:DCausse_(WMF)/WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs>) that will take the following form: wd:Q42 wikibaseqs:subgraph wdsubgraph:main - `wikibaseqs` (with suggested IRI: `http://wikiba.se/queryservice#`) a new namespace to hold the vocabulary of things related to the wdqs codebase and will probably be hard-coded there, `subgraph` will be the first and only term required at the moment - `wdsubgraph` (with suggested IRI: `https://query.wikidata.org/subgraph/`) a new namespace to hold the IRIs identifying subgraphs in the scope of the wikidata query service. It will be set up in the `prefixes.json` config file. Rules - The rules should be extremely simple to apply and will require only the data available locally after fetching the entity content. How the rules are expressed is up for discussion but could be a simple yaml file: subgraphs: - scholarly: stream: "rdf-streaming-updater.mutations-scholarly" default: block rules: - "pass ?entity wdt:P31 wd:Q13442814" - "pass ?entity rdf:type wikibase:Property" stubs_source: true stubs_subgraph_uri: "https://query.wikidata.org/subgraph/scholarly" - main: stream: "rdf-streaming-updater.mutations-main" default: pass rules: - "block ?entity wdt:P31 wd:Q13442814" stubs_source: true stubs_subgraph_uri: "https://query.wikidata.org/subgraph/main" - full: stream: "rdf-streaming-updater.mutations" default: pass stubs_source: false - rules are prefixed with `pass` or `block` telling what to do if evaluated to true - `?entity` will be replaced by the entity URI being updated - `[]` means any rdf literal The rules are applied in order and stop at the first match: if it's a `pass` the entity should enter this subgraph, if it's a `block` it should not. If no rule matches then the entity uses the `default` setting to decide to either `pass` or `block`. 
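The first-match-wins evaluation described above can be sketched in a few lines. This is a Python illustration only (the updater itself is written in Scala) and not the proposed implementation: the names `Subgraph`, `rule_matches` and `decide` are hypothetical, triples are modelled as plain tuples of prefixed names, and `[]` is treated as a wildcard for the object position, which is an assumption on top of the "any rdf literal" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgraph:
    """Hypothetical in-memory form of one entry of the yaml file."""
    name: str
    default: str       # "pass" or "block"
    rules: tuple = ()  # e.g. ("pass ?entity wdt:P31 wd:Q13442814",)


def rule_matches(rule, entity, triples):
    """True if the rule's triple pattern matches one of the entity's triples."""
    _action, subject, predicate, obj = rule.split()
    if subject == "?entity":
        subject = entity  # ?entity is replaced by the entity being updated
    return any(
        s == subject and p == predicate and (obj == "[]" or o == obj)
        for s, p, o in triples
    )


def decide(subgraph, entity, triples):
    """Apply the rules in order, stop at the first match, else use the default."""
    for rule in subgraph.rules:
        if rule_matches(rule, entity, triples):
            return rule.split()[0]  # "pass" or "block"
    return subgraph.default


scholarly = Subgraph("scholarly", "block", (
    "pass ?entity wdt:P31 wd:Q13442814",
    "pass ?entity rdf:type wikibase:Property",
))
main = Subgraph("main", "pass", ("block ?entity wdt:P31 wd:Q13442814",))

# Made-up entity data, only for illustration.
article = [("wd:Q1", "wdt:P31", "wd:Q13442814")]
human = [("wd:Q42", "wdt:P31", "wd:Q5")]

print(decide(scholarly, "wd:Q1", article), decide(main, "wd:Q1", article))
print(decide(scholarly, "wd:Q42", human), decide(main, "wd:Q42", human))
```

Because each subgraph is evaluated independently from its own rule list, adding a third subgraph only means adding another entry, which is the point of solving the general case.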
The `stubs_source` attribute will determine if this graph is OK to be linked from a stub triple when an entity is blocked from another subgraph. Rules Outcome - Applying the rules will only answer the question: //does this entity revision belong to this graph?// The actual set of mutations to apply may also vary depending on the type of MutationOperation <https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/+/refs/heads/master/streaming-updater-producer/src/main/scala/org/wikidata/query/rdf/updater/MutationOperation.scala> to apply. **Diff** When a diff is required the rules will be applied to both the previous and next revision. - when an entity enters a subgraph: - Diff: to remove stubs - FullImport: the entity is fully imported - when an entity leaves a subgraph: - DeleteItem: the entity is fully deleted - Diff: to add the stubs - when an entity stays in the subgraph - Diff: simple diff - when an entity stays outside the
[Wikidata-bugs] [Maniphest] T361114: Alert Search Platform and/or DPE SRE when Wikidata is lagged
dcausse added a comment. Thanks! I'm not very familiar with alerts being set from grafana either, I'll try to get more info on this. Worst case, we can always set up a new one directly in alertmanager just for the wdqs lag and send it to the search team, using the same formula used by updateQueryServiceLag.php. TASK DETAIL https://phabricator.wikimedia.org/T361114 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361114: Alert Search Platform and/or DPE SRE when Wikidata is lagged
dcausse removed dcausse as the assignee of this task. dcausse added a comment. @Lucas_Werkmeister_WMDE thanks! Do you know where we could update this to include our alert email for such alerts? TASK DETAIL https://phabricator.wikimedia.org/T361114 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement
dcausse moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. Should be working properly now TASK DETAIL https://phabricator.wikimedia.org/T353683 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, Danny_Benjafield_WMDE, gonzalez.actor, S8321414, Astuthiodit_1, karapayneWMDE, toberto, Invadibot, maantietaja, Wilmanbeno, ItamarWMDE, Nintendofan885, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, Mbch331, jeremyb ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361106: Restore wdqs1013 with a data transfer
dcausse closed this task as "Declined". dcausse added a comment. won't be required after all TASK DETAIL https://phabricator.wikimedia.org/T361106 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking, dcausse Cc: dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse closed subtask T361106: Restore wdqs1013 with a data transfer as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T360993 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361246: scap deploy should not repool a wdqs node that is depooled
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T361246 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361106: Restore wdqs1013 with a data transfer
dcausse moved this task from Backlog to Blocked / Waiting on the Data-Platform-SRE (2024.03.25 - 2024.04.14) board. dcausse added a comment. I restarted the updater on wdqs1013 and it's catching up, I have a note to check the status tomorrow and will repool it if necessary. TASK DETAIL https://phabricator.wikimedia.org/T361106 WORKBOARD https://phabricator.wikimedia.org/project/board/7054/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking, dcausse Cc: dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T361246: scap deploy should not repool a wdqs node that is depooled
dcausse updated the task description.
TASK DETAIL https://phabricator.wikimedia.org/T361246
[Wikidata-bugs] [Maniphest] T361246: scap deploy should not repool a wdqs node that is depooled
dcausse added a project: Wikidata-Query-Service.
TASK DETAIL https://phabricator.wikimedia.org/T361246
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse added a comment.

I was able to re-enable puppet on wdqs1013 and restart the updater so that it catches up on updates. However, this machine was apparently repooled yesterday (as part of the wdqs scap deploy, I suppose) and thus started to serve stale data without triggering any maxlag. It was only when re-enabling puppet that I realized this node was still pooled, so I depooled it immediately, but this caused a maxlag for several minutes. Scap repooling machines is something we might look into to avoid this kind of issue in the future.

TASK DETAIL https://phabricator.wikimedia.org/T360993
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse added a comment.

After depooling the node, we can see the query rate actually going down to 0. The request rate is generally very low on codfw, so we might have to tune the threshold to around 0.2.

F43663858: image.png <https://phabricator.wikimedia.org/F43663858>

TASK DETAIL https://phabricator.wikimedia.org/T360993
[Wikidata-bugs] [Maniphest] T336352: Update maxlag calculation maintenance script to reflect new prometheus queries
dcausse removed a project: Patch-For-Review.
TASK DETAIL https://phabricator.wikimedia.org/T336352
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse added a comment.

The approach taken is:
- in nginx, set a new header named `x-monitoring-query` to true if a list of criteria is met (currently based on user-agent strings, but this could be extended to source IPs as well, I suppose)
- in blazegraph, do not log queries that have the `x-monitoring-query` header set
- adapt `Wikidata.org` to allow tuning the //minimal query rate// expected to be served by a pooled server (it was hardcoded to 1.0)
- change the systemd timer that runs `updateQueryServiceLag.php` to set `--pooled-server-min-query-rate` to 0.5 (we will need to double check that this value is sane and works well for both codfw and eqiad servers)

TASK DETAIL https://phabricator.wikimedia.org/T360993
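The first two steps of the approach above can be sketched as follows. This is a minimal Python illustration, not the actual nginx or blazegraph change: the function names and the list of user-agent prefixes are assumptions made for the example (the prefixes are borrowed from the user agents reported on this task), and only the header name `x-monitoring-query` comes from the comment.

```python
# Hypothetical list of user-agent prefixes treated as monitoring traffic.
# The real criteria live in the nginx configuration and may differ.
MONITORING_UA_PREFIXES = (
    "check_http/",
    "Twisted PageGetter",
    "prometheus-public-sparql-ep-check",
    "wmf-prometheus/prometheus-blazegraph-exporter",
)

# Header name taken from the comment above.
MONITORING_HEADER = "x-monitoring-query"


def tag_request(headers: dict) -> dict:
    """Proxy side: return a copy of the headers, flagged if the UA looks like monitoring."""
    tagged = dict(headers)
    user_agent = headers.get("user-agent", "")
    # str.startswith accepts a tuple of prefixes.
    if user_agent.startswith(MONITORING_UA_PREFIXES):
        tagged[MONITORING_HEADER] = "true"
    return tagged


def should_count_query(headers: dict) -> bool:
    """Backend side: only queries without the monitoring flag count towards the query rate."""
    return headers.get(MONITORING_HEADER) != "true"
```

With this split, the proxy is the only place that knows the criteria, and the backend only has to look at one header, so extending the criteria (for example to source IPs) would not require touching the backend.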
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.
dcausse claimed this task.
TASK DETAIL https://phabricator.wikimedia.org/T360993
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse added a comment.

Here are the UAs seen in an hour on a depooled server:

+------------------------------------------------------------------+-----+
|UA                                                                |count|
+------------------------------------------------------------------+-----+
|check_http/v2.3.3 (monitoring-plugins 2.3.3)                      |87   |
|Twisted PageGetter                                                |2146 |
|prometheus-public-sparql-ep-check                                 |1913 |
|wmf-prometheus/prometheus-blazegraph-exporter (r...@wikimedia.org)|120  |
+------------------------------------------------------------------+-----+

TASK DETAIL https://phabricator.wikimedia.org/T360993
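As a quick sanity check, the counts above can be converted into an average query rate. The short Python sketch below assumes the counts cover exactly one hour (3600 seconds); the variable names are only illustrative.

```python
# Counts copied from the user-agent table above.
ua_counts = {
    "check_http/v2.3.3 (monitoring-plugins 2.3.3)": 87,
    "Twisted PageGetter": 2146,
    "prometheus-public-sparql-ep-check": 1913,
    "wmf-prometheus/prometheus-blazegraph-exporter": 120,
}

# Assumption: the sample window is exactly one hour.
WINDOW_SECONDS = 3600

total = sum(ua_counts.values())
rate_qps = total / WINDOW_SECONDS

print(f"{total} queries in {WINDOW_SECONDS}s = {rate_qps:.3f} qps")
# prints "4266 queries in 3600s = 1.185 qps"
```

Under that assumption, monitoring traffic alone amounts to roughly 1.2 queries per second, which would be consistent with the task description's observation that the query rate of the depooled wdqs1013 stayed above 1 and below 1.3.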
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse triaged this task as "High" priority.
dcausse added a project: Discovery-Search (Current work).
TASK DETAIL https://phabricator.wikimedia.org/T360993
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse added a comment.

Mitigation:
- blazegraph stopped
- updater stopped with the `/srv/wdqs/data_loaded` flag removed
- puppet disabled

TASK DETAIL https://phabricator.wikimedia.org/T360993
[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
Propagating the lag of a wdqs host should only be done if this host is ''pooled'' (actually serving user traffic). Determining the ''pooling'' status appeared to be quite challenging in our infra, so in T336352 <https://phabricator.wikimedia.org/T336352> we started using a metric based on the query rate, hoping that it would be a reasonable proxy for determining whether the server is serving users or not.

This has worked well so far, but a recent incident where a server was depooled after being stuck for some reason showed that this metric based on query rate is too fragile. We consider a server to be pooled if its query rate is above 1 qps:

`rate(org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries{}[10m]) > 1`

Sadly, this assumption did not hold on wdqs1013 when it was depooled: for some reason its query rate was still above 1 (below 1.3). It is possible that this metric is polluted with monitoring queries that do not relate to serving user traffic.

We should perhaps refine how we generate `org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries` and make sure we only measure user queries.

AC:
- wdqs lag propagation should no longer include false positives (i.e. counting the lag of a server that is actually depooled)

TASK DETAIL https://phabricator.wikimedia.org/T360993
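The failure mode described in this task can be illustrated with a small Python sketch. It is not the code of the maintenance script: the `ServerStatus` type, the function names, the sample values, and the choice of propagating the maximum lag are all assumptions made for illustration (the real script may aggregate lags differently). Only the "query rate above a threshold means pooled" rule and the default threshold of 1.0 come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerStatus:
    name: str
    query_rate_qps: float  # e.g. the 10-minute rate of started queries
    lag_seconds: float


def is_pooled(server: ServerStatus, min_query_rate: float = 1.0) -> bool:
    """Proxy rule from the task: a server is considered pooled above a minimal query rate."""
    return server.query_rate_qps > min_query_rate


def lag_to_propagate(servers: List[ServerStatus], min_query_rate: float = 1.0) -> Optional[float]:
    """Return the lag of the worst pooled server, or None if no server looks pooled.

    Taking the maximum is an assumption of this sketch.
    """
    pooled = [s for s in servers if is_pooled(s, min_query_rate)]
    if not pooled:
        return None
    return max(s.lag_seconds for s in pooled)


# Hypothetical values: a healthy server, and a depooled one whose rate is
# kept just above 1 qps by monitoring queries alone.
servers = [
    ServerStatus("healthy-host", query_rate_qps=5.0, lag_seconds=10.0),
    ServerStatus("depooled-host", query_rate_qps=1.2, lag_seconds=5000.0),
]
false_positive_lag = lag_to_propagate(servers)

# Once monitoring queries are no longer counted, the depooled host drops to 0 qps.
servers[1].query_rate_qps = 0.0
corrected_lag = lag_to_propagate(servers)
```

In this toy setup the depooled host's lag is propagated as long as monitoring traffic keeps its rate above the threshold, and is ignored once only user queries are measured.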
[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation
dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board.
dcausse added a comment.

Draft page: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits

TASK DETAIL https://phabricator.wikimedia.org/T357966
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation
dcausse claimed this task.
dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
TASK DETAIL https://phabricator.wikimedia.org/T357966
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement
dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board.
dcausse added a comment.

Changed the layout of the query a bit by moving the logistic function introduced in T271799 <https://phabricator.wikimedia.org/T271799> to the top level, so that it wraps the new nearmatch clause.

TASK DETAIL https://phabricator.wikimedia.org/T353683
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits
dcausse claimed this task.
dcausse moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
dcausse added a comment.

Compiled 10 real-world examples at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

TASK DETAIL https://phabricator.wikimedia.org/T357980
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse added a comment.

Final report available at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/WDQS_Graph_Split_Impact_Analysis

TASK DETAIL https://phabricator.wikimedia.org/T355040
[Wikidata-bugs] [Maniphest] T356773: [tracking] Community feedback for the WDQS Split the Graph project
dcausse added a comment.

@Physikerwelt thanks for your feedback. Blazegraph is definitely not the best solution, and the work to move off of blazegraph should be tracked under https://phabricator.wikimedia.org/T330525 (see the initial exploration <https://www.wikidata.org/wiki/File:WDQS_Backend_Alternatives_working_paper.pdf> we have done). The solutions you suggest might be better discussed in their own tickets as subtasks of T335067 <https://phabricator.wikimedia.org/T335067>.

This particular ticket is about collecting feedback regarding use cases that might be affected by the split. This split is one of the solutions we want to experiment with to address the scalability issues of WDQS <https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy>. We are conscious of the usability issues that you raise, but at this point we are more focused on understanding the feasibility and limitations of federation with such a split. It is worth noting that one goal is to make sure that use cases not relying on the scientific articles still work without federation.

TASK DETAIL https://phabricator.wikimedia.org/T356773
[Wikidata-bugs] [Maniphest] T356773: [tracking] Community feedback for the WDQS Split the Graph project
dcausse added a comment.

In T356773#9531179 <https://phabricator.wikimedia.org/T356773#9531179>, @EgonWillighagen wrote:

> I tried to get the federation working, but got time outs too. The problem is that the current setup makes splits at a statement level. That is, given statements with some property (e.g. P2860 <https://phabricator.wikimedia.org/P2860> and P1433 <https://phabricator.wikimedia.org/P1433>), some results are in one QS instance and some are in the other. That means a lot of federation-union combinations to get all results. I posted an example query that is affected (the first I tried) in this issue report: https://github.com/WDscholia/scholia/issues/2423

I rewrote this query at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Number_of_articles_with_CiTO-annotated_citations_by_year. I agree that, given the current split strategy, we have to UNION the main and scholarly articles graphs most of the time.

TASK DETAIL https://phabricator.wikimedia.org/T356773
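The "UNION the main and scholarly articles graphs" pattern mentioned above can be sketched as a small Python helper that builds the SPARQL text. This is only an illustration and not one of the rewritten queries from the examples page: the helper names are hypothetical, the query is assumed to be sent to the main graph endpoint, the hostname is the experimental scholarly endpoint named in T357980, and the `/sparql` path is an assumption.

```python
# Hostname from T357980; the "/sparql" path is an assumption of this sketch.
SCHOLARLY_ENDPOINT = "https://query-scholarly-experimental.wikidata.org/sparql"


def union_with_scholarly(pattern: str, endpoint: str = SCHOLARLY_ENDPOINT) -> str:
    """Build a group that matches `pattern` in the local graph or, via SERVICE, in the other graph."""
    return (
        "{\n"
        f"  {{ {pattern} }}\n"
        "  UNION\n"
        f"  {{ SERVICE <{endpoint}> {{ {pattern} }} }}\n"
        "}"
    )


def count_citations_query() -> str:
    """Hypothetical example: count P2860 statements across both graphs."""
    body = union_with_scholarly("?article wdt:P2860 ?cited .")
    return f"SELECT (COUNT(*) AS ?count) WHERE {body}"


print(count_citations_query())
```

Every triple pattern whose matches may live in either graph needs this treatment, which is why queries touching several such properties end up with many federation-union combinations, as noted in the quoted comment.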
[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement
dcausse moved this task from To Be Deployed to In Progress on the Discovery-Search (Current work) board.
dcausse added a comment.

The new builder moved the result to #4, which is better but still not enough: it is beaten by 3 other images because of other criteria:
- weighted_tags:image.linked.from.wikipedia.lead_image/Q458
- statement_keywords:p180=q458

Moving back to In Progress to fine-tune the weight (probably bumping it from 3.5 to 10).

TASK DETAIL https://phabricator.wikimedia.org/T353683
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits
dcausse added a comment.

WIP at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

TASK DETAIL https://phabricator.wikimedia.org/T357980
[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS
dcausse added a subtask: T357980: Compile a set of queries rewritten with federation across the two graph splits.
TASK DETAIL https://phabricator.wikimedia.org/T337013
[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.
TASK DETAIL https://phabricator.wikimedia.org/T357980
[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits
dcausse renamed this task from "Compile a set of queries rewritten with federation accross the two graph splits" to "Compile a set of queries rewritten with federation across the two graph splits".
TASK DETAIL https://phabricator.wikimedia.org/T357980
[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation accross the two graph splits
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
Having a set of examples might be helpful for users experimenting with the graph split. A subpage under https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split might be appropriate.

The set of queries to rewrite could be sourced from the samples used in T355040 <https://phabricator.wikimedia.org/T355040>. An example should consist of a query that requires scholarly articles and its rewritten form. Ideally, the query should yield identical results when applied to the global graph and when applied to the splits.

AC:
- a new subpage of https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split is available with several (between 5 and 10?) example queries federating `query-main-experimental.wikidata.org` and `query-scholarly-experimental.wikidata.org`.

TASK DETAIL https://phabricator.wikimedia.org/T357980
[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS
dcausse added a subtask: T357966: Document limitations of blazegraph federation.
TASK DETAIL https://phabricator.wikimedia.org/T337013
[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.
TASK DETAIL https://phabricator.wikimedia.org/T357966
[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
Writing a query that federates multiple SPARQL endpoints can be challenging if the intermediate results that have to be shared are big. Better understanding and documenting such limitations might help users who write such queries, for example when:
- federating wdqs from a wcqs query
- rewriting wdqs queries with federation in the scope of the graph split experiment

AC:
- documentation of the limits added in https://www.mediawiki.org/wiki/Wikidata_Query_Service (or a sub-page)

TASK DETAIL https://phabricator.wikimedia.org/T357966
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board.
dcausse added a comment.

Draft report up at https://wikitech.wikimedia.org/wiki/User:DCausse/WDQS_Graph_Split_Impact_Analysis

TASK DETAIL https://phabricator.wikimedia.org/T355040
WORKBOARD https://phabricator.wikimedia.org/project/board/1227/
[Wikidata-bugs] [Maniphest] T353453: [Analytics] Impact of Scholia on WDQS
dcausse added a comment.

In T353453#9524925 <https://phabricator.wikimedia.org/T353453#9524925>, @AndrewTavis_WMDE wrote:

> Quick note on this:
>
> There are two ways that need to be factored in to deriving if a query is from Scholia. Some queries do start with `#tool: scholia` as @dcausse suggested, but I checked for user agents and also found that the string `"Scholia"` is also used as a user agent. Big thing is that some of the queries have the comment and some have the user agent, but in no cases do we have both.

Indeed, I saw these two as well. I'm not sure how to interpret this yet, but it could be that some queries come from web browsers browsing https://scholia.toolforge.org/ (`#tool: scholia` in the query), while the "Scholia" user agent might come from some automated tooling used by Scholia that we have yet to discover. Looking at the queries might help.

Regarding `#tool: scholia`, something I noted is that a non-negligible portion of the traffic comes from automated web crawlers; this might be interesting to identify and distinguish.

TASK DETAIL https://phabricator.wikimedia.org/T353453
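The two signals discussed above (the `#tool: scholia` comment at the start of the query and the "Scholia" user agent) could be combined along these lines. This is a minimal Python sketch with hypothetical function names; matching case-insensitively and ignoring leading whitespace are assumptions of the sketch, not something stated in the thread.

```python
def scholia_signals(query: str, user_agent: str) -> set:
    """Return which of the two Scholia indicators are present for a request."""
    signals = set()
    # Signal 1: the query text starts with the "#tool: scholia" comment.
    if query.lstrip().lower().startswith("#tool: scholia"):
        signals.add("comment")
    # Signal 2: the user agent contains the string "Scholia".
    if "scholia" in user_agent.lower():
        signals.add("user_agent")
    return signals


def is_scholia(query: str, user_agent: str) -> bool:
    """A request is attributed to Scholia if either indicator is present."""
    return bool(scholia_signals(query, user_agent))
```

Returning the set of matched signals rather than a single boolean keeps the two populations separate, which matters here because the thread reports that no query carries both indicators.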
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse added a comment. WIP: - included the new 100k queries sample named `QUERY-Q4` from T349512 <https://phabricator.wikimedia.org/T349512> (random sample that is representative of the query length and runtime) - the % of affected queries (deduplicated) per tool is (//sample// being the `QUERY-Q4` sample mentioned above) F41752511: image.png <https://phabricator.wikimedia.org/F41752511> The above graph should be taken with a grain of salt as the number of queries per data point varies a lot (86 queries for //Listeria// vs 85k for //random//). These numbers are being reviewed so no conclusions should be drawn yet, but it does not seem that we obtain the same numbers that were found originally in Wikidata_Subgraph_Query_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis#Query_count_and_time> where 2.5% of the total query count was identified as requiring scholarly articles. A more qualitative analysis is in progress: - analysis of the user agents to understand what use cases are mainly affected; preliminary results show that, for instance, a single UA is the cause of 50% of the affected queries - extract some SPARQL queries to start evaluating how federation could be applied/tested TASK DETAIL https://phabricator.wikimedia.org/T355040 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs
dcausse added a comment. @dr0ptp4kt thanks! is the difference in the number of successful queries only explained by the improvement in query time or are there some improvements in the number of queries that timeout as well? TASK DETAIL https://phabricator.wikimedia.org/T355037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dr0ptp4kt, dcausse Cc: dr0ptp4kt, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T355888 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: RKemper, dcausse, Aklapper, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, BTullis, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356243: process_sparql_query_hourly sometimes fails on the jena sparql parser
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Failure seen while `org.wikidata.query.rdf.spark.transform.queries.sparql.QueryExtractor` was processing the dataset `event.wdqs_external_sparql_query/year=2024/month=1/day=30/hour=9`. Last 4096 bytes of stderr : riterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at 
org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302) at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582) at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134) The logs are truncated but it could possibly be a recursion issue failing with a `StackOverflow`. TASK DETAIL https://phabricator.wikimedia.org/T356243 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities
dcausse added a comment. Scanning dumps from 2024/01/21 we can find 1623 duplicated statement ids (full list here: https://people.wikimedia.org/~dcausse/T356161_sdc_duplicated_statement_ids.csv) TASK DETAIL https://phabricator.wikimedia.org/T356161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities
dcausse renamed this task from "WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities" to "WikibaseMediaInfo seems to reuse statement identifiers from other entities". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities
dcausse added a comment. @Lucas_Werkmeister_WMDE thanks for all the context! I get that it only affects WikibaseMediaInfo. Can we exclude Wikibase as a culprit possibly affecting wikidata or should we run a quick investigation to find possible duplicated statement identifiers in the wikidata RDF dumps? TASK DETAIL https://phabricator.wikimedia.org/T356161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T356161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities
dcausse created this task. dcausse added projects: WikibaseMediaInfo, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Structured-Data-Backlog. TASK DESCRIPTION Seen on M130887689 <https://commons.wikimedia.org/wiki/Special:EntityData/M130887689.ttl?flavor=dump> and M115086921 <https://commons.wikimedia.org/wiki/Special:EntityData/M115086921.json>: the content of the two wikibase entities is almost identical. The statement ids are the same, which is highly problematic for the Wikibase RDF representation, which assumes that a statement id is unique and belongs to a single entity. E.g. `M130887689$83501cde-4a4b-a7d0-9832-5f1982be0c41` is referenced by both M130887689 & M115086921. I'm not sure what actions have led to this situation but this should definitely be fixed to make sure that the statement ids are not shared. AC: - identify what action caused an entity to re-use statement ids - determine if this problem affects Wikibase itself and wikidata - fix this behavior - clean up existing entities that have non-unique statement ids TASK DETAIL https://phabricator.wikimedia.org/T356161 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, toberto, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
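The uniqueness assumption described in the task can be checked with a simple grouping pass. The sketch below is illustrative only: it assumes the dump has already been reduced to `(entity_id, statement_id)` pairs, and the function name and the third sample pair are hypothetical. It is not the scan that was actually run against the dumps.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_statement_ids(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each statement id referenced by more than one entity to those entities."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for entity_id, statement_id in pairs:
        owners[statement_id].add(entity_id)
    # Keep only the ids that violate the "one statement id, one entity" assumption.
    return {sid: ents for sid, ents in owners.items() if len(ents) > 1}


pairs = [
    # The shared id reported in the task description.
    ("M130887689", "M130887689$83501cde-4a4b-a7d0-9832-5f1982be0c41"),
    ("M115086921", "M130887689$83501cde-4a4b-a7d0-9832-5f1982be0c41"),
    # Hypothetical well-behaved statement, added for contrast.
    ("M115086921", "M115086921$hypothetical-unique-id"),
]
duplicates = shared_statement_ids(pairs)
```

On a full dump the same grouping would more likely be expressed as a distributed group-by than as an in-memory dictionary; the logic stays the same.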
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse added a comment. WIP: https://people.wikimedia.org/~dcausse/T355040_EARLY_DRAFT_wdqs_query_results_analysis.html (UA redacted for now) TL/DR: - added support for identifying true positives (queries with a scientific article in the sparql query or in the results) - MixNMatch has a very high number of true positives, thus need more qualitative analysis (ticket TBD) - Listeria does not have any true positives but shows bad outcome (81% identical in the best case, 68% worst case), needs more qualitative analysis too TASK DETAIL https://phabricator.wikimedia.org/T355040 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T351650: Expose 3 new dedicated WDQS endpoints
dcausse added a subtask: T355888: Enable cross federation between experimental WDQS endpoints. TASK DETAIL https://phabricator.wikimedia.org/T351650 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RKemper, dcausse Cc: Gehel, bking, dcausse, dr0ptp4kt, RKemper, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints
dcausse added a parent task: T351650: Expose 3 new dedicated WDQS endpoints. TASK DETAIL https://phabricator.wikimedia.org/T355888 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: RKemper, dcausse, Aklapper, AWesterinen, BTullis, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints
dcausse created this task. dcausse added projects: Data-Platform-SRE, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Experimental endpoints `query-main-experimental` and `query-scholarly-experimental` must allow cross federation. A simple way to achieve this might be to allow these 3 experimental endpoints to be part of the `allowlist` stored in puppet; this might enable unnecessary federation between production servers and the experimental ones (not ideal but probably acceptable?). Ultimately the following queries must work after allowing such federation:

From https://query-scholarly-experimental.wikidata.org the query:

  # all papers by ISNI 0001 2124 7940 (Carlo Rovelli)
  SELECT ?article ?articleLabel {
    ?author wdt:P213 " 0001 2124 7940"
    SERVICE <https://query-main-experimental.wikidata.org/sparql> { # Querying the scholarly article split
      ?article wdt:P50 ?author ; wdt:P31 wd:Q13442814 .
      BIND(?articleLabel as ?articleLabel) .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }

And from https://query-main-experimental.wikidata.org/ the query:

  # all papers by ISNI 0001 2124 7940 (Carlo Rovelli)
  SELECT ?article ?articleLabel {
    SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> { # Querying the wikidata main graph split
      ?author wdt:P213 " 0001 2124 7940"
    }
    hint:Prior hint:runFirst true . # Tell blazegraph to first collect ?author
    ?article wdt:P50 ?author ; wdt:P31 wd:Q13442814 .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }

Should work.
AC: - federation works between query-main-experimental and query-scholarly-experimental - the 2 test queries work TASK DETAIL https://phabricator.wikimedia.org/T355888 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: RKemper, dcausse, Aklapper, AWesterinen, BTullis, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse added a comment. Quick report on the progress being made: - Our query logs do not only contain SELECT queries, and the sparql client used to collect the data has to be adapted to support the other query forms (ASK, CONSTRUCT, DESCRIBE) (https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/991622) - Getting failures due to response size; bumped the limit to 16M but still getting problems, so I might stop here and simply tag & ignore such massive queries moving forward - Getting very bad numbers from Listeria and MixNMatch (34% and 17% identical respectively); the avg result size is 1.6k and 8k, which might partly explain why getting identical results is difficult; more investigation is needed to understand the cause... - Getting pretty mediocre numbers for WikidataIntegrator at 88% with a very small avg result size of 8, more investigation needed - Pywikibot and SPARQLWrapper are good at 99.4% for both TASK DETAIL https://phabricator.wikimedia.org/T355040 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
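Since the logs mix several query forms (ASK, CONSTRUCT, DESCRIBE alongside the usual ones), a client that replays them needs to know the form before choosing how to read the response. The following is a rough, regex-based sketch of such a tagger; it is an illustration only, not the change made in the Gerrit patch linked above. The function name is hypothetical, and a real implementation would more reasonably rely on a proper SPARQL parser.

```python
import re
from typing import Optional

# Spans that may contain misleading keywords: IRIs, string literals, comments.
# Scanning left to right in a single pass keeps "#" inside an IRI from being
# treated as a comment, and quotes inside a comment from opening a string.
_SKIP = re.compile(
    r"<[^<>\s]*>"
    r'|"(?:[^"\\\n]|\\.)*"'
    r"|'(?:[^'\\\n]|\\.)*'"
    r"|#[^\n]*"
)
_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def query_form(query: str) -> Optional[str]:
    """Best-effort guess of the SPARQL query form; None if no form keyword is found."""
    stripped = _SKIP.sub(" ", query)
    match = _FORM.search(stripped)
    return match.group(1).upper() if match else None
```

This heuristic can be fooled, for instance by a prefix label that happens to be named `ask:` or by triple-quoted literals, so its output is best treated as a tag to review rather than a definitive classification.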
[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement
dcausse claimed this task. dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T353683 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, Danny_Benjafield_WMDE, gonzalez.actor, Astuthiodit_1, karapayneWMDE, toberto, Invadibot, maantietaja, Wilmanbeno, CBogen, ItamarWMDE, Nintendofan885, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, Mbch331, jeremyb ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. TASK DESCRIPTION By using a tool to compare the differences between two results of the same sparql query we should evaluate how many queries might "break" when running against the wikidata main graph instead of the full graph. Comparison will use T351819 <https://phabricator.wikimedia.org/T351819> and be based on the sets of sparql queries extracted in T349512 <https://phabricator.wikimedia.org/T349512>. We should attempt to identify the reasons for the differences and whether they are related or unrelated to the split: - query features dependent on the internal ordering of the blazegraph btrees (LIMIT X OFFSET Y, bd:slice) - use of external datasets (federation, mwapi) - unicode collation issues (T233204 <https://phabricator.wikimedia.org/T233204>) - ...add more when discovered For the queries whose results vary because of the split we should attempt to evaluate if targeting scholarly articles is intentional or not (e.g. statistical queries with group by counts) and possibly identify the tools and their maintainers to contact them to gather feedback on the project. 
AC: - a report is available showing how the current split is going to affect queries once run on the wikidata main subgraph - a list of affected tools/scripts (when identifiable) that could possibly be contacted TASK DETAIL https://phabricator.wikimedia.org/T355040 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
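One way to separate pure ordering effects from real content differences, as the task description suggests, is to compare the two result sets first as ordered lists and then as multisets of rows. The sketch below is a hypothetical illustration and not the comparison tool from T351819; it assumes each result is a list of rows, each row being a mapping from variable name to a string value.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def _row_key(row: Row) -> Tuple[Tuple[str, str], ...]:
    # Hashable, order-independent representation of one result row.
    return tuple(sorted(row.items()))


def compare_results(full_graph: List[Row], subgraph: List[Row]) -> str:
    """Classify two results as 'identical', 'reordered' or 'different'."""
    if full_graph == subgraph:
        return "identical"
    if Counter(map(_row_key, full_graph)) == Counter(map(_row_key, subgraph)):
        return "reordered"  # same rows, different order
    return "different"
```

Note that a query using `LIMIT X OFFSET Y` without a stable ordering can return genuinely different rows on the two graphs, so it would be classified as `different` here even though the cause is ordering; identifying that case would need a look at the query itself.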
[Wikidata-bugs] [Maniphest] T352538: [EPIC] Evaluate the impact of the graph split
dcausse added a subtask: T355037: Compare the performance of sparql queries between the full graph and the subgraphs. TASK DETAIL https://phabricator.wikimedia.org/T352538 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Gehel, me, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs
dcausse added a parent task: T352538: [EPIC] Evaluate the impact of the graph split. TASK DETAIL https://phabricator.wikimedia.org/T355037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs
dcausse renamed this task from "Com" to "Compare the performance of sparql queries between the full graph and the subgraphs". dcausse added a project: Wikidata-Query-Service. dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T355037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org