[Wikidata-bugs] [Maniphest] T214378: Check simple format constraints (no grouping) in PHP instead of SPARQL

2024-06-14 Thread dcausse
dcausse added a comment.


  Unsure if this is feasible, but perhaps manually flagging a list of safe & 
very popular regexes <https://w.wiki/APPB> could help reduce the number of 
requests to shellbox?
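
  A minimal sketch of that idea, written in Python for illustration only (the 
real checker is PHP and its regex dialect differs). The `SAFE_PATTERNS` set, 
the function name and the `remote_check` callback are all hypothetical; the 
point is only that flagged patterns are evaluated locally and everything else 
still goes to shellbox:

```python
import re

# Hypothetical, manually curated set of patterns considered safe to run locally.
SAFE_PATTERNS = frozenset({r"\d+", r"[A-Z]{2}"})


def matches_format(pattern, text, remote_check):
    """Check `text` against `pattern`, avoiding the remote call when possible.

    `remote_check` stands in for the shellbox request (assumption: it takes
    the pattern and the text and returns a bool).
    """
    if pattern in SAFE_PATTERNS:
        # Flagged as safe: evaluate in-process, no request is made.
        return re.fullmatch(pattern, text) is not None
    # Anything not flagged keeps going through the sandboxed service.
    return remote_check(pattern, text)
```

  Only patterns outside the curated set would still cost a request.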

TASK DETAIL
  https://phabricator.wikimedia.org/T214378

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, akosiaris, Michael, sbassett, RazShuty, JBennett, Ladsgroup, 
Aklapper, Lucas_Werkmeister_WMDE, Ullasoff, Danny_Benjafield_WMDE, S8321414, 
Cleo_Lemoisson, Astuthiodit_1, karapayneWMDE, Invadibot, Devnull, maantietaja, 
ItamarWMDE, Akuckartz, Dringsim, Eihel, Nandana, Lahi, Gq86, GoranSMilovanovic, 
QZanden, KimKelting, Esc3300, LawExplorer, _jensen, rosalieper, Agabi10, 
Scott_WUaS, Wong128hk, Luke081515, abian, Wikidata-bugs, aude, Bawolff, 
Lydia_Pintscher, Grunny, Mbch331, Jay8g, Krenair
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T367510: Request permission to create 4 kafka topics in kafka-main (WDQS graph split)

2024-06-14 Thread dcausse
dcausse renamed this task from "Request permission to create 4 kafka topics in 
kafka-main" to "Request permission to create 4 kafka topics in kafka-main (WDQS 
graph split)".

TASK DETAIL
  https://phabricator.wikimedia.org/T367510

To: dcausse
Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, Kappakayala, S8321414, 
Clement_Goubert, Astuthiodit_1, AWesterinen, Arnoldokoth, BTullis, 
karapayneWMDE, Invadibot, maantietaja, wkandek, JMeybohm, ItamarWMDE, 
Akuckartz, Dringsim, Nandana, Namenlos314, jijiki, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T367510: Request permission to create 4 kafka topics in kafka-main

2024-06-14 Thread dcausse
dcausse created this task.
dcausse added projects: Wikidata, Wikidata-Query-Service, serviceops, 
Data-Platform-SRE.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: wmde-wikidata-tech.

TASK DESCRIPTION
  As part of the work to split the WDQS graph we will need to populate 4 new 
topics:
  
  - eqiad.rdf-streaming-updater.mutation-main
  - codfw.rdf-streaming-updater.mutation-main
  - eqiad.rdf-streaming-updater.mutation-scholarly
  - codfw.rdf-streaming-updater.mutation-scholarly
  
  The expected size of the two new topics added together should not exceed the 
size of `eqiad.rdf-streaming-updater.mutation`, which is around 17Gb (51Gb 
<https://grafana-rw.wikimedia.org/d/00234/kafka-by-topic?orgId=1&refresh=5m&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=main-eqiad&var-kafka_broker=All&var-topic=eqiad.rdf-streaming-updater.mutation&var-topic=codfw.rdf-streaming-updater.mutation&from=now-7d&to=now>
 including replication).
  The same applies to the rate of messages.
  
  Because of topic mirroring, this means that an additional 100Gb is required 
per cluster (+100Gb on main-eqiad and +100Gb on main-codfw).
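
  The +100Gb figure can be reproduced from the numbers above. This is a rough 
sketch, assuming a replication factor of 3 (inferred from 17Gb becoming 51Gb) 
and that mirroring makes each cluster hold both the `eqiad.*` and the 
`codfw.*` topics:

```python
# Rough capacity estimate; constants are taken or inferred from the description.
BASE_TOPIC_GB = 17                      # approx. size of eqiad.rdf-streaming-updater.mutation
REPLICATION_FACTOR = 3                  # assumption: inferred from 17Gb -> 51Gb
MIRRORED_PREFIXES = ("eqiad", "codfw")  # assumption: mirroring keeps both prefixes on each cluster


def extra_gb_per_cluster(base_gb=BASE_TOPIC_GB,
                         replication=REPLICATION_FACTOR,
                         prefixes=MIRRORED_PREFIXES):
    """Upper bound on the additional disk needed on one kafka-main cluster."""
    # main + scholarly together are bounded by one existing mutation topic.
    per_prefix = base_gb * replication
    return per_prefix * len(prefixes)


print(extra_gb_per_cluster())  # 102, i.e. roughly the 100Gb quoted above
```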
  
  These topics must have the following characteristics:
  
  - a single partition
  - retention of 4 weeks
  
  AC:
  
  [ ] get sign off from #serviceops 
<https://phabricator.wikimedia.org/tag/serviceops/>
  [ ] topics are created in both clusters with proper settings

TASK DETAIL
  https://phabricator.wikimedia.org/T367510

To: dcausse
Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, Kappakayala, S8321414, 
Clement_Goubert, Astuthiodit_1, AWesterinen, Arnoldokoth, BTullis, 
karapayneWMDE, Invadibot, maantietaja, wkandek, JMeybohm, ItamarWMDE, 
Akuckartz, Dringsim, Nandana, Namenlos314, jijiki, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation

2024-06-14 Thread dcausse
dcausse added a comment.


  I did some testing and sadly when a wdqs node makes a query to 
https://query.wikidata.org it hits varnish again:
  from wdqs1020 to https://query.wikidata.org (`echo 'SELECT ?test_dcausse {  
?test_dcausse ?p ?o .  } LIMIT 1' | curl -f -s --data-urlencode query@- 
https://query.wikidata.org/sparql?format=json`)
  
"x-request-id": "b34bb930-ef85-4b23-956e-7dcb11f0f7ec",
"content-length": "99",
"x-forwarded-proto": "http",
"x-client-port": "40256",
"x-bigdata-max-query-millis": "6",
"x-wmf-nocookies": "1",
"x-client-ip": "2620:0:861:10a:10:64:131:24",
"x-varnish": "800949377",
"x-forwarded-for": "2620:0:861:10a:10:64:131:24\\, 10.64.0.79\\, 
2620:0:861:10a:10:64:131:24",
"x-requestctl": "",
"x-cdis": "pass",
"accept": "*/*",
"x-real-ip": "2620:0:861:10a:10:64:131:24",
"via-nginx": "1",
"x-bigdata-read-only": "yes",
"host": "query.wikidata.org",
"content-type": "application/x-www-form-urlencoded",
"connection": "close",
"x-envoy-expected-rq-timeout-ms": "65000",
"x-connection-properties": "H2=1; SSR=0; SSL=TLSv1.3; 
C=TLS_AES_256_GCM_SHA384; EC=UNKNOWN;",
"user-agent": "curl/7.74.0"
  
  which is very similar to when querying from outside the network
  
"x-request-id": "3380f86f-99bc-4f0a-ac74-48e60317836d",
"content-length": "85",
"x-forwarded-proto": "http",
"x-client-port": "55334",
"x-bigdata-max-query-millis": "6",
"x-wmf-nocookies": "1",
"x-client-ip": "redacted",
"x-varnish": "512603614",
"x-forwarded-for": "redacted\\, 10.136.1.11\\, 2620:0:861:10e:10:64:135:23",
"x-requestctl": "",
"x-cdis": "pass",
"accept": "*/*",
"x-real-ip": "2620:0:861:10e:10:64:135:23",
"via-nginx": "1",
"x-bigdata-read-only": "yes",
"host": "query.wikidata.org",
"content-type": "application/x-www-form-urlencoded",
"connection": "close",
"x-envoy-expected-rq-timeout-ms": "65000",
"x-connection-properties": "H2=1; SSR=0; SSL=TLSv1.3; 
C=TLS_AES_256_GCM_SHA384; EC=UNKNOWN;",
"user-agent": "curl/7.81.0"
  
  If querying lvs via wdqs.discovery.wmnet directly we might have what we'd 
need (`echo 'SELECT ?lvs_eqiad_test_dcausse {?lvs_eqiad_test_dcausse ?p ?o .}  
LIMIT 1' | curl -v -f -s --data-urlencode query@- 
https://wdqs.discovery.wmnet/sparql?format=json`)
  
"x-real-ip": "2620:0:861:10a:10:64:131:24",
"x-request-id": "ef9b0e66-3b6f-48ae-a36f-cb1e67f93950",
"content-length": "110",
"x-forwarded-proto": "http",
"x-bigdata-read-only": "yes",
"host": "wdqs.discovery.wmnet",
"x-bigdata-max-query-millis": "6",
"content-type": "application/x-www-form-urlencoded",
"connection": "close",
"x-envoy-expected-rq-timeout-ms": "65000",
"x-forwarded-for": "2620:0:861:10a:10:64:131:24",
"user-agent": "curl/7.74.0",
"accept": "*/*"
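
  The difference between the three dumps can be checked mechanically. A small 
sketch, assuming that `x-varnish` and `x-cdis` are only set when the request 
went through the edge (which is what the dumps above suggest); the sample 
dictionaries are abbreviated copies of those dumps:

```python
# Headers that, in the dumps above, only appear when the request crossed the edge.
EDGE_ONLY_HEADERS = ("x-varnish", "x-cdis")


def went_through_edge(headers):
    """Return True if any edge-only header is present (case-insensitive)."""
    names = {name.lower() for name in headers}
    return any(h in names for h in EDGE_ONLY_HEADERS)


via_public = {"host": "query.wikidata.org", "x-varnish": "800949377", "x-cdis": "pass"}
via_lvs = {"host": "wdqs.discovery.wmnet", "x-forwarded-for": "2620:0:861:10a:10:64:131:24"}
print(went_through_edge(via_public), went_through_edge(via_lvs))  # True False
```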
  
  Hitting lvs might require a mapping like `https://query-main.wikidata.org` -> 
`https://wdqs-main.discovery.wmnet`, which I believe could be possible using 
`ServiceRegistry#addAlias("https://wdqs-main.discovery.wmnet/sparql", 
"https://query-main.wikidata.org/sparql")`.
  This could be done by adapting the syntax of the allow-list to enable setting 
aliases:
  `service_url[,list of aliases]` e.g. 
`https://wdqs-main.discovery.wmnet/sparql,https://query-main.wikidata.org/sparql`.
  `WikibaseContextListener#loadAllowlist` could be adapted to support this 
syntax and call `addAlias()` on the service registry.
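
  A sketch of how the proposed `service_url[,list of aliases]` syntax could be 
parsed, in Python for illustration only (the real change would be in the Java 
`loadAllowlist`). It assumes one entry per line, no commas inside URLs, and 
that blank lines and `#` comments are skipped; `parse_allowlist` and the 
`example.org` entry are hypothetical:

```python
def parse_allowlist(lines):
    """Parse `service_url[,alias1,alias2...]` lines into {service_url: [aliases]}."""
    services = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: skip blanks and comments
            continue
        service_url, *aliases = [part.strip() for part in line.split(",")]
        services[service_url] = [alias for alias in aliases if alias]
    return services


allowlist = parse_allowlist([
    "https://wdqs-main.discovery.wmnet/sparql,https://query-main.wikidata.org/sparql",
    "https://example.org/sparql",  # hypothetical entry without aliases
])
for service_url, aliases in allowlist.items():
    for alias in aliases:
        # This is where the service would call addAlias() on the registry.
        print(f"alias {alias} -> {service_url}")
```

  Entries without a comma keep working unchanged, so the existing allow-list 
would stay valid under this syntax.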
  
  Additionally we probably want to exclude `*.wmnet` hosts found in the allow 
list from `org.wikidata.query.rdf.blazegraph.ProxiedHttpConnectionFactory`.
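
  The host test this implies could look like the following sketch (Python for 
illustration; `bypasses_proxy` is a hypothetical name and the suffix match is 
an assumption about how `*.wmnet` would be interpreted):

```python
from urllib.parse import urlsplit


def bypasses_proxy(service_url, internal_suffix=".wmnet"):
    """Return True for allow-listed services that should not go through the proxy."""
    host = urlsplit(service_url).hostname or ""
    return host.endswith(internal_suffix)


print(bypasses_proxy("https://wdqs-main.discovery.wmnet/sparql"))  # True
print(bypasses_proxy("https://query-main.wikidata.org/sparql"))    # False
```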
  
  The drawback is that hitting lvs from within the same lvs will hit localhost. 
This is not a problem, because in the context of the graph split the lvs 
endpoint should be a different one, but a malformed query federating to its own 
lvs could starve if the server is busy. I'm not sure whether we have to worry 
about this... A query federating itself does not make much sense...

TASK DETAIL
  https://phabricator.wikimedia.org/T361950

To: EBernhardson, dcausse
Cc: EBernhardson, Daniel_Mietchen, Aklapper, dcausse, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-06-11 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, darthmon_wmde, Rosalie_WMDE, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Mahir256, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, 
Jay8g


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-06-11 Thread dcausse
dcausse moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  Triggered a reindex of all the lexemes using 
https://gitlab.wikimedia.org/repos/search-platform/cirrus-rerender, might take 
about 3 hours to complete.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, darthmon_wmde, Rosalie_WMDE, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Mahir256, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, 
Jay8g


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-05-29 Thread dcausse
dcausse added a comment.


  The system should now index lexemes properly.
  We still have to reindex all the lexemes to fix the ones created/edited 
before the fix was applied.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Hellket777, LisafBia6531, 
Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, 
darthmon_wmde, Rosalie_WMDE, Kent7301, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
Mahir256, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Neuronton, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-05-29 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Hellket777, LisafBia6531, 
Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, 
darthmon_wmde, Rosalie_WMDE, Kent7301, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
Mahir256, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Neuronton, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-05-28 Thread dcausse
dcausse added a comment.


  The search fields specific to Lexemes are currently ignored causing this 
NOTICE but also preventing lexemes from being searchable (esp. the new ones).
  The schemas should be adapted to support these fields and the lexemes will 
have to be re-indexed.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Hellket777, LisafBia6531, 
Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, 
darthmon_wmde, Rosalie_WMDE, Kent7301, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
Mahir256, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Neuronton, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] T365684: Particular lexeme (L1326823) not indexed so search with the Wikidata API returns nothing

2024-05-28 Thread dcausse
dcausse closed this task as a duplicate of T365692: PHP Notice: Undefined 
index: lexeme_language / lexical_category.

TASK DETAIL
  https://phabricator.wikimedia.org/T365684

To: dcausse
Cc: Aklapper, Fnielsen, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-05-28 Thread dcausse
dcausse merged a task: T365684: Particular lexeme (L1326823) not indexed so 
search with the Wikidata API returns nothing.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Hellket777, LisafBia6531, 
Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, 
darthmon_wmde, Rosalie_WMDE, Kent7301, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
Mahir256, QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Neuronton, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, 
Jdforrester-WMF, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] T365692: PHP Notice: Undefined index: lexeme_language / lexical_category

2024-05-28 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Incoming to In Progress on the Discovery-Search 
(Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T365692

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: Fnielsen, Lucas_Werkmeister_WMDE, Aklapper, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, darthmon_wmde, Rosalie_WMDE, Nandana, Lahi, Gq86, 
GoranSMilovanovic, Mahir256, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Verdy_p, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, 
Jay8g


[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw

2024-05-06 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T362508

To: dcausse
Cc: RKemper, dr0ptp4kt, bking, dcausse, Aklapper, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Dringsim, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread dcausse
dcausse added a comment.


  1. Run hdfs-rsync directly from the blazegraph hosts
     - requires installing its dependencies
     - requires opening holes between blazegraph and the hadoop cluster
  2. Schedule hdfs-rsync on a stat machine, copying the ttl dumps from hdfs to 
`/srv/analytics-search/wikibase_processed_dumps/wikidata/$SNAPSHOT`
     - cons: consumes some space on a stat machine
  3. Run hdfs-rsync on demand to copy the ttl dump from hdfs to 
`/srv/analytics-search/wikibase_processed_dumps/temp` and clean up this folder 
once done
     - cons: slows the process down a bit
  
  I was planning on doing option 3, any objections to this approach?
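
  The shape of option 3 (copy on demand, always clean up) can be sketched as 
follows. This is only an illustration: `with_temp_dump`, `fake_copy` and 
`fake_transfer` are hypothetical names, the hdfs-rsync invocation itself is 
deliberately left out, and the system temp directory stands in for the `/srv` 
folder:

```python
import shutil
import tempfile
from pathlib import Path


def with_temp_dump(copy_dump, transfer, temp_root):
    """Copy the dump into a scratch folder, hand it to `transfer`, always clean up."""
    temp_dir = Path(tempfile.mkdtemp(prefix="wdqs-reload-", dir=temp_root))
    try:
        copy_dump(temp_dir)        # e.g. a wrapper around hdfs-rsync (not shown)
        return transfer(temp_dir)  # e.g. a wrapper around the transfer step
    finally:
        # Runs on success and on failure, so the scratch space is never leaked.
        shutil.rmtree(temp_dir, ignore_errors=True)


def fake_copy(dest):
    # Stand-in for the real copy out of hdfs.
    (dest / "dump.ttl.gz").write_text("placeholder")


def fake_transfer(src):
    # Stand-in for the real transfer; just lists what would be sent.
    return sorted(p.name for p in src.iterdir())


print(with_temp_dump(fake_copy, fake_transfer, tempfile.gettempdir()))
```

  Putting the cleanup in `finally` means a failed transfer does not leave 
roughly one dump's worth of data behind on the stat machine.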

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-06 Thread dcausse
dcausse added a comment.


  Another approach could be to use the `/mnt/hdfs` mountpoint? I have been told 
that it might not be stable enough but perhaps it's OK for doing a copy?

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T355298: Investigate the impact of the WDQS graph split on constraints checks

2024-05-03 Thread dcausse
dcausse added a comment.


  Looking at the constraints I believe that 4 may use sparql:
  
  - FormatChecker.php
  - TypeChecker.php
  - UniqueValueChecker.php
  - ValueTypeChecker.php
  
  FormatChecker switched to using shellbox so I think it can be ignored.
  
  TypeChecker & ValueTypeChecker are using Sparql to inspect the class 
hierarchy which may or may not be affected by the split.
  UniqueValueChecker is on the other hand most certainly affected by the split.

TASK DETAIL
  https://phabricator.wikimedia.org/T355298

To: dcausse
Cc: dcausse, dr0ptp4kt, Daniel_Mietchen, ItamarWMDE, Lucas_Werkmeister_WMDE, 
karapayneWMDE, Aklapper, Lydia_Pintscher, Danny_Benjafield_WMDE, S8321414, 
Astuthiodit_1, Invadibot, maantietaja, Akuckartz, Dringsim, Eihel, Nandana, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, Esc3300, 
LawExplorer, _jensen, rosalieper, Agabi10, Scott_WUaS, abian, Wikidata-bugs, 
aude, Mbch331


[Wikidata-bugs] [Maniphest] T364077: Adapt the wdqs data-transfer cookbook to operate with federated subgraphs

2024-05-03 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  The current data-transfer cookbook assumes that a single graph is served 
from all wdqs nodes; this will no longer be the case once the graph is split.
  Most of the script should operate similarly, but there are a few important 
configuration bits that might need to vary:
  
  Mainly the kafka topic the updater will consume from, which will vary 
depending on which subgraph the machine is serving. This information might be 
available from puppet and could possibly be exposed via some config file 
readable by the cookbook.
  The cookbook should also make sure not to transfer the data of subgraph A 
to a machine configured to serve subgraph B.
  We should also explore and document a procedure to switch a machine that 
serves subgraph A to serve subgraph B.
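
  A minimal sketch of the safety check described above, assuming each host's 
configuration can be read as a small dictionary. The `subgraph` and 
`kafka_topic` keys, the host names and the function name are hypothetical; 
how this information is actually exposed is still to be decided:

```python
class SubgraphMismatch(Exception):
    """Raised when source and target hosts serve different subgraphs."""


def check_transfer(source, target):
    """Refuse cross-subgraph transfers; return the topic the target consumes from."""
    if source["subgraph"] != target["subgraph"]:
        raise SubgraphMismatch(
            f"{source['host']} serves {source['subgraph']!r} but "
            f"{target['host']} is configured for {target['subgraph']!r}"
        )
    return target["kafka_topic"]


# Hypothetical host configurations, e.g. as read from a puppet-managed file.
main_a = {"host": "wdqs-a", "subgraph": "main",
          "kafka_topic": "eqiad.rdf-streaming-updater.mutation-main"}
main_b = {"host": "wdqs-b", "subgraph": "main",
          "kafka_topic": "eqiad.rdf-streaming-updater.mutation-main"}
scholarly = {"host": "wdqs-c", "subgraph": "scholarly",
             "kafka_topic": "eqiad.rdf-streaming-updater.mutation-scholarly"}

print(check_transfer(main_a, main_b))
```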
  
  AC:
  
  - The wdqs data-transfer cookbook can operate in a federated subgraphs setup

TASK DETAIL
  https://phabricator.wikimedia.org/T364077

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-05-02 Thread dcausse
dcausse added a comment.


  @BTullis @bking I plan to use a cookbook to transfer some data out of hdfs to 
blazegraph machines, a naive approach I thought about was to use a temp folder 
somewhere in `/srv` of a stat100x machine and then re-use the transferpy 
<https://gerrit.wikimedia.org/r/operations/software/transferpy> python module.
  The current dumps are about 200G, do you think that this option is viable? 
Can we use a folder in `/srv` as a temp folder for such transfers? This data is 
only useful for the transfer and should be deleted by the cookbook when it ends.

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw

2024-04-30 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T362508

To: dcausse
Cc: RKemper, dr0ptp4kt, bking, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Jersione, Hellket777, 
LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Dringsim, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-04-30 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-04-30 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Incoming to In Progress on the Discovery-Search 
(Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2024-04-30 Thread dcausse
dcausse added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T349069

To: dcausse
Cc: Daniel_Mietchen, JAllemandou, dr0ptp4kt, bking, BTullis, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Dringsim, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw

2024-04-24 Thread dcausse
dcausse moved this task from Ready for Dev -- SWE to Needs review on the 
Discovery-Search (Current work) board.
dcausse claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T362508

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: RKemper, dr0ptp4kt, bking, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Jersione, Hellket777, 
LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Dringsim, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter

2024-04-23 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Ready for Dev -- SWE to In Progress on the 
Discovery-Search (Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T362060

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: dr0ptp4kt, dcausse, Aklapper, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Dringsim, Hook696, 
Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, 
Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362977: WDQS updater missed some updates

2024-04-19 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T362977

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, dcausse, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Dringsim, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362977: WDQS updater missed some updates

2024-04-19 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Reported at 
https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#Stale_values_in_SparQL_query_result
  
  - Q968274 revision 2131311442 at `2024-04-17T13:18:54`
  - Q4314307 revision 2130626175 at `2024-04-16T13:20:18`
  - Q4349600 revision 2130628297 at `2024-04-16T13:23:52`
  - Q51670636 revision 2131311281 at `2024-04-17T13:18:30`
  
  None of these are found in the `event.mediawiki_revision_create` hive table.
  I can't find them in the `eqiad.mediawiki.revision-create` topic either.
  
  - For Q968274 at 2131311442: the revision-create event failed to be emitted 
with `Unable to deliver all events: 503: Service Unavailable`: 
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.04.17?id=MjA17I4BWjhRzdxne8ai.
  
  I can't find traces of the other three, but searching for "Unable to deliver 
all events: 503: Service Unavailable" in logstash I can see huge spikes of 
failures (sometimes more than 20k in one hour):
  F47519706: image.png <https://phabricator.wikimedia.org/F47519706>
  
  It is possible that mediawiki or event-gate failed to properly submit these 
revision-create events.
  Related tasks
  
  - T249745 <https://phabricator.wikimedia.org/T249745>: Could not enqueue 
jobs: "Unable to deliver all events: 503: Service Unavailable"
  - T120242 <https://phabricator.wikimedia.org/T120242>: Eventually-Consistent 
MediaWiki state change events | MediaWiki events as source of truth

TASK DETAIL
  https://phabricator.wikimedia.org/T362977

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw

2024-04-16 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T362508

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dr0ptp4kt, bking, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Isabelladantes1983, Themindcoder, Adamm71, S8321414, Jersione, Hellket777, 
LisafBia6531, Astuthiodit_1, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, 
Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, 
KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362508: WDQS updater misbehaving in codfw

2024-04-15 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  The updater is misbehaving in codfw: it is apparently processing too many 
`reconciliations`, which triggers a //slow// update mode. As a result it is not 
able to keep up with the update rate, which causes maxlag to throttle bot edits 
in wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T362508

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs

2024-04-09 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T361935

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Daniel_Mietchen, dr0ptp4kt, pfischer, dcausse, Aklapper, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, 
Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362074: WDQS wikibase:around sometimes ignore exact matches

2024-04-08 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  (originally reported at 
https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#WDQS_wikibase:around_issue)
  
  In some circumstances `wikibase:around` ignores exact matches.
  For instance, `Q5637175` has a point equal to `Point(-2.5307 53.0268)`, but 
when searching for this exact location the query service is unable to find it:
  
    SELECT DISTINCT ?item ?itemLabel ?location ?dist WHERE {
      SERVICE wikibase:around {
        ?item wdt:P625 ?location.
        bd:serviceParam wikibase:center "Point(-2.5307 53.0268)"^^geo:wktLiteral.
        bd:serviceParam wikibase:radius "0.1".
        bd:serviceParam wikibase:distance ?dist.
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  
  When the searched point is varied slightly (e.g. `Point(-2.5307 53.02681)`), 
the point is found.
  
  It is unclear why this happens. Might it be a bug (edge case) in how the 
search space is approximated with a surrounding box?

TASK DETAIL
  https://phabricator.wikimedia.org/T362074

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2024-04-08 Thread dcausse
dcausse added a subtask: T362060: Generalize ScholarlyArticleSplitter.

TASK DETAIL
  https://phabricator.wikimedia.org/T337013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Daniel_Mietchen, Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, 
elal, karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, 
maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter

2024-04-08 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T362060

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T362060: Generalize ScholarlyArticleSplitter

2024-04-08 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  The spark job ScholarlyArticleSplitter should be generalized to support the 
general case with //n// subgraphs, a wider variety of rules and stubs.
  
  AC:
  
  - support subgraph definitions as proposed in T361935 
<https://phabricator.wikimedia.org/T361935>
  - support stubs 
WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs 
<https://www.wikidata.org/wiki/User:DCausse_(WMF)/WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs>

TASK DETAIL
  https://phabricator.wikimedia.org/T362060

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

2024-04-05 Thread dcausse
dcausse moved this task from Blocked/Waiting to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  Two scholia queries were rewritten:
  
  - 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Property_paths
  - 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Number_of_articles_with_CiTO-annotated_citations_by_year
  
  The page also contains some documentation about how to approach such rewrites.
  I'm boldly moving this ticket to our Needs Reporting column (prior to being 
closed), as I believe further exploration of how to rewrite scholia queries 
to support the split could perhaps be better handled in 
https://github.com/WDscholia/scholia.
  
  But please feel free to re-open this ticket if you believe it has some value.

TASK DETAIL
  https://phabricator.wikimedia.org/T349911

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Jane023, dr0ptp4kt, Fnielsen, Daniel_Mietchen, EgonWillighagen, dcausse, 
Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation

2024-04-05 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T361950

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2024-04-05 Thread dcausse
dcausse added a subtask: T361950: Ensure that WDQS query throttling does not 
interfere with federation.

TASK DETAIL
  https://phabricator.wikimedia.org/T337013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, 
karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, S8321414, 
Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, 
Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling does not interfere with federation

2024-04-05 Thread dcausse
dcausse renamed this task from "Ensure that WDQS query throttling do not 
interfere with federation" to "Ensure that WDQS query throttling does not 
interfere with federation".

TASK DETAIL
  https://phabricator.wikimedia.org/T361950

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361950: Ensure that WDQS query throttling do not interfere with federation

2024-04-05 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  When we exposed the 3 experimental endpoints to test the first version of the 
graph split, we disabled query throttling to avoid impacting the various 
analyses we had to run to evaluate the impact of the split.
  While analyzing what happens when federated queries are running, we then 
realized that this throttling mechanism might have a negative impact, with 
wdqs nodes throttling each other.
  
  This ticket is about finding a plan to ensure that query throttling does not 
interfere with federation.
  
  A simple approach would be to make the wdqs machine receiving the traffic 
responsible for throttling the client; subsequent queries made internally as 
part of federation would be un-throttled. Nodes serving federated results to 
other nodes should still remain protected by the frontend node answering the 
client.
  
  To achieve this we need to detect when a query is emitted from another query 
service and craft a header at the nginx level to inform the throttling servlet 
that it should not be activated.
  Such a header exists, but sadly the throttling filter re-uses the existing 
`X-BIGDATA-READ-ONLY` header, which serves another purpose and so cannot be 
re-used in our context (it would be too dangerous).
  
  One approach could be to use a new header `X-Disable-Throttling` dedicated 
to this purpose. The nginx settings would have to be adapted to set 
`X-Disable-Throttling` when the query is emitted from another blazegraph 
node. Unfortunately, this might start to throttle local requests made directly 
on the blazegraph port (updates); the clients making them 
(streaming-updater-consumer, data import scripts) would then have to be adapted 
to set this header.
  
  Another approach is to adapt the throttling servlet and change how it's 
configured adding a new config `disable-throttling-if-header` such that a 
request with:
  
  - `X-BIGDATA-READ-ONLY: 1` and `X-Disable-Throttling: true` would disable 
throttling
  - `X-BIGDATA-READ-ONLY: 1` only would enable throttling
  - a request without any of these headers would not enable throttling
  
  AC:
  
  - decide on the approach
  - blazegraph does not throttle itself when running federated queries
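
  The three header combinations of the second approach can be sketched as a 
small decision function. This is only an illustration, not the throttling 
servlet itself: the function name `should_throttle` is hypothetical, and the 
sketch assumes that the mere presence of `X-BIGDATA-READ-ONLY` is what enables 
throttling and that `X-Disable-Throttling` only takes effect with the value 
`true`. A request carrying `X-Disable-Throttling` alone is not covered by the 
list above; here it is treated like a request without any header.

```python
from typing import Mapping

READ_ONLY_HEADER = "X-BIGDATA-READ-ONLY"
DISABLE_HEADER = "X-Disable-Throttling"  # new header proposed in this ticket


def should_throttle(headers: Mapping[str, str]) -> bool:
    """Return True if the (hypothetical) throttling filter should be active."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip().lower()
                  for name, value in headers.items()}
    # Assumption: presence of the read-only header marks a throttleable request.
    read_only = READ_ONLY_HEADER.lower() in normalized
    # Assumption: only the literal value "true" disables throttling.
    disabled = normalized.get(DISABLE_HEADER.lower()) == "true"
    return read_only and not disabled
```

  With this shape, local requests made directly on the blazegraph port keep 
their current behaviour without having to set any new header.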

TASK DETAIL
  https://phabricator.wikimedia.org/T361950

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs

2024-04-05 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T361935

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dr0ptp4kt, pfischer, dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2024-04-05 Thread dcausse
dcausse added a subtask: T361935: Adapt the WDQS Streaming Updater to update 
multiple WDQS subgraphs.

TASK DETAIL
  https://phabricator.wikimedia.org/T337013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Kanashimi, SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, 
karapayneWMDE, Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, S8321414, 
Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, 
Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs

2024-04-05 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T361935

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dr0ptp4kt, pfischer, dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361935: Adapt the WDQS Streaming Updater to update multiple WDQS subgraphs

2024-04-05 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  In order to support updating the subgraphs defined in 
Wikidata:SPARQL_query_service/WDQS_graph_split 
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split> 
the streaming updater must be adapted to produce the right mutations for a 
given subgraph.
  
  General Idea
  
  
  Right after fetching the entity content from `Special:EntityData` and before 
generating the diff a new component will be added to apply the set of rules 
defining the subgraph and will populate a stream per subgraph:
  
  - `rdf-streaming-updater.mutations` will remain and will contain all the 
mutations to update the existing setup
  - `rdf-streaming-updater.mutations-main` will be added and will contain 
mutations related to the `main`
  - `rdf-streaming-updater.mutations-scholarly` will be added and will contain 
mutations related to the `scholarly`
  
  The consumer component on the other end of the stream might require very few 
(no?) modifications and might just work by configuring the kafka topic it 
should consume.
  
  The ticket describes the general case, supporting an arbitrary number of 
subgraphs. There is certainly a simpler way to handle the special case of 2 
subgraphs, but it might be error-prone and less future-proof, so we should 
attempt to solve the general case. If difficulties are found while 
implementing it we can always reconsider and give up on the general case.
  
  Stubs
  -
  
  //Stubs// are artificial triples (explained in 
WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs 
<https://www.wikidata.org/wiki/User:DCausse_(WMF)/WDQS_Split_Refinement#Add_triples_to_help_navigate_between_the_subgraphs>)
 that will take the following form:
  
wd:Q42 wikibaseqs:subgraph wdsubgraph:main
  
  - `wikibaseqs` (with suggested IRI: `http://wikiba.se/queryservice#`): a new 
namespace to hold the vocabulary of things related to the wdqs codebase; it 
will probably be hard-coded there. `subgraph` will be the first and only term 
required at the moment
  - `wdsubgraph` (with suggested IRI: `https://query.wikidata.org/subgraph/`): a 
new namespace to hold the IRIs identifying subgraphs in the scope of the 
wikidata query service. It will be set up in the `prefixes.json` config file.
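
  To make the shape of a stub concrete, the example triple above can be 
expanded to full IRIs. This is a minimal sketch: the helper name `stub_triple` 
is hypothetical, the `wikibaseqs` and `wdsubgraph` IRIs are the ones suggested 
in this ticket (not final), and `wd:` is the usual Wikidata entity namespace.

```python
# Prefixes used by the stub triple; the last two are only suggestions
# from this ticket and may change.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wikibaseqs": "http://wikiba.se/queryservice#",
    "wdsubgraph": "https://query.wikidata.org/subgraph/",
}


def stub_triple(entity_id: str, subgraph: str) -> str:
    """Render `wd:<entity_id> wikibaseqs:subgraph wdsubgraph:<subgraph>` as N-Triples."""
    subject = PREFIXES["wd"] + entity_id
    predicate = PREFIXES["wikibaseqs"] + "subgraph"
    obj = PREFIXES["wdsubgraph"] + subgraph
    return f"<{subject}> <{predicate}> <{obj}> ."


print(stub_triple("Q42", "main"))
```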
  
  Rules
  -
  
  The rules should be extremely simple to apply and will require only the data 
available locally after fetching the entity content. How the rules are 
expressed is up for discussion but could be a simple yaml file:
  
    subgraphs:
      - scholarly:
          stream: "rdf-streaming-updater.mutations-scholarly"
          default: block
          rules:
            - "pass ?entity wdt:P31 wd:Q13442814"
            - "pass ?entity rdf:type wikibase:Property"
          stubs_source: true
          stubs_subgraph_uri: "https://query.wikidata.org/subgraph/scholarly"
      - main:
          stream: "rdf-streaming-updater.mutations-main"
          default: pass
          rules:
            - "block ?entity wdt:P31 wd:Q13442814"
          stubs_source: true
          stubs_subgraph_uri: "https://query.wikidata.org/subgraph/main"
      - full:
          stream: "rdf-streaming-updater.mutations"
          default: pass
          stubs_source: false
  
  - rules are prefixed with `pass` or `block` telling what to do if evaluated 
to true
  - `?entity` will be replaced by the entity URI being updated
  - `[]` means any rdf literal
  
  The rules are applied in order and stop at the first match: if it's a `pass` 
the entity should enter this subgraph, if it's a `block` it should not. If no 
rule matches, the entity uses the `default` setting to decide whether to `pass` 
or `block`.
  The `stubs_source` attribute will determine if this graph is OK to be linked 
from a stub triple when an entity is blocked from another subgraph.
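
  The first-match evaluation described above could look roughly like the 
sketch below. It is an illustration under assumptions, not the updater's code: 
the function name `entity_passes` is hypothetical, the entity content is 
modelled as a set of prefixed `(subject, predicate, object)` string tuples, 
and the `[]` wildcard is deliberately not handled.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def entity_passes(entity: str, triples: Set[Triple],
                  rules: Iterable[str], default: str) -> bool:
    """Return True if the entity belongs to the subgraph."""
    for rule in rules:
        # A rule looks like: "pass ?entity wdt:P31 wd:Q13442814"
        action, subject, predicate, obj = rule.split()
        if subject == "?entity":
            subject = entity  # substitute the entity being updated
        if (subject, predicate, obj) in triples:
            # Stop at the first matching rule.
            return action == "pass"
    # No rule matched: fall back to the subgraph's default.
    return default == "pass"


# Example: a scholarly article evaluated against the two subgraphs above.
article = {("wd:Q1", "wdt:P31", "wd:Q13442814")}
scholarly_rules = ["pass ?entity wdt:P31 wd:Q13442814",
                   "pass ?entity rdf:type wikibase:Property"]
main_rules = ["block ?entity wdt:P31 wd:Q13442814"]
print(entity_passes("wd:Q1", article, scholarly_rules, "block"))
print(entity_passes("wd:Q1", article, main_rules, "pass"))
```

  Because each rule only needs set membership on the triples already fetched, 
the check stays local to the entity content, as required above.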
  
  Rules Outcome
  -
  
  Applying the rules will only answer the question: //does this entity revision 
belong to this graph?//
  The actual set of mutations to apply may also vary depending on the type of 
MutationOperation 
<https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/+/refs/heads/master/streaming-updater-producer/src/main/scala/org/wikidata/query/rdf/updater/MutationOperation.scala>
 to apply.
  
  **Diff**
  
  When a diff is required the rules will be applied to both the previous and 
next revision.
  
  - when an entity enters a subgraph:
    - Diff: to remove stubs
    - FullImport: the entity is fully imported
  - when an entity leaves a subgraph:
    - DeleteItem: the entity is fully deleted
    - Diff: to add the stubs
  - when an entity stays in the subgraph
    - Diff: simple diff
  - when an entity stays outside the 

[Wikidata-bugs] [Maniphest] T361114: Alert Search Platform and/or DPE SRE when Wikidata is lagged

2024-04-04 Thread dcausse
dcausse added a comment.


  Thanks! I'm not very familiar with alerts being set from grafana either. 
I'll try to get more info on this; worst case, we can always set up a new one 
directly in alertmanager just for the wdqs lag, sent to the search team and 
using the same formula used by updateQueryServiceLag.php.

TASK DETAIL
  https://phabricator.wikimedia.org/T361114

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, bking, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361114: Alert Search Platform and/or DPE SRE when Wikidata is lagged

2024-04-04 Thread dcausse
dcausse removed dcausse as the assignee of this task.
dcausse added a comment.


  @Lucas_Werkmeister_WMDE thanks! Do you know where we could update this to 
include our alert email for such alerts?

TASK DETAIL
  https://phabricator.wikimedia.org/T361114

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, bking, Danny_Benjafield_WMDE, 
S8321414, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement

2024-04-03 Thread dcausse
dcausse moved this task from Needs review to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  Should be working properly now

TASK DETAIL
  https://phabricator.wikimedia.org/T353683

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, 
Danny_Benjafield_WMDE, gonzalez.actor, S8321414, Astuthiodit_1, karapayneWMDE, 
toberto, Invadibot, maantietaja, Wilmanbeno, ItamarWMDE, Nintendofan885, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, 
Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361106: Restore wdqs1013 with a data transfer

2024-03-29 Thread dcausse
dcausse closed this task as "Declined".
dcausse added a comment.


  won't be required after all

TASK DETAIL
  https://phabricator.wikimedia.org/T361106

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: bking, dcausse
Cc: dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-29 Thread dcausse
dcausse closed subtask T361106: Restore wdqs1013 with a data transfer as 
"Declined".

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361246: scap deploy should not repool a wdqs node that is depooled

2024-03-28 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T361246

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361106: Restore wdqs1013 with a data transfer

2024-03-28 Thread dcausse
dcausse moved this task from Backlog to Blocked / Waiting on the 
Data-Platform-SRE (2024.03.25 - 2024.04.14) board.
dcausse added a comment.


  I restarted the updater on wdqs1013 and it's catching up. I have a note to 
check the status tomorrow and will repool it if necessary.

TASK DETAIL
  https://phabricator.wikimedia.org/T361106

WORKBOARD
  https://phabricator.wikimedia.org/project/board/7054/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: bking, dcausse
Cc: dcausse, Aklapper, bking, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T361246: scap deploy should not repool a wdqs node that is depooled

2024-03-28 Thread dcausse
dcausse added a project: Wikidata-Query-Service.

TASK DETAIL
  https://phabricator.wikimedia.org/T361246

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-28 Thread dcausse
dcausse added a comment.


  I could re-enable puppet on wdqs1013 and restart the updater to catch up on 
updates. But apparently this machine was repooled yesterday (as part of the 
wdqs scap deploy I suppose) and thus started to serve stale data without 
triggering any maxlag. It was only when re-enabling puppet that I realized that 
this node was still pooled, so I depooled it immediately, but this caused a 
maxlag for several minutes.
  Scap repooling machines is something we might look into to avoid this 
kind of issue in the future.

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-28 Thread dcausse
dcausse added a comment.


  After depooling the node we can see the query rate actually going down to 0. 
The request rate is generally very low on codfw so we might have to tune the 
threshold to around 0.2.
  F43663858: image.png <https://phabricator.wikimedia.org/F43663858>

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T336352: Update maxlag calculation maintenance script to reflect new prometheus queries

2024-03-26 Thread dcausse
dcausse removed a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T336352

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo, dcausse
Cc: Lucas_Werkmeister_WMDE, Aklapper, ItamarWMDE, dcausse, 
Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen, karapayneWMDE, 
Invadibot, maantietaja, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331, Isabelladantes1983, Themindcoder, Adamm71, 
Jersione, Hellket777, LisafBia6531, 786, Biggs657, Juan90264, Alter-paule, 
Beast1978, Un1tY, Hook696, Kent7301, joker88john, CucyNoiD, Gaboe420, 
Giuliamocci, Cpaulf30, Af420, Bsandipan, Lewizho99, Maathavan, Neuronton
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse added a comment.


  The approach taken is:
  
  - from nginx, set a new header named `x-monitoring-query` to true if a 
list of criteria is met (currently using user-agent strings but could be 
extended to using source IPs as well I suppose)
  - from blazegraph, do not log queries with the header `x-monitoring-query` set
  - adapt `Wikidata.org` to allow tuning the //minimal query rate// expected to 
be served from a pooled server (was hardcoded to 1.0)
  - change the systemd timer that runs `updateQueryServiceLag.php` to set 
`--pooled-server-min-query-rate` to 0.5 (will need to double check that this 
value is sane and works well for codfw and eqiad servers)
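
  The first two steps can be sketched as follows. This is only an illustration 
in Python, not the actual nginx or blazegraph change: the function names and 
the prefix list are hypothetical (the prefixes are taken from the user agents 
seen on a depooled server), and the real criteria may differ.

```python
# Hypothetical list of user-agent prefixes treated as monitoring traffic.
MONITORING_UA_PREFIXES = (
    "check_http/",
    "Twisted PageGetter",
    "prometheus-public-sparql-ep-check",
    "wmf-prometheus/prometheus-blazegraph-exporter",
)


def monitoring_header(user_agent):
    """Value that the 'x-monitoring-query' header would carry for this UA."""
    is_monitoring = bool(user_agent) and user_agent.startswith(MONITORING_UA_PREFIXES)
    return "true" if is_monitoring else "false"


def should_log_query(headers):
    """Skip logging (and thus counting) queries flagged as monitoring."""
    return headers.get("x-monitoring-query", "false") != "true"


# Usage: a monitoring probe is flagged and not logged, a user query is logged.
probe = {"x-monitoring-query": monitoring_header("Twisted PageGetter")}
user = {"x-monitoring-query": monitoring_header("Mozilla/5.0")}
print(should_log_query(probe), should_log_query(user))
```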

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse moved this task from Incoming to Needs review on the Discovery-Search 
(Current work) board.
dcausse claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, S8321414, Jersione, Hellket777, LisafBia6531, 
Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, Lewizho99, 
Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse added a comment.


  Here are the UAs seen in one hour on a depooled server:
  
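+------------------------------------------------------------------+-----+
|UA                                                                |count|
+------------------------------------------------------------------+-----+
|check_http/v2.3.3 (monitoring-plugins 2.3.3)                      |87   |
|Twisted PageGetter                                                |2146 |
|prometheus-public-sparql-ep-check                                 |1913 |
|wmf-prometheus/prometheus-blazegraph-exporter (r...@wikimedia.org)|120  |
+------------------------------------------------------------------+-----+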

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, 
AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse triaged this task as "High" priority.
dcausse added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse added a comment.


  Mitigation:
  
  - blazegraph stopped
  - updater stopped with the `/srv/wdqs/data_loaded` flag removed
  - puppet disabled

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Propagating the lag of a wdqs host should only be done if this host is 
''pooled'' (actually serving user traffic).
  Determining the ''pooling'' status appeared to be quite challenging in our 
infra so in T336352 <https://phabricator.wikimedia.org/T336352> we started 
using a metric based on the query rate hoping that it would be a reasonable 
proxy for determining if the server is serving users or not.
  
  This worked well so far but a recent incident where a server was depooled 
after being stuck for some reason showed that this metric based on query rate 
is too fragile:
  We consider a server to be pooled if its query rate is above 1 qps:
  
`rate(org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries{}[10m]) > 1`
  
  Sadly this did not hold on wdqs1013 when it was depooled: for some reason 
its query rate was still above 1 (below 1.3). It is possible that this metric 
is polluted with monitoring queries that do not relate to serving user traffic. 
We should perhaps refine how we generate 
`org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries` 
and make sure we only measure user queries.
  
  AC:
  
  - wdqs lag propagation should no longer include false positives (count the 
lag of a server that is actually depooled)
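
  The pooled check described above can be sketched as follows. This is a 
simplified illustration, not the maintenance script: the function names are 
hypothetical, and the rate is a plain difference of two counter samples over 
the 10 minute window, whereas the Prometheus `rate()` function also handles 
counter resets and extrapolation.

```python
def counter_rate(first_value, last_value, window_seconds):
    """Per-second increase of a counter between two samples (resets ignored)."""
    return max(last_value - first_value, 0) / window_seconds


def is_considered_pooled(first_value, last_value, window_seconds=600, min_rate=1.0):
    """A server counts as pooled when its query rate exceeds min_rate (qps)."""
    return counter_rate(first_value, last_value, window_seconds) > min_rate


# A depooled server still receiving ~1.185 qps of monitoring queries
# (711 queries in 10 minutes) is wrongly considered pooled.
print(is_considered_pooled(0, 711))
# Once monitoring queries are no longer counted, the same server is not.
print(is_considered_pooled(0, 0))
```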

TASK DETAIL
  https://phabricator.wikimedia.org/T360993

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation

2024-03-21 Thread dcausse
dcausse moved this task from In Progress to Needs review on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  draft page: 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits

TASK DETAIL
  https://phabricator.wikimedia.org/T357966

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: tfmorris, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation

2024-03-05 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Ready for Dev -- SWE to In Progress on the 
Discovery-Search (Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T357966

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: tfmorris, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement

2024-03-05 Thread dcausse
dcausse moved this task from In Progress to Needs review on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  changed the layout of the query a bit by moving the logistic function 
introduced in T271799 <https://phabricator.wikimedia.org/T271799> to the 
top-level so that it wraps the new nearmatch clause

TASK DETAIL
  https://phabricator.wikimedia.org/T353683

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, 
Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder, Adamm71, 
gonzalez.actor, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
Biggs657, karapayneWMDE, toberto, Invadibot, maantietaja, Wilmanbeno, 
Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, Nintendofan885, 
Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, LawExplorer, Lewizho99, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Wikidata-bugs, aude, jayvdb, Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits

2024-03-04 Thread dcausse
dcausse claimed this task.
dcausse moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  Compiled 10 real world examples at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

TASK DETAIL
  https://phabricator.wikimedia.org/T357980

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: tfmorris, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-03-04 Thread dcausse
dcausse added a comment.


  final report available at 
https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/WDQS_Graph_Split_Impact_Analysis

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, 
Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, 
Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356773: [tracking] Community feedback for the WDQS Split the Graph project

2024-03-04 Thread dcausse
dcausse added a comment.


  @Physikerwelt thanks for your feedback.
  
  Blazegraph is definitely not the best solution and the work to move off of 
blazegraph should be tracked under https://phabricator.wikimedia.org/T330525 
(see the initial exploration 
<https://www.wikidata.org/wiki/File:WDQS_Backend_Alternatives_working_paper.pdf>
 we have done). The solutions you suggest might be better discussed in their 
own tickets as a subtask of T335067 <https://phabricator.wikimedia.org/T335067>.
  This particular ticket is about collecting feedback regarding use-cases that 
might be affected by the split. This split is one of the solutions we want to 
experiment with to address the scalability issues of WDQS 
<https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/ScalingStrategy>. 
We are aware of the usability issues that you raise but at this point we 
are more focused on understanding the feasibility and limitations of federation 
with such a split. It is worth noting that one goal is to be sure that 
use-cases not relying on the scientific articles still work without 
federation.

TASK DETAIL
  https://phabricator.wikimedia.org/T356773

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Sannita, dcausse
Cc: Physikerwelt, EgonWillighagen, ArthurPSmith, Sj, dcausse, valerio.bozzolan, 
tfmorris, Gehel, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356773: [tracking] Community feedback for the WDQS Split the Graph project

2024-03-04 Thread dcausse
dcausse added a comment.


  In T356773#9531179 <https://phabricator.wikimedia.org/T356773#9531179>, 
@EgonWillighagen wrote:
  
  > I tried to get the federation working, but got time outs too. The problem 
is that the current setup makes splits at a statement level. That is, given 
statements with some property (e.g. P2860 
<https://phabricator.wikimedia.org/P2860> and P1433 
<https://phabricator.wikimedia.org/P1433>), some results are in one QS instance 
and some are in the other. That means a lot of federation-union combinations to 
get all results. I posted an example query that is affected (the first I tried) 
in this issue report: https://github.com/WDscholia/scholia/issues/2423
  
  I got this query rewritten at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Number_of_articles_with_CiTO-annotated_citations_by_year
  I agree that given the current split strategy we have to UNION the main and 
scholarly articles graphs most of the time.

TASK DETAIL
  https://phabricator.wikimedia.org/T356773

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Sannita, dcausse
Cc: Physikerwelt, EgonWillighagen, ArthurPSmith, Sj, dcausse, valerio.bozzolan, 
tfmorris, Gehel, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement

2024-03-04 Thread dcausse
dcausse moved this task from To Be Deployed to In Progress on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  The new builder moved the result to #4, which is better but still not enough, 
and it's beaten by 3 other images because of other criteria:
  
  - weighted_tags:image.linked.from.wikipedia.lead_image/Q458
  - statement_keywords:p180=q458
  
  Moving back to in-progress to fine-tune the weight (probably bumping from 3.5 
to 10).

TASK DETAIL
  https://phabricator.wikimedia.org/T353683

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, 
Danny_Benjafield_WMDE, gonzalez.actor, Astuthiodit_1, karapayneWMDE, toberto, 
Invadibot, maantietaja, Wilmanbeno, CBogen, ItamarWMDE, Nintendofan885, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, 
Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits

2024-02-26 Thread dcausse
dcausse added a comment.


  WIP at 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

TASK DETAIL
  https://phabricator.wikimedia.org/T357980

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: tfmorris, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2024-02-21 Thread dcausse
dcausse added a subtask: T357980: Compile a set of queries rewritten with 
federation across the two graph splits.

TASK DETAIL
  https://phabricator.wikimedia.org/T337013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE, 
Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, Astuthiodit_1, 
AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, 
NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits

2024-02-21 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T357980

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation across the two graph splits

2024-02-21 Thread dcausse
dcausse renamed this task from "Compile a set of queries rewritten with 
federation accross the two graph splits" to "Compile a set of queries rewritten 
with federation across the two graph splits".

TASK DETAIL
  https://phabricator.wikimedia.org/T357980

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357980: Compile a set of queries rewritten with federation accross the two graph splits

2024-02-21 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Having a set of examples might be helpful for users experimenting with the 
graph split.
  A subpage under 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split 
might be appropriate.
  The set of queries to rewrite could be sourced from the samples used in 
T355040 <https://phabricator.wikimedia.org/T355040>.
  An example should consist of a query that requires scholarly articles and its 
rewritten form. Ideally the rewritten query should yield the same results on 
the splits as the original query does on the global graph.
  
  AC:
  
  - a new subpage of 
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split is 
available with several (between 5 and 10?) example queries federating 
`query-main-experimental.wikidata.org` and 
`query-scholarly-experimental.wikidata.org`.

TASK DETAIL
  https://phabricator.wikimedia.org/T357980

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2024-02-21 Thread dcausse
dcausse added a subtask: T357966: Document limitations of blazegraph federation.

TASK DETAIL
  https://phabricator.wikimedia.org/T337013

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: SEgt-WMF, dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE, 
Aklapper, Lydia_Pintscher, me, Danny_Benjafield_WMDE, Astuthiodit_1, 
AWesterinen, BeautifulBold, Suran38, Invadibot, maantietaja, Peteosx1x, 
NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation

2024-02-21 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS.

TASK DETAIL
  https://phabricator.wikimedia.org/T357966

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T357966: Document limitations of blazegraph federation

2024-02-21 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Writing a query that federates multiple SPARQL endpoints can be challenging 
if the intermediate results that have to be shared are big.
  
  Better understanding and documenting such limitations might help users 
writing such queries:
  
  - federating wdqs from a wcqs query
  - rewriting wdqs queries with federation in the scope of the graph split 
experiment
  
  AC:
  
  - documentation of the limits added in 
https://www.mediawiki.org/wiki/Wikidata_Query_Service (or a sub-page)

TASK DETAIL
  https://phabricator.wikimedia.org/T357966

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-02-08 Thread dcausse
dcausse moved this task from In Progress to Needs review on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  Draft report up at 
https://wikitech.wikimedia.org/wiki/User:DCausse/WDQS_Graph_Split_Impact_Analysis

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, 
Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, 
Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T353453: [Analytics] Impact of Scholia on WDQS

2024-02-08 Thread dcausse
dcausse added a comment.


  In T353453#9524925 <https://phabricator.wikimedia.org/T353453#9524925>, 
@AndrewTavis_WMDE wrote:
  
  > Quick note on this:
  >
  > There are two ways that need to be factored in to deriving if a query is 
from Scholia. Some queries do start with `#tool: scholia` as @dcausse 
suggested, but I checked for user agents and also found that the string 
`"Scholia"` is also used as a user agent. Big thing is that some of the queries 
have the comment and some have the user agent, but in no cases do we have both.
  
  Indeed I saw these two as well. I'm not sure how to interpret this yet, but it 
could be that some come from web browsers browsing 
https://scholia.toolforge.org/ (`#tool: scholia` in the query), while the 
"Scholia" user agent might come from some automated tooling used by Scholia 
that we have yet to discover. Looking at the queries might help.
  Regarding `#tool: scholia`, something I noted is that a non-negligible portion 
of the traffic comes from automated web crawlers; this might be worth 
identifying and distinguishing.
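  
  As a rough illustration of the two signals discussed above, here is a minimal 
Python sketch that tags a logged request by which signal it carries. The 
function name, the substring match on the user agent and the case-insensitive 
comparison are assumptions made for the example, not how the query logs are 
actually processed.

```python
from typing import Optional

# The two signals mentioned in the thread.
SCHOLIA_COMMENT = "#tool: scholia"
SCHOLIA_UA = "Scholia"


def scholia_signal(query: str, user_agent: str) -> Optional[str]:
    """Return which signal marks a request as Scholia traffic, or None.

    Hypothetical helper: the matching rules are assumptions for illustration.
    """
    lines = query.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    has_comment = first_line.strip().lower().startswith(SCHOLIA_COMMENT)
    has_ua = SCHOLIA_UA.lower() in user_agent.lower()
    if has_comment and has_ua:
        return "both"
    if has_comment:
        return "comment"
    if has_ua:
        return "user_agent"
    return None
```

  Counting the returned labels over a sample would show whether the "comment" 
and "user_agent" groups really never overlap, and would leave crawler traffic 
to be separated out of the "comment" group in a later step.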

TASK DETAIL
  https://phabricator.wikimedia.org/T353453

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: AndrewTavis_WMDE, dcausse
Cc: Lydia_Pintscher, dcausse, Aklapper, Manuel, Danny_Benjafield_WMDE, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-02-02 Thread dcausse
dcausse added a comment.


  WIP:
  
  - included the new 100k queries sample named `QUERY-Q4` from T349512 
<https://phabricator.wikimedia.org/T349512> (random sample that is 
representative of the query length and runtime)
  - the % of affected queries (deduplicated) per tool is (//sample// being the 
`QUERY-Q4` sample mentioned above) F41752511: image.png 
<https://phabricator.wikimedia.org/F41752511>
  
  The above graph should be taken with a grain of salt, as the number of queries 
per data point varies a lot (86 queries for //Listeria// vs 85k for 
//random//). These numbers are being reviewed, so no conclusions should be 
drawn yet, but we do not seem to obtain the same numbers that were found 
originally in Wikidata_Subgraph_Query_Analysis 
<https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis#Query_count_and_time>
 where 2.5% of the total query count was identified as requiring 
scholarly articles.
  A more qualitative analysis is in progress:
  
  - analysis of the user agents to understand which use cases are mainly 
affected; preliminary results show that, for instance, a single UA is the cause 
of 50% of the affected queries
  - extract some SPARQL queries to start evaluating how federation could be 
applied/tested

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, 
Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, 
Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs

2024-02-02 Thread dcausse
dcausse added a comment.


  @dr0ptp4kt thanks! Is the difference in the number of successful queries only 
explained by the improvement in query time, or are there some improvements in 
the number of queries that time out as well?

TASK DETAIL
  https://phabricator.wikimedia.org/T355037

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dr0ptp4kt, dcausse
Cc: dr0ptp4kt, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints

2024-01-31 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T355888

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: RKemper, dcausse, Aklapper, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
BTullis, Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, 
Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, 
joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, 
Af420, Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, 
Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356243: process_sparql_query_hourly sometimes fails on the jena sparql parser

2024-01-31 Thread dcausse
dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Failure seen while 
`org.wikidata.query.rdf.spark.transform.queries.sparql.QueryExtractor` was 
processing the dataset 
`event.wdqs_external_sparql_query/year=2024/month=1/day=30/hour=9`.
  
Last 4096 bytes of stderr :
riterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visit(WriterOp.java:302)
at org.apache.jena.sparql.algebra.op.OpUnion.visit(OpUnion.java:49)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.printOp(WriterOp.java:582)
at org.apache.jena.sparql.sse.writers.WriterOp$OpWriterWorker.visitOp2(WriterOp.java:134)
  
  The logs are truncated but it could possibly be a recursion issue failing 
with a `StackOverflow`.
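  
  To illustrate the suspected failure mode, here is a small Python analogue (not 
Jena code): a chain of nested unions is walked once by a recursive visitor, 
which needs one stack frame per nesting level, and once with an explicit 
stack. The tuple representation and all function names are hypothetical and 
only stand in for the repeated `OpUnion.visit` frames in the trace.

```python
def nested_unions(depth):
    """Build a left-deep chain of hypothetical ("union", left, right) nodes."""
    op = "bgp"
    for _ in range(depth):
        op = ("union", op, "bgp")
    return op


def count_ops_recursive(op):
    """Recursive visitor: uses one call frame per nesting level."""
    if isinstance(op, tuple):
        return 1 + count_ops_recursive(op[1]) + count_ops_recursive(op[2])
    return 1


def count_ops_iterative(op):
    """Same traversal with an explicit stack, so nesting depth is not limited
    by the call stack."""
    count, stack = 0, [op]
    while stack:
        node = stack.pop()
        count += 1
        if isinstance(node, tuple):
            stack.extend(node[1:])
    return count
```

  With a deep enough chain the recursive version is expected to hit the 
interpreter's recursion limit while the iterative one still completes, which 
is the kind of behaviour a query made of many chained UNIONs could trigger in 
a recursive writer.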

TASK DETAIL
  https://phabricator.wikimedia.org/T356243

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities

2024-01-30 Thread dcausse
dcausse added a comment.


  Scanning dumps from 2024/01/21 we can find 1623 duplicated statement ids 
(full list here: 
https://people.wikimedia.org/~dcausse/T356161_sdc_duplicated_statement_ids.csv)

TASK DETAIL
  https://phabricator.wikimedia.org/T356161

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, 
CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Ricordisamoa, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities

2024-01-30 Thread dcausse
dcausse renamed this task from "WikibaseMediaInfo (or Wikibase?) seems to reuse 
statement identifiers from other entities" to "WikibaseMediaInfo seems to reuse 
statement identifiers from other entities".
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T356161

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, 
CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Ricordisamoa, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities

2024-01-30 Thread dcausse
dcausse added a comment.


  @Lucas_Werkmeister_WMDE thanks for all the context! I get that it only 
affects WikibaseMediaInfo. Can we exclude Wikibase as a culprit possibly 
affecting wikidata or should we run a quick investigation to find possible 
duplicated statement identifiers in the wikidata RDF dumps?

TASK DETAIL
  https://phabricator.wikimedia.org/T356161

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, 
CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Ricordisamoa, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities

2024-01-30 Thread dcausse
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T356161

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Lucas_Werkmeister_WMDE, dcausse, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, karapayneWMDE, toberto, Invadibot, maantietaja, 
CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Ricordisamoa, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T356161: WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities

2024-01-30 Thread dcausse
dcausse created this task.
dcausse added projects: WikibaseMediaInfo, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Structured-Data-Backlog.

TASK DESCRIPTION
  Seen on M130887689 
<https://commons.wikimedia.org/wiki/Special:EntityData/M130887689.ttl?flavor=dump>
 and  M115086921 
<https://commons.wikimedia.org/wiki/Special:EntityData/M115086921.json> the 
content of the wikibase entity is almost identical.
  The statement ids are the same, which is highly problematic for the Wikibase 
RDF representation, which assumes that a statement id is unique and belongs to 
a single entity.
  E.g. `M130887689$83501cde-4a4b-a7d0-9832-5f1982be0c41` is referenced by both 
M130887689 & M115086921.
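  
  The check implied here can be sketched as follows, assuming the (entity id, 
statement id) pairs have already been extracted from a dump. The function name 
and the input shape are assumptions for the example; this is not the script 
used to scan the dumps.

```python
from collections import defaultdict


def shared_statement_ids(pairs):
    """Map each statement id to the entities referencing it, keeping only the
    ids referenced by more than one entity.

    `pairs` is an iterable of (entity_id, statement_id) tuples (assumed shape).
    """
    entities_by_statement = defaultdict(set)
    for entity_id, statement_id in pairs:
        entities_by_statement[statement_id].add(entity_id)
    return {
        statement_id: sorted(entities)
        for statement_id, entities in entities_by_statement.items()
        if len(entities) > 1
    }
```

  Applied to the two entities above, the statement id 
`M130887689$83501cde-4a4b-a7d0-9832-5f1982be0c41` would be reported with both 
M115086921 and M130887689 as its owners.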
  
  I'm not sure what actions have led to this situation but this should 
definitely be fixed to make sure that the statement ids are not shared.
  
  AC:
  
  - identify what action caused an entity to re-use statement ids
  - determine if this problem affects Wikibase itself and wikidata
  - fix this behavior
  - cleanup existing entities that have non unique statement ids

TASK DETAIL
  https://phabricator.wikimedia.org/T356161

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, toberto, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Ricordisamoa
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-01-26 Thread dcausse
dcausse added a comment.


  WIP: 
https://people.wikimedia.org/~dcausse/T355040_EARLY_DRAFT_wdqs_query_results_analysis.html
 (UA redacted for now)
  
  TL/DR:
  
  - added support for identifying true positives (queries with a scientific 
article in the sparql query or in the results)
  - MixNMatch has a very high number of true positives, thus needs more 
qualitative analysis (ticket TBD)
  - Listeria does not have any true positives but shows a bad outcome (81% 
identical in the best case, 68% worst case), needs more qualitative analysis too

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Isabelladantes1983, 
Themindcoder, Adamm71, Jersione, Hellket777, LisafBia6531, Astuthiodit_1, 786, 
Biggs657, karapayneWMDE, Invadibot, maantietaja, Juan90264, Alter-paule, 
Beast1978, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, 
CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, 
Bsandipan, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, 
Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T351650: Expose 3 new dedicated WDQS endpoints

2024-01-25 Thread dcausse
dcausse added a subtask: T355888: Enable cross federation between experimental 
WDQS endpoints.

TASK DETAIL
  https://phabricator.wikimedia.org/T351650

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, dcausse
Cc: Gehel, bking, dcausse, dr0ptp4kt, RKemper, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints

2024-01-25 Thread dcausse
dcausse added a parent task: T351650: Expose 3 new dedicated WDQS endpoints.

TASK DETAIL
  https://phabricator.wikimedia.org/T355888

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: RKemper, dcausse, Aklapper, AWesterinen, BTullis, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355888: Enable cross federation between experimental WDQS endpoints

2024-01-25 Thread dcausse
dcausse created this task.
dcausse added projects: Data-Platform-SRE, Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Experimental endpoints `query-main-experimental` and 
`query-scholarly-experimental` must allow cross federation.
  A simple way to achieve this might be to allow these 3 experimental endpoints 
to be part of the `allowlist` stored in puppet. This might enable unnecessary 
federation between production servers and the experimental ones (not ideal but 
probably acceptable?).
  
  Ultimately the following queries must be working after allowing such 
federation:
  From https://query-scholarly-experimental.wikidata.org the query:
  
# all papers by ISNI  0001 2124 7940 (Carlo Rovelli)
SELECT ?article ?articleLabel {
  ?author wdt:P213 " 0001 2124 7940"
  SERVICE <https://query-main-experimental.wikidata.org/sparql> {
    # Querying the scholarly article split
    ?article wdt:P50 ?author ;
             wdt:P31 wd:Q13442814 .
    BIND(?articleLabel as ?articleLabel) .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
}
  
  And from https://query-main-experimental.wikidata.org/ the query:
  
# all papers by ISNI  0001 2124 7940 (Carlo Rovelli)
SELECT ?article ?articleLabel {
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    # Querying the wikidata main graph split
    ?author wdt:P213 " 0001 2124 7940"
  }
  hint:Prior hint:runFirst true . # Tell blazegraph to first collect ?author
  ?article wdt:P50 ?author ;
           wdt:P31 wd:Q13442814 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
  
  Should work.
  
  AC:
  
  - federation works between query-main-experimental and 
query-scholarly-experimental
  - the 2 test queries work

TASK DETAIL
  https://phabricator.wikimedia.org/T355888

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: RKemper, dcausse, Aklapper, AWesterinen, BTullis, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-01-19 Thread dcausse
dcausse added a comment.


  Quick report on the progress being made:
  
  - Our query logs do not contain only SELECT queries, so the sparql client 
used to collect the data has to be adapted to support the other query forms 
(ASK, CONSTRUCT, DESCRIBE) 
(https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/991622)
  - Getting failures due to response size, bumped the limit to 16M but still 
getting problems, I might stop here and simply tag & ignore such massive 
queries moving forward
  - Getting very bad numbers from Listeria and MixNMatch (34% and 17% identical 
respectively); avg result size is 1.6k and 8k, which might partly explain why 
getting identical results is difficult, more investigation is needed to 
understand the cause...
  - Getting pretty mediocre numbers for WikidataIntegrator at 88% with a very 
small avg result size of 8, more investigation needed
  - Pywikibot and SPARQLWrapper are good at 99.4% for both

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement

2024-01-18 Thread dcausse
dcausse claimed this task.
dcausse moved this task from Ready for Dev -- SWE to In Progress on the 
Discovery-Search (Current work) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T353683

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: matthiasmullie, dcausse, Cparle, Bugreporter, Nikki, Aklapper, Davidshq, 
Danny_Benjafield_WMDE, gonzalez.actor, Astuthiodit_1, karapayneWMDE, toberto, 
Invadibot, maantietaja, Wilmanbeno, CBogen, ItamarWMDE, Nintendofan885, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, KimKelting, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, jayvdb, 
Mbch331, jeremyb
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355040: Compare the results of sparql queries between the fullgraph and the subgraphs

2024-01-15 Thread dcausse
dcausse created this task.
dcausse added projects: Wikidata, Wikidata-Query-Service.

TASK DESCRIPTION
  By using a tool to compare the differences of two results of the same sparql 
query we should evaluate how many queries might "break" when running against 
the wikidata main graph instead of the full graph.
  
  Comparison will use T351819 <https://phabricator.wikimedia.org/T351819> and 
be based on the sets of sparql extracted in T349512 
<https://phabricator.wikimedia.org/T349512>.
  
  We should attempt to identify the reasons for the differences and whether 
they are related or unrelated to the split:
  
  - query features dependent on the internal ordering of the blazegraph btrees 
(LIMIT X OFFSET Y, bd:slice)
  - use of external datasets (federation, mwapi)
  - unicode collation issues (T233204 
<https://phabricator.wikimedia.org/T233204>)
  - ...add more when discovered
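  
  As a minimal sketch of the comparison step, the Python below treats two 
result sets as identical when they contain the same rows regardless of order, 
and reports the share of identical outcomes. It assumes each row is a dict of 
variable name to value; the function names are hypothetical and the actual 
tool from T351819 may work differently.

```python
from collections import Counter


def same_results(rows_a, rows_b):
    """Compare two result sets as multisets of rows, ignoring row order.

    Each row is assumed to be a dict mapping variable names to string values.
    """
    def normalize(rows):
        return Counter(tuple(sorted(row.items())) for row in rows)

    return normalize(rows_a) == normalize(rows_b)


def percent_identical(outcomes):
    """Percentage of True values in a list of per-query comparison outcomes."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0
```

  Ignoring row order only removes one source of noise: queries using LIMIT X 
OFFSET Y or bd:slice can still return different rows on each graph for reasons 
unrelated to the split, so such queries would still need to be tagged 
separately.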
  
  For the queries whose results vary because of the split we should attempt to 
evaluate if targeting scholarly articles is intentional or not (e.g. 
statistical queries with group by counts) and possibly identify the tools and 
their maintainers to contact them to gather feedback on the project.
  
  AC:
  
  - a report is available showing how the current split is going to affect 
queries once run on the wikidata main subgraph
  - a list of affected tools/scripts (when identifiable) that could possibly be 
contacted

TASK DETAIL
  https://phabricator.wikimedia.org/T355040

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Gehel, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, 
AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, KimKelting, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T352538: [EPIC] Evaluate the impact of the graph split

2024-01-15 Thread dcausse
dcausse added a subtask: T355037: Compare the performance of sparql queries 
between the full graph and the subgraphs.

TASK DETAIL
  https://phabricator.wikimedia.org/T352538

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: Aklapper, Gehel, me, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, 
BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, 
NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, KimKelting, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs

2024-01-15 Thread dcausse
dcausse added a parent task: T352538: [EPIC] Evaluate the impact of the graph 
split.

TASK DETAIL
  https://phabricator.wikimedia.org/T355037

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T355037: Compare the performance of sparql queries between the full graph and the subgraphs

2024-01-15 Thread dcausse
dcausse renamed this task from "Com" to "Compare the performance of sparql 
queries between the full graph and the subgraphs".
dcausse added a project: Wikidata-Query-Service.
dcausse updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T355037

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org

