[Wikidata-bugs] [Maniphest] T353683: Unable to find a file by filename while adding a Commons media file statement

2024-01-09 Thread dcausse
dcausse added subscribers: Cparle, dcausse. dcausse added a project: SDAW-MediaSearch. dcausse added a comment. Restricted Application added a project: Structured-Data-Backlog. Selecting only namespace=6 does trigger the MediaSearch query profile which does not include the `all_near_match

[Wikidata-bugs] [Maniphest] T354142: 502 error on some Lingua Libre federated queries

2024-01-04 Thread dcausse
dcausse added a comment. Closed as a duplicate of T299290 <https://phabricator.wikimedia.org/T299290>, quickly testing it seems that the 502 is triggered depending on the query size: select * { service <https://lingualibre.org/sparql> { ?e <https://lingu

[Wikidata-bugs] [Maniphest] T354142: 502 error on some Lingua Libre federated queries

2024-01-04 Thread dcausse
dcausse closed this task as a duplicate of T299290: Unexpected behavior in federated queries with LinguaLibre in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T354142 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Nikki

[Wikidata-bugs] [Maniphest] T354043: Decide the name, domain and logo of WDQS for scholarly articles

2024-01-04 Thread dcausse
dcausse edited projects, added Wikidata-Query-Service; removed Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T354043 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, Midleading, AWesterinen

[Wikidata-bugs] [Maniphest] T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph

2023-12-19 Thread dcausse
dcausse added a subtask: T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24. TASK DETAIL https://phabricator.wikimedia.org/T350464 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-12-15 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Isabelladantes1983, Themindcoder

[Wikidata-bugs] [Maniphest] T353453: [Analytics] QUERY-Q3: Extract a set of queries known to be used by scholia

2023-12-14 Thread dcausse
dcausse added a comment. Note that scholia queries generally start with the comment: # tool: scholia TASK DETAIL https://phabricator.wikimedia.org/T353453 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper
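The comment above suggests a simple way to pick scholia queries out of a query log. A minimal sketch, assuming each query is available as a plain string; the function name `is_scholia_query` is hypothetical, and since the marker is only "generally" present, this check will miss scholia queries that lack it:

```python
SCHOLIA_MARKER = "# tool: scholia"  # marker quoted in the comment above


def is_scholia_query(query: str) -> bool:
    """Return True if the first non-blank line of the query starts with the marker."""
    for line in query.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.startswith(SCHOLIA_MARKER)
    return False  # empty query: nothing to match


queries = [
    "# tool: scholia\nSELECT ?work WHERE { ?work ?p ?o } LIMIT 1",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
]
scholia_queries = [q for q in queries if is_scholia_query(q)]
```

Only the first non-blank line is inspected, so a marker appearing later in the query text is ignored.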

[Wikidata-bugs] [Maniphest] T249989: Disable native OpenSearch suggestions in Wikibase wikis

2023-12-11 Thread dcausse
dcausse removed projects: wmde-wikidata-tech, Web-Team-Backlog. dcausse added a comment. @Jdlrobson in T249989#9330640 <https://phabricator.wikimedia.org/T249989#9330640> a user reports that a similar problem sometimes happens with the new vector 2022 search completion box on

[Wikidata-bugs] [Maniphest] T351942: wbstatementquantity search keyword seems broken

2023-12-11 Thread dcausse
dcausse added a comment. In T351942#9395775 <https://phabricator.wikimedia.org/T351942#9395775>, @Michael wrote: > Mh, the tickets associated with T191633: Implement searching of 'depicts' on commons <https://phabricator.wikimedia.org/T191633> are interesting, I can't s

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-11-30 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T351942: wbstatementquantity search keyword seems broken

2023-11-28 Thread dcausse
dcausse added a comment. I think this feature was originally meant to be used on commons and it appears that it was never properly configured anywhere (unless I'm missing something). For a quantity to be searchable via this keyword a statement must have a qualifier of type

[Wikidata-bugs] [Maniphest] T351819: Create a tool that records and compares a set of sparql query results

2023-11-28 Thread dcausse
dcausse claimed this task. dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T351819 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T351819: Create a tool that records and compares a set of sparql query results

2023-11-28 Thread dcausse
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T351819 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T351894: Write a tool that converts IGUANA test results into tabular data suited for analysis needs

2023-11-27 Thread dcausse
dcausse moved this task from Incoming to Needs review on the Discovery-Search (Current work) board. dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T351894 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T351894: Write a tool that converts IGUANA test results into tabular data suited for analysis needs

2023-11-27 Thread dcausse
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T351894 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T351894: Write a tool that converts IGUANA test results into tabular data suited for analysis needs

2023-11-23 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION IGUANA does output its test results as RDF (e.g. result.nt <https://gitlab.wikimedia.org/repos/search-platform/IGUANA/-/blob/main/wdqs-example-su

[Wikidata-bugs] [Maniphest] T349519: Determine if IGUANA and TFT would fit our query analysis needs

2023-11-23 Thread dcausse
dcausse moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. **TFT** does not seem appropriate for the kind of tests we have to make within the scope of the graph split project, it does not provide anything to ease

[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2023-11-22 Thread dcausse
dcausse added a subtask: T351819: Create a tool that records and compares a set of sparql query results. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, RKemper, bking

[Wikidata-bugs] [Maniphest] T351819: Create a tool that records and compares a set of sparql query results

2023-11-22 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T351819 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T351819: Create a tool that records and compares a set of sparql query results

2023-11-22 Thread dcausse
dcausse renamed this task from "Create a tool that records and compare a set of sparql results" to "Create a tool that records and compares a set of sparql query results". TASK DETAIL https://phabricator.wikimedia.org/T351819 EMAIL PREFERENCES https://phabricator.wi

[Wikidata-bugs] [Maniphest] T351819: Create a tool that records and compare a set of sparql results

2023-11-22 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION In order to evaluate the impact of splitting the wikidata graph we want to compare the outcome of some queries against different endpoint
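One way to picture the comparison step described in this task: a minimal sketch, assuming each recorded result is a list of rows mapping variable names to string values (a simplification of the SPARQL JSON results format) and that row order is not significant. The names `normalize` and `compare_results` are hypothetical and are not taken from the actual tool:

```python
from collections import Counter


def normalize(rows):
    """Turn a list of {variable: value} rows into an order-insensitive multiset."""
    return Counter(frozenset(row.items()) for row in rows)


def compare_results(recorded, current):
    """Compare two result sets, ignoring row order but keeping duplicate counts."""
    a, b = normalize(recorded), normalize(current)
    return {
        "identical": a == b,
        "missing": sum((a - b).values()),      # rows only in the recorded result
        "unexpected": sum((b - a).values()),   # rows only in the current result
    }


full_graph = [{"item": "Q1"}, {"item": "Q2"}]
split_graph = [{"item": "Q2"}, {"item": "Q3"}]
report = compare_results(full_graph, split_graph)
```

A multiset is used rather than a set so that a query returning the same row twice is not treated as equal to one returning it once.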

[Wikidata-bugs] [Maniphest] T241128: EPIC: Reduce the time needed to do the initial WDQS import

2023-11-21 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T241128 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: MPhamWMF, Gehel, Addshore, dcausse, Aklapper, me, Danny_Benjafield_WMDE, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T350348: results of query.wikidata are unstable (besides caching issues)

2023-11-13 Thread dcausse
dcausse added a comment. @Herzi.Pinki sorry to see that this problem is hitting your query again, I still believe that this might be a bug in blazegraph possibly related to how it optimizes its query plan. I think the section to cause much trouble to blazegraph is the named query

[Wikidata-bugs] [Maniphest] T350348: results of query.wikidata are unstable (besides caching issues)

2023-11-11 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350348 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, Herzi.Pinki, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-11-06 Thread dcausse
dcausse added a comment. @bkink thanks for triggering the import, could you update the task description with the dump files you used? (needed because we have to explicitly keep the corresponding partition in hdfs). TASK DETAIL https://phabricator.wikimedia.org/T347504 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

2023-10-31 Thread dcausse
dcausse added a comment. @EgonWillighagen thanks for the question! The set of triples that will be part of the split are the triples that we consider //owned// by the item, in other words these are the triples listed by Special:EntityData using the //dump// flavor, e.g. https

[Wikidata-bugs] [Maniphest] T349519: Determine if IGUANA and TFT would fit our query analysis needs

2023-10-31 Thread dcausse
dcausse claimed this task. dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. dcausse added a comment. @AWesterinen thanks for the heads-up, the scope of this ticket is to determine if these tools can be useful in the context

[Wikidata-bugs] [Maniphest] T350106: Implement a spark job that converts a RDF triples table into a RDF file format

2023-10-31 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service, Data-Platform-SRE, Discovery-Search (Current work). TASK DESCRIPTION The table `wikibase_rdf` contains 4 columns (not counting partition columns): - context - subject - predicate - object We
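To illustrate the conversion this task asks for, here is a minimal plain-Python sketch (not the Spark job itself) that turns one row of the four columns into an N-Quads line. It assumes, without confirmation from the task description, that each column value is already a serialized RDF term such as `<http://...>` or a quoted literal; the function name `row_to_nquad` and the sample values are hypothetical:

```python
def row_to_nquad(context: str, subject: str, predicate: str, obj: str) -> str:
    """Format one table row as an N-Quads statement.

    Assumes every value is already a valid serialized RDF term; N-Quads puts
    the graph label (here: context) last, before the final dot.
    """
    return f"{subject} {predicate} {obj} {context} ."


line = row_to_nquad(
    context="<http://example.org/graph>",          # hypothetical values
    subject="<http://example.org/subject>",
    predicate="<http://example.org/predicate>",
    obj='"some literal"',
)
```

If the stored values were raw IRIs or unescaped strings instead, an escaping step would be needed before formatting.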

[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

2023-10-27 Thread dcausse
dcausse added subscribers: EgonWillighagen, Daniel_Mietchen, Fnielsen. dcausse added a comment. @Daniel_Mietchen @Fnielsen @EgonWillighagen as discussed in our previous meeting here is the task to coordinate the efforts around exploring federation for scholia queries. The ticket description

[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2023-10-27 Thread dcausse
dcausse added a subtask: T349911: Explore the feasibility of using SPARQL federation for scholia queries. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, RKemper, bking

[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

2023-10-27 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T349911 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T349911: Explore the feasibility of using SPARQL federation for scholia queries

2023-10-27 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The purpose of this ticket is to explore how federation could be used to rewrite scholia queries in the context of the WDQS graph split using

[Wikidata-bugs] [Maniphest] T349829: List and document all known public SPARQL endpoints serving wikidata

2023-10-26 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T349829 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas

[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2023-10-26 Thread dcausse
dcausse added a subtask: T349829: List and document all known public SPARQL endpoints serving wikidata. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, RKemper, bking

[Wikidata-bugs] [Maniphest] T349829: List and document all known public SPARQL endpoints serving wikidata

2023-10-26 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T349829 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T349829: List and document all known public SPARQL endpoints serving wikidata

2023-10-26 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T349829 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas

[Wikidata-bugs] [Maniphest] T349829: List and document all known public SPARQL endpoints serving wikidata

2023-10-26 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION It might be interesting to have a documented list of public SPARQL endpoints that serve wikidata and document them: - might help to promote

[Wikidata-bugs] [Maniphest] T349519: Determine if IGUANA and TFT would fit our query analysis needs

2023-10-23 Thread dcausse
dcausse added a project: Wikidata-Query-Service. TASK DETAIL https://phabricator.wikimedia.org/T349519 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune

[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2023-10-23 Thread dcausse
dcausse added a subtask: T349519: Determine if IGUANA and TFT would fit our query analysis needs. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, RKemper, bking, tfmorris

[Wikidata-bugs] [Maniphest] T349147: Follow up on rdf-streaming-updater failure 2023-10-17

2023-10-23 Thread dcausse
dcausse added a comment. @bking staging looks fine to me, thanks for the deploy! Note that you can use the "redeploy" command as well; this should take care of stopping the job with a savepoint and restarting it using the new jar. The command for WDQS@staging should

[Wikidata-bugs] [Maniphest] T337013: [Epic] Splitting the graph in WDQS

2023-10-23 Thread dcausse
dcausse added a subtask: T349512: Collect multiple sets of SPARQL queries. TASK DETAIL https://phabricator.wikimedia.org/T337013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dr0ptp4kt, RKemper, bking, tfmorris, elal, karapayneWMDE

[Wikidata-bugs] [Maniphest] T349512: Collect multiple sets of SPARQL queries

2023-10-23 Thread dcausse
dcausse added a parent task: T337013: [Epic] Splitting the graph in WDQS. TASK DETAIL https://phabricator.wikimedia.org/T349512 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, AWesterinen, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T349512: Collect multiple sets of SPARQL queries

2023-10-23 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION As part of the offline evaluation of the WDQS graph split (scholarly article vs the rest) we want to extract multiple sets of SPARQL queries

[Wikidata-bugs] [Maniphest] T349069: Design and implement a WDQS data-reload mechanism that sources its data from HDFS instead of the snapshot servers

2023-10-17 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION As of today the data-reload cookbook does multiple tasks on the wdqs host being reloaded: - copy the dumps from the snapshot machines

[Wikidata-bugs] [Maniphest] T342593: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service

2023-10-10 Thread dcausse
dcausse moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. Reconciled these items manually, improving the reliability of the event system will be tracked in other tasks such as T345195 <ht

[Wikidata-bugs] [Maniphest] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink

2023-10-10 Thread dcausse
dcausse moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. all jobs have been restarted to use newer kafka apis TASK DETAIL https://phabricator.wikimedia.org/T326914 WORKBOARD https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T348320: Query service returns non-existent badges for a sitelink

2023-10-10 Thread dcausse
dcausse closed this task as a duplicate of T323239: Badges for sitelinks not getting updated in query service after a move. TASK DETAIL https://phabricator.wikimedia.org/T348320 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse

[Wikidata-bugs] [Maniphest] T323239: Badges for sitelinks not getting updated in query service after a move

2023-10-10 Thread dcausse
dcausse merged a task: T348320: Query service returns non-existent badges for a sitelink. dcausse added a subscriber: Nikki. TASK DETAIL https://phabricator.wikimedia.org/T323239 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Nikki

[Wikidata-bugs] [Maniphest] T348320: Query service returns non-existent badges for a sitelink

2023-10-10 Thread dcausse
dcausse added a comment. Thanks for the report, I believe that this problem is similar to T323239 <https://phabricator.wikimedia.org/T323239>. I have manually "reconciled" this item. TASK DETAIL https://phabricator.wikimedia.org/T348320 EMAIL PRE

[Wikidata-bugs] [Maniphest] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink

2023-10-04 Thread dcausse
dcausse added a comment. The new kafka connectors seem to do low level interactions with the kafka-clients (using java introspection <https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/s

[Wikidata-bugs] [Maniphest] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink

2023-10-04 Thread dcausse
dcausse added a comment. After upgrading the test job to the new kafka connector APIs I now hit this problem: - `org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId

[Wikidata-bugs] [Maniphest] T347989: Adapt rdf-spark-tools to split the wikidata graph based on a set of rules

2023-10-03 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T347989 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T347989: Adapt rdf-spark-tools to split the wikidata graph based on a set of rules

2023-10-03 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The rdf-spark-tools has a set of tools to import and munge a wikidata dump. This process makes the wikibase RDF graph available in hive

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-10-03 Thread dcausse
dcausse added a comment. @bking only one host has to be loaded with the full dataset. The loading process can be started as soon as possible but there are few constraints: - once we settle on a dump to import we will have to stick to it (if it fails we have to continue using this same

[Wikidata-bugs] [Maniphest] T347515: The WDQS streaming updater should have a way to disable or tag side output events

2023-10-03 Thread dcausse
dcausse set the point value for this task to "5". dcausse claimed this task. dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T347515 WORKBOARD https://phabricator.wikimedia.org/pro

[Wikidata-bugs] [Maniphest] T347515: The WDQS streaming updater should have a way to disable or tag side output events

2023-10-02 Thread dcausse
dcausse added a project: Sustainability (Incident Followup). Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T347515 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: bking, dcausse, Aklapper

[Wikidata-bugs] [Maniphest] T347515: The WDQS streaming updater should have a way to disable or tag side output events

2023-10-02 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T347515 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: bking, dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-09-28 Thread dcausse
dcausse reopened this task as "Open". dcausse moved this task from Needs Reporting to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board. dcausse added a comment. @Hannah_Bast thanks for making such a change! I did a quick test locally and everything seems to wor

[Wikidata-bugs] [Maniphest] T347515: The WDQS streaming updater should have a way to disable or tag side output events

2023-09-27 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The WDQS streaming updater captures the problems it encounters into 3 different streams. These streams are then processed by a reconciliation

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-26 Thread dcausse
dcausse claimed this task. dcausse moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. dcausse set the point value for this task to "2". TASK DETAIL https://phabricator.wikimedia.org/T347333 WORKBOARD https://phabricator.wikimedia.org/pro

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-26 Thread dcausse
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T347333 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: bking, Aklapper, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T347333: Tune process_sparql_query_hourly so that it does not get killed by yarn

2023-09-25 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The job process_sparql_query_hourly is getting killed by YARN with: Caused by: org.apache.spark.SparkException: Job aborted due to stage

[Wikidata-bugs] [Maniphest] T347284: https://query.wikidata.org/bigdata/ldf is broken

2023-09-25 Thread dcausse
dcausse edited projects, added Wikidata-Query-Service; removed Wikidata Query UI. TASK DETAIL https://phabricator.wikimedia.org/T347284 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: MisterSynergy, dcausse, Aklapper, AWesterinen

[Wikidata-bugs] [Maniphest] T347284: https://query.wikidata.org/bigdata/ldf is broken

2023-09-25 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata Query UI. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Reported via IRC on `#wikimedia-cloud`: 13:43 Hey all, the Linked Data Fragments endpoint of WDQS is not available with a 502 error for quite

[Wikidata-bugs] [Maniphest] T346456: Improve concurrency limits configuration of the wdqs updater

2023-09-20 Thread dcausse
dcausse claimed this task. dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T346456 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T346456: Improve concurrency limits configuration of the wdqs updater

2023-09-15 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T346456 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, bking, Clement_Goubert, dcausse, Kappakayala, AWesterinen, Arnoldokoth, wkandek

[Wikidata-bugs] [Maniphest] T346456: Improve concurrency limits configuration of the wdqs updater

2023-09-15 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, serviceops. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION The WDQS updater has several config options to reduce the concurrency at which it calls the MW api. The config option

[Wikidata-bugs] [Maniphest] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink

2023-09-14 Thread dcausse
dcausse claimed this task. dcausse moved this task from Incoming to In Progress on the Data-Platform-SRE board. TASK DETAIL https://phabricator.wikimedia.org/T326914 WORKBOARD https://phabricator.wikimedia.org/project/board/6524/ EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T326914: Migrate the WDQS streaming updater from FlinkKafkaConsumer/Producer to KafkaSource/Sink

2023-09-14 Thread dcausse
dcausse added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T326914 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T344284: Rename usages of whitelist to allowlist in query service rdf repo

2023-09-11 Thread dcausse
dcausse claimed this task. dcausse moved this task from Blocked / Waiting to In Progress on the Data-Platform-SRE board. TASK DETAIL https://phabricator.wikimedia.org/T344284 WORKBOARD https://phabricator.wikimedia.org/project/board/6524/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-09-08 Thread dcausse
dcausse added a comment. @Hannah_Bast sorry about this, I mixed this ticket with another one; supporting `https://qlever.cs.uni-freiburg.de/api/dblp` would require changing the Accept header that blazegraph sends during federation requests and it does not appear to be something that can

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-09-08 Thread dcausse
dcausse added a comment. Note that even if we changed blazegraph to accept multiple formats for all endpoints by setting the header that you suggest (`Accept: application/sparql-results+xml, application/sparql-results+json`) the https://data.nlg.gr/sparql endpoint still produces an http 500

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-09-08 Thread dcausse
dcausse added a comment. @Hannah_Bast Blazegraph does properly send the header `Accept: application/sparql-results+xml` but it seems that this endpoint does only work when requesting `application/sparql-results+json`, anything else produces an http 500 error: curl -k -XPOST -H

[Wikidata-bugs] [Maniphest] T344876: Wikibase MediaInfo should provide access to page name via query service

2023-09-07 Thread dcausse
dcausse moved this task from Incoming to RDF Model on the Wikidata-Query-Service board. dcausse added a comment. For context `schema:url` was added to help to join P18 <https://phabricator.wikimedia.org/P18> statements from wikidata with commons media items (T277665

[Wikidata-bugs] [Maniphest] T342593: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service

2023-09-06 Thread dcausse
dcausse claimed this task. dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. dcausse added a comment. Going to work on improving the tooling regarding reconciliations of missed deletes but I won't be working on the root cause. I

[Wikidata-bugs] [Maniphest] T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses)

2023-08-10 Thread dcausse
dcausse added a comment. In T342123#9081490 <https://phabricator.wikimedia.org/T342123#9081490>, @AndrewTavis_WMDE wrote: > Minor question on this, @dcausse: why aren't we caching `df_wikidata_rdf` and `sa_and_sasc_ids` above? My assumption is that we should given that we're u

[Wikidata-bugs] [Maniphest] T342123: [Analytics] Find out the size of the Q13442814 (scholarly article) subgraph (including instances of subclasses)

2023-08-09 Thread dcausse
dcausse added a comment. At a glance I suspect that now you might get duplicated QIDs in sa_and_sasc_ids = ( df_wikidata_rdf.select(col("subject").alias("sa_and_sasc_qids")) .where(col("predicate") == P31_DIRECT_URL) .where(col(&

[Wikidata-bugs] [Maniphest] T342593: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service

2023-08-08 Thread dcausse
dcausse added projects: Data Engineering and Event Platform Team, Data-Engineering, Event-Platform. TASK DETAIL https://phabricator.wikimedia.org/T342593 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: ArielGlenn, Milimetric, dcausse

[Wikidata-bugs] [Maniphest] T342593: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service

2023-08-07 Thread dcausse
dcausse added a comment.

| item | deletion date |
| Q10813441 | 2023-06-06T14:21:49 |
| Q32994683 | 2023-05-03T16:04:27 |
| Q55929561 | 2023-05-31T20:24:58 |
| Q109548562 | 2023-06-06T14:22:29 |
| Q111436860 | 2023-05-04T06:54:56 |
| None

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-08-04 Thread dcausse
dcausse added a comment. I suspect that because the `claims` field is an array of complex types, it can potentially be huge, and generating its string representation using `f"{claims}"` might cause excessive memory usage and is, I believe, a very slow operation. I would look

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-08-03 Thread dcausse
dcausse added a comment. @AndrewTavis_WMDE sure! I'll send you an invite for next Monday; in the meantime, could you share your notebook somewhere so that I can take a look before the call? TASK DETAIL https://phabricator.wikimedia.org/T342111 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-08-03 Thread dcausse
dcausse added a comment. @AndrewTavis_WMDE thanks! This is really exciting, we couldn't hope for better results... it's almost a 50-50 split. On top of that, if I read your results correctly, we only have 197,583 common triples (7,521,423,558 - 7,521,225,975) that will have

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-08-02 Thread dcausse
dcausse added a comment. - The `<http://www.wikidata.org/entity/>` prefix (generally `wd`) refers to the concept URI of the entity; this is generally how an entity (whether it's a property, item or lexeme) is identified, e.g. Q42 is identified as `wd:Q42` -> `<http://www.wikidata.or
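
The prefix expansion described in this comment can be sketched as follows. This is a minimal illustration, assuming only the `wd` prefix mapping quoted above; the `expand_wd` helper is hypothetical and not part of any Wikidata tooling.

```python
# Namespace bound to the `wd` prefix, as quoted in the comment above.
WD_NAMESPACE = "http://www.wikidata.org/entity/"


def expand_wd(prefixed_name: str) -> str:
    """Expand a prefixed name such as `wd:Q42` into its full concept URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix != "wd" or not local_name:
        raise ValueError(f"not a wd-prefixed name: {prefixed_name!r}")
    # Angle brackets follow the notation for full IRIs used in the comment.
    return f"<{WD_NAMESPACE}{local_name}>"


print(expand_wd("wd:Q42"))  # prints <http://www.wikidata.org/entity/Q42>
```

The same expansion applies to properties and lexemes, since the comment notes that all of them are identified under this namespace.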

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-07-31 Thread dcausse
dcausse added a comment. In T342111#9055397 <https://phabricator.wikimedia.org/T342111#9055397>, @AndrewTavis_WMDE wrote: > Also for all's information, the duplicate triple values in `discovery.wikibase_rdf` is very very small as seen in the following snipp

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-07-31 Thread dcausse
dcausse added a subscriber: JAllemandou. dcausse added a comment. In T342111#9055326 <https://phabricator.wikimedia.org/T342111#9055326>, @AndrewTavis_WMDE wrote: > @dcausse, a general point on my end is that when I'm trying to run the code that you sent along vi

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-07-31 Thread dcausse
dcausse added a comment. In T342111#9054619 <https://phabricator.wikimedia.org/T342111#9054619>, @Manuel wrote: > Hi @dcausse, thank you so much, this is very helpful! \o/ > >> I believe that at first we are interested in knowing the number of triples that

[Wikidata-bugs] [Maniphest] T342111: [Analytics] Find out the size of direct instances of Q13442814 (scholarly article)

2023-07-28 Thread dcausse
dcausse added a comment. I believe that at first we are interested in knowing the number of triples that would be moved out if all items that satisfy the condition `?s wdt:P31 wd:Q13442814` are moved out with all the triples //belonging// to these items. The triples that belong to an entity

[Wikidata-bugs] [Maniphest] T342416: Set data permission on new snapshot generation (discovery.wikibase_rdf)

2023-07-21 Thread dcausse
dcausse added a project: Wikidata-Query-Service. TASK DETAIL https://phabricator.wikimedia.org/T342416 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: BTullis, AndrewTavis_WMDE, Aklapper, JAllemandou, AWesterinen, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T341905: Undeleted Wikidata items do not reappear in WDQS

2023-07-18 Thread dcausse
dcausse added a comment. Recording some technical information while we still have it. The undelete events appear to have been sent by MediaWiki but they don't appear to have the revision_id in them: Topic `eqiad.mediawiki.page-undelete`: {"$schema":"/mediawik

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-07-17 Thread dcausse
dcausse added a subtask: T341792: Provision Zookeeper Cluster for storing Flink HA data. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-07-17 Thread dcausse
dcausse removed a parent task: T341792: Provision Zookeeper Cluster for storing Flink HA data. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, BTullis, JMeybohm, gmodena

[Wikidata-bugs] [Maniphest] T341905: Undeleted Wikidata items do not reappear in WDQS

2023-07-17 Thread dcausse
dcausse claimed this task. dcausse moved this task from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T341905 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-07-17 Thread dcausse
dcausse added a parent task: T341792: Provision Zookeeper Cluster for storing Flink HA data. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: BTullis, JMeybohm, gmodena, Ottomata

[Wikidata-bugs] [Maniphest] T336352: Update maxlag calculation maintenance script to reflect new prometheus queries

2023-07-05 Thread dcausse
dcausse added a comment. @hoo using `topk` sounds good to me! I used `max` to graph the maxlag on a single timeseries in Grafana but hadn't thought about your use case. TASK DETAIL https://phabricator.wikimedia.org/T336352 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-06-26 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, fbalicchia, Kappakayala

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-06-26 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, fbalicchia, Kappakayala

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-06-26 Thread dcausse
dcausse removed dcausse as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, fbalicchia

[Wikidata-bugs] [Maniphest] T339810: ApiUsageException when searching Commons or Wikidata

2023-06-20 Thread dcausse
dcausse added a comment. Incident report: https://wikitech.wikimedia.org/wiki/Incidents/2023-06-18_search_broken_on_wikidata_and_commons TASK DETAIL https://phabricator.wikimedia.org/T339810 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T339810: ApiUsageException when searching Commons or Wikidata

2023-06-18 Thread dcausse
dcausse lowered the priority of this task from "Unbreak Now!" to "Medium". dcausse added a comment. Problem seems to be fixed, lowering priority. We'll write a small incident doc shortly. TASK DETAIL https://phabricator.wikimedia.org/T339810 EMAIL

[Wikidata-bugs] [Maniphest] T339810: ApiUsageException when searching Commons or Wikidata

2023-06-18 Thread dcausse
dcausse triaged this task as "Unbreak Now!" priority. TASK DETAIL https://phabricator.wikimedia.org/T339810 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lydia_Pintscher, hashar, Lovelano, Vojtech.dostal, Nikki, md
