[Wikidata-bugs] [Maniphest] T288266: Better understand the makeup of specific Wikidata object types that probably can't be dropped
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288266 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Esc3300, Manuel, MPhamWMF, me, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BeautifulBold, Suran38, karapayneWMDE, Invadibot, maantietaja, Peteosx1x, NavinRizwi, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T288259: Get estimates for how many Wikidata items don't have at least 3 backlinks
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288259 To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288260: Get estimates for size of non-normalized values in Wikidata
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288260 To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288261: Determine if there are consistently used top ranked Wikidata statements, and how many of them are there
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288261 To: AKhatun_WMF Cc: Lydia_Pintscher, Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288264 To: AKhatun_WMF Cc: RShigapov, Lydia_Pintscher, Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288265: Get estimates for Wikidata items without hot properties that are being queried
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T288265 To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Manuel, MPhamWMF, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis
AKhatun_WMF added a comment. In T303831#8063021 <https://phabricator.wikimedia.org/T303831#8063021>, @EBernhardson wrote:
> In terms of the exact code causing this, spark is terrible at telling us exactly where but trying to infer from the SparkUI output i think it's this join:
>
>     def getTopSubgraphItems(topSubgraphs: DataFrame): DataFrame = {
>       wikidataTriples
>         .filter(s"predicate='<$p31>'")
>         .selectExpr("object as subgraph", "subject as item")
>         .join(topSubgraphs.select("subgraph"), Seq("subgraph"), "right")

This is exactly the code that finds the top subgraphs. And yes, the data is definitely heavily skewed; that is the nature of Wikidata, and anything we do on Wikidata by subgraph is going to run into similar issues. For reference, half of Wikidata falls under a single subgraph, while the other half is spread across hundreds of subgraphs. We might need to start considering Spark 3.

> And i suppose this is also only the first skewed join in the execution, there may be more later in the computations.

Unfortunately, yes. I believe `subgraph_query_mapping` is going to be another big challenge: it has similar joins and writes data daily. But we will see.

In T303831#8064293 <https://phabricator.wikimedia.org/T303831#8064293>, @EBernhardson wrote:
> - Enabled subgraph_query_mapping_daily. This started waiting for snapshot=20220613 (last monday) with an execution_date of 20220620 (also a monday). I suspect we should adjust this to target snapshot=20220620, but waiting for confirmation. Turned back off so it doesn't timeout and complain.

It is correct to look for data from last Monday, because the data for 20220620 is actually populated the following Friday. So if the job runs on current data, it won't find data for a Monday on that same day. All of this maneuvering is needed because the input data is both weekly and daily, so every day the job looks for data from last Monday.
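A minimal sketch of the snapshot selection described above, in Scala to match the job code. It assumes the rule is "use the Monday of the week before the execution date's week", which reproduces the 20220620 → 20220613 example; `SnapshotDate` and `snapshotFor` are hypothetical names, not part of the existing DAG or job.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.format.DateTimeFormatter
import java.time.temporal.TemporalAdjusters

// Hypothetical helper, not the production DAG logic.
// Assumption: the snapshot labelled with Monday M is only populated on the
// Friday after M, so any run during the week starting on M waits for the
// snapshot of the previous Monday (M minus 7 days).
object SnapshotDate {
  private val fmt = DateTimeFormatter.BASIC_ISO_DATE // yyyyMMdd, e.g. 20220620

  def snapshotFor(executionDate: LocalDate): LocalDate =
    executionDate
      .`with`(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)) // Monday of the current week
      .minusWeeks(1)                                               // Monday of the previous week

  def snapshotFor(executionDate: String): String =
    snapshotFor(LocalDate.parse(executionDate, fmt)).format(fmt)
}
```

Under this assumed rule, every daily run from 20220620 through 20220626 waits for snapshot=20220613, and the first run to wait for 20220620 is on 20220627. A rule keyed on the Friday availability would pick up the newer snapshot a few days earlier, at the cost of a more complex date calculation.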
This makes me wonder whether the same should be done for `subgraph_mapping_weekly`, as it looks for 20220620 on that same day, even though that data will only be populated the following Friday. This job runs weekly, like its input data.

> - Enabled subgraph_query_metrics_daily. This is waiting for `event.wdqs_external_sparql_query/datacenter=eqiad/year=2022/month=6/day=20` (and same for codfw) but it needs to be waiting on the individual hourly partitions. I hadn't thought this fully through when reviewing the patch, we will need to adjust the sensor to use HivePartitionRangeSensor which can generate all the intermediate hourly named partitions. Turned back off as it's also waiting for outputs of subgraph_query_mapping_daily (iiuc) which is turned off currently.

I am attempting this.
TASK DETAIL https://phabricator.wikimedia.org/T303831 To: AKhatun_WMF Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
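For the sensor change to subgraph_query_metrics_daily, the set of hourly partitions that one day expands to can be sketched as below. HivePartitionRangeSensor is the operator named in the thread; this Scala helper only illustrates the partition list such a sensor would need to cover. The `hour=` component and the exact path layout are assumptions modelled on the day-level partition quoted in the thread, and `HourlyPartitions.forDay` is a hypothetical name.

```scala
// Hypothetical sketch: list the hourly partitions that make up one day of an
// hourly-partitioned table, for every datacenter. The "hour=H" component and
// the path layout are assumptions based on the day-level partition
// event.wdqs_external_sparql_query/datacenter=eqiad/year=2022/month=6/day=20.
object HourlyPartitions {
  def forDay(table: String, datacenters: Seq[String],
             year: Int, month: Int, day: Int): Seq[String] =
    for {
      dc   <- datacenters
      hour <- 0 until 24
    } yield s"$table/datacenter=$dc/year=$year/month=$month/day=$day/hour=$hour"
}
```

For 2022-06-20 with eqiad and codfw this yields 48 partitions; a daily job would start only once all of them exist, rather than waiting on a single day-level partition.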
[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis
AKhatun_WMF added a comment. Update: I tested a few options on the statbox. I am not sure how well this represents the production environment, but here are the results:
- coalesce + 8G driver memory: failed, as identified by Erik (SparkOutOfMemoryError at topSubgraphItems, application_1655808530211_109990)
- coalesce + 16G driver memory: failed (SparkOutOfMemoryError at topSubgraphItems, application_1655808530211_110190)
- repartition + 8G driver memory: failed (Reason: Executor heartbeat timed out after 176110 ms, application_1655808530211_110236)
- repartition + 16G driver memory: failed (Reason: Executor heartbeat timed out after 159925 ms, application_1655808530211_110343)
- repartition + 16G driver memory + 16G executor memory: failed (Reason: Executor heartbeat timed out after 145549 ms, application_1655808530211_110430)

I still need to figure out exactly where the OOM happens.
TASK DETAIL https://phabricator.wikimedia.org/T303831 To: AKhatun_WMF Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
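The OOM failures above occur at topSubgraphItems, whose join on the subgraph key is heavily skewed (roughly half of Wikidata sits under a single subgraph). One commonly used mitigation for this kind of skew is key salting: the large side gets an extra salt column, and the small side is replicated once per salt, so a single hot key is spread over several join keys and partitions. The sketch below illustrates the idea with plain Scala collections standing in for DataFrames; it is not the job's code, all names are hypothetical, and the round-robin salt is chosen only to keep the example deterministic (a random or hash-based salt is more typical in Spark). Spark 3's adaptive query execution can also split skewed join partitions automatically for some join types, which may be part of what moving to Spark 3 would buy here.

```scala
// Illustrative sketch of key salting for a skewed join, using plain Scala
// collections instead of Spark DataFrames. All names are hypothetical.
object Salting {
  // Rows per join key without salting: a hot key lands on a single key/partition.
  def loadByKey[K](keys: Seq[K]): Map[K, Int] =
    keys.groupMapReduce(k => k)(_ => 1)(_ + _)

  // Rows per (key, salt) after assigning salts round-robin over `buckets`.
  def loadBySaltedKey[K](keys: Seq[K], buckets: Int): Map[(K, Int), Int] =
    keys.zipWithIndex
      .map { case (k, i) => (k, i % buckets) }
      .groupMapReduce(ks => ks)(_ => 1)(_ + _)

  // Small side replicated once per salt, so every salted key still finds its match.
  def replicate[K, V](small: Seq[(K, V)], buckets: Int): Seq[((K, Int), V)] =
    for {
      (k, v) <- small
      s      <- 0 until buckets
    } yield ((k, s), v)

  // Inner join on (key, salt); the result matches an unsalted inner join on key.
  def saltedJoin[K, A, B](large: Seq[(K, A)], small: Seq[(K, B)], buckets: Int): Seq[(K, A, B)] = {
    val index: Map[(K, Int), Seq[B]] = replicate(small, buckets).groupMap(_._1)(_._2)
    large.zipWithIndex.flatMap { case ((k, a), i) =>
      index.getOrElse((k, i % buckets), Seq.empty).map(b => (k, a, b))
    }
  }
}
```

With 100 rows for one hot key and 4 salts, the largest join key drops from 100 rows to 25. The sketch implements inner-join semantics only; the original `right` join would need extra handling so that top subgraphs without items are not duplicated once per salt.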
[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis
AKhatun_WMF added a comment. In T303831#8058159 <https://phabricator.wikimedia.org/T303831#8058159>, @EBernhardson wrote:
> the airflow patch is deployed but i only turned on *_init dags and subgraph_mapping_weekly today (ran out of time, will do rest tomorrow).
>
> subgraph_mapping_weekly failed the first time through. I updated executor memory from 8g to 12g but the second execution is still failing. something is quite unbalanced about the topSubgraphItems, of the 8 shards they have inputs varying from 100MB to 450MB giving executions times of ~30s on the small ones and ~8m before the final one fails.
>
> Not specifically related to this patch, but i wonder if we could change up the `SparkUtils.saveTables` method to somehow take parameters in the path to specify coalesce vs repartition and the number of partitions to save by, so we only have to update the airflow invocation and not the jar as well to test variations there.

Should we have parameters called `coalesce` and `repartition` that default to false, and, when one of them is true, use `num_partitions` to coalesce or repartition accordingly?
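The proposed parameters could be parsed into an explicit strategy along these lines. This is a hypothetical sketch: the parameter names `coalesce`, `repartition` and `num_partitions` come from the proposal above, while the object and type names, and the choice to reject both flags being set at once, are assumptions. The resulting strategy would then map onto Spark's `Dataset.coalesce(n)` (no shuffle, can only reduce the partition count) or `Dataset.repartition(n)` (full shuffle, roughly even partition sizes) before writing.

```scala
// Hypothetical sketch of how SparkUtils.saveTables could interpret the proposed
// parameters; names other than coalesce/repartition/num_partitions are made up.
object SaveTablesOptions {
  sealed trait Partitioning
  case object AsIs extends Partitioning
  final case class Coalesce(numPartitions: Int) extends Partitioning
  final case class Repartition(numPartitions: Int) extends Partitioning

  // A flag is on only when explicitly set to "true"; missing means false.
  private def flag(params: Map[String, String], name: String): Boolean =
    params.get(name).exists(_.equalsIgnoreCase("true"))

  def fromParams(params: Map[String, String]): Either[String, Partitioning] = {
    val n = params.get("num_partitions").flatMap(_.toIntOption)
    (flag(params, "coalesce"), flag(params, "repartition"), n) match {
      case (false, false, _)         => Right(AsIs)
      case (true, true, _)           => Left("coalesce and repartition are mutually exclusive")
      case (_, _, None)              => Left("num_partitions is required with coalesce or repartition")
      case (_, _, Some(k)) if k <= 0 => Left(s"num_partitions must be positive, got $k")
      case (true, _, Some(k))        => Right(Coalesce(k))
      case (_, _, Some(k))           => Right(Repartition(k))
    }
  }
}
```

Keeping the decision in one place means the Airflow invocation can switch between strategies without a new jar, which is the goal stated in the quoted comment.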
TASK DETAIL https://phabricator.wikimedia.org/T303831 To: AKhatun_WMF Cc: EBernhardson, dcausse, Gehel, JAllemandou, Aklapper, AKhatun_WMF, Hellket777, Astuthiodit_1, AWesterinen, 786, Biggs657, karapayneWMDE, Invadibot, MPhamWMF, maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, ItamarWMDE, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T303831: Productionize Wikidata subgraph analysis
AKhatun_WMF created this task. AKhatun_WMF added projects: Discovery-Search (Current work), Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION As a Data Analyst for Wikidata/WDQS, I would like the metrics from the subgraph analysis done in T293628 <https://phabricator.wikimedia.org/T293628> to be evaluated periodically and stored over time, both for further analysis and so that anyone can access the results without having to redo the whole analysis from scratch.

This ticket covers productionizing:
- subgraph mapping to items and triples
- subgraph metrics: subgraph size, number of items, predicate usage, etc.
- query mapping to subgraphs
- subgraph query metrics: queries per subgraph, user agent (UA) distribution, query time distribution, item/predicate usage, etc.

List of all possible metrics: metrics-list <https://docs.google.com/spreadsheets/d/1G9WBUIXwkDiVvgK9shOvehJJp4fZzftkGYnom4HtDKU/edit?usp=sharing>
TASK DETAIL https://phabricator.wikimedia.org/T303831 To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, MPhamWMF, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
[Wikidata-bugs] [Maniphest] T299921: Estimate benefits of splitting and federating Wikidata subgraphs
AKhatun_WMF removed AKhatun_WMF as the assignee of this task. AKhatun_WMF moved this task from Current work to Analysis on the Wikidata-Query-Service board. TASK DETAIL https://phabricator.wikimedia.org/T299921 WORKBOARD https://phabricator.wikimedia.org/project/board/891/ To: AKhatun_WMF Cc: AKhatun_WMF, MPhamWMF, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. AKhatun_WMF added a comment. The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Item_ORES_Score_Analysis> TASK DETAIL https://phabricator.wikimedia.org/T288262 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ To: AKhatun_WMF Cc: Lydia_Pintscher, JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF added a comment. In T288262#7629267 <https://phabricator.wikimedia.org/T288262#7629267>, @Lydia_Pintscher wrote:
> @AKhatun_WMF: You mention on the wiki that some Items don't have an ORES score. All Items should have one 😬 Do you have an example of one that does not?

Oh, it's not that they don't have a score per se; they just aren't in the event data table, so I could not get a score for them to analyze. I will clarify that on the wiki! If we could generate an event for every existing item, we could get scores for all items. The way the table is populated at present, I believe it only contains scores for the latest revisions.
TASK DETAIL https://phabricator.wikimedia.org/T288262 To: AKhatun_WMF Cc: Lydia_Pintscher, JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF added a comment. In T288262#7628599 <https://phabricator.wikimedia.org/T288262#7628599>, @MPhamWMF wrote:
> @AKhatun_WMF , sorry, it's been a while since I wrote this, but I think what I meant when I wrote the question about "optimal separation" is given some distribution of ORES scores (e.g. a normal distribution), is it clear what the threshold is for what qualifies as a "high" vs "low" score: e.g. anything over .75 is a high score. But that's assuming the scores are continuous. I guess it's moot if they're binary (I don't actually know).
>
> If this isn't a sensible way of thinking about the issue, let me know if there's a better way.

Ah, I believe that is already handled by the model's output. We get probabilities for 5 classes (A to E) describing how good an item is, where A is the best and E is the worst. The score is then calculated as `5*ProbabilityOfClassA + 4*ProbabilityOfClassB + 3*ProbabilityOfClassC + 2*ProbabilityOfClassD + 1*ProbabilityOfClassE`. Since the five probabilities sum to 1, this gives a continuous score between 1 and 5 rather than a binary one, but we can definitely define our own thresholds on top of it as well. The analysis is documented here: Wikidata_Item_ORES_Score_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Item_ORES_Score_Analysis>

I will do a bit more work to get the scores per subgraph and will add them here as well.
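The weighted score described above can be written out as a small function. This is a sketch, not the ORES implementation: it assumes the class probabilities arrive as a map keyed by "A" to "E" and treats missing classes as 0, and the 4.0 default in the hypothetical `isHighQuality` is a placeholder, since the comment leaves thresholds open.

```scala
// Sketch of the item-quality score: 5*P(A) + 4*P(B) + 3*P(C) + 2*P(D) + 1*P(E).
// The input shape and the default threshold are assumptions, not the ORES API.
object ItemQualityScore {
  private val weights = Seq("A" -> 5.0, "B" -> 4.0, "C" -> 3.0, "D" -> 2.0, "E" -> 1.0)

  def weighted(probabilities: Map[String, Double]): Double =
    weights.map { case (cls, w) => w * probabilities.getOrElse(cls, 0.0) }.sum

  // Hypothetical cut-off between "high" and "low" scores.
  def isHighQuality(probabilities: Map[String, Double], threshold: Double = 4.0): Boolean =
    weighted(probabilities) >= threshold
}
```

An item that is certainly class A scores 5, one that is certainly class E scores 1, and a uniform distribution over the five classes lands at 3; any "high" vs "low" cut-off then becomes a single number on that scale.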
TASK DETAIL https://phabricator.wikimedia.org/T288262 To: AKhatun_WMF Cc: JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF added subscribers: dcausse, JAllemandou. AKhatun_WMF added a comment. @MPhamWMF Hi, could you please clarify the question `Is there an optimal separation between high/low ORES scores?` Separation in what respect? What comes to my mind is separating items by the subgraph they are part of. cc: @JAllemandou @dcausse TASK DETAIL https://phabricator.wikimedia.org/T288262 To: AKhatun_WMF Cc: JAllemandou, dcausse, ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF added a comment. @ACraze Indeed! I was confusing the model for revisions (item quality) with the models for edits (damaging/good faith). The latest revision is all I need. Thank you! TASK DETAIL https://phabricator.wikimedia.org/T288262 To: AKhatun_WMF Cc: ACraze, Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Gethan, Simonmaignan, Invadibot, maantietaja, calbon, lmata, Anerka, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Xinbenlv, Vacio, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Fz-29, QZanden, EBjune, merbst, LawExplorer, elukey, _jensen, rosalieper, Mkdw, Scott_WUaS, Jonas, Xmlizer, notconfusing, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Alchimista, Mbch331
[Wikidata-bugs] [Maniphest] T288262: Estimate how many Wikidata items have low/no ORES score
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T288262 WORKBOARD https://phabricator.wikimedia.org/project/board/891/ To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph
AKhatun_WMF moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board. AKhatun_WMF added a comment. Query and triple counts for astronomical objects were computed here: Wikidata_Subgraph_Query_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis>, along with those for the ~300 largest subgraphs. For astronomical objects specifically (and only for them), a list of all their subclasses was obtained and manually inspected for relevance. The list is transitive: it also includes subclasses of subclasses, and so on.
- Percent of triples: 8.7%
- Percent of entities: 8.9%
- Days to recover: 245
- Query count: 2.5M
- Percent of queries: 1.3%
- Percent of total query time: 0.5%
TASK DETAIL https://phabricator.wikimedia.org/T288257 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
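A transitive subclass list for astronomical objects (subclasses of subclasses, and so on) can be collected with a breadth-first traversal. The sketch below assumes the class hierarchy is available as a map from each class to its direct subclasses, i.e. reversed P279 (subclass of) statements; how the original list was actually extracted is not stated in the comment, and `Subclasses.transitive` is a hypothetical name. A visited set keeps the traversal finite if the hierarchy contains cycles, and the manual relevance review would still be applied to its output.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

// Hypothetical sketch: all transitive subclasses of `root`, given direct
// subclass edges (class -> its direct subclasses). Breadth-first search with
// a visited set, so cycles in the class graph cannot cause an infinite loop.
object Subclasses {
  def transitive(root: String, directSubclasses: Map[String, Set[String]]): Set[String] = {
    @tailrec
    def loop(frontier: Queue[String], seen: Set[String]): Set[String] =
      frontier.dequeueOption match {
        case None => seen
        case Some((cls, rest)) =>
          val unseen = directSubclasses.getOrElse(cls, Set.empty[String]) -- seen
          loop(rest.enqueueAll(unseen), seen ++ unseen)
      }
    loop(Queue(root), Set(root)) - root // the root itself is not its own subclass
  }
}
```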
[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure
AKhatun_WMF moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. AKhatun_WMF added a comment. Details can be found here: Wikidata_Subgraph_Query_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis> TASK DETAIL https://phabricator.wikimedia.org/T295188 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ To: AKhatun_WMF Cc: MPhamWMF, Aklapper, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T288257 WORKBOARD https://phabricator.wikimedia.org/project/board/891/ To: AKhatun_WMF Cc: Aklapper, AKhatun_WMF, Addshore, Manuel, MPhamWMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T293631: Get estimates for splitting other large subgraphs from Wikidata
AKhatun_WMF added a project: Discovery-Search (Current work). AKhatun_WMF added a comment. With the completion of T293632 <https://phabricator.wikimedia.org/T293632> and T293636 <https://phabricator.wikimedia.org/T293636>, this task is complete. TASK DETAIL https://phabricator.wikimedia.org/T293631 To: AKhatun_WMF Cc: Aklapper, MPhamWMF, JAllemandou, AKhatun_WMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T293628 To: AKhatun_WMF Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF moved this task from incoming to in progress on the Wikidata board. AKhatun_WMF added a comment. With the completion of T293632 <https://phabricator.wikimedia.org/T293632> and T293636 <https://phabricator.wikimedia.org/T293636>, this task is complete. TASK DETAIL https://phabricator.wikimedia.org/T293628 WORKBOARD https://phabricator.wikimedia.org/project/board/71/ To: AKhatun_WMF Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs
AKhatun_WMF moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board. AKhatun_WMF added a comment. The analysis was completed and documented here: Wikidata_Subgraph_Query_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Query_Analysis> TASK DETAIL https://phabricator.wikimedia.org/T293636 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ To: AKhatun_WMF Cc: JAllemandou, MPhamWMF, Aklapper, AKhatun_WMF, Invadibot, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T258834 To: AKhatun_WMF Cc: AKhatun_WMF, JAllemandou, cchen, Nuria, Miriam, nettrom_WMF, EChetty, toberto, ldelench_wmf, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, 4748kitoko, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, Base, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
[Wikidata-bugs] [Maniphest] T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T258834 WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T293636 WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31
AKhatun_WMF added a project: Discovery-Search (Current work). AKhatun_WMF added a comment.
Some analysis was done here:
- Property usage across subgraphs: Predicates_across_subgraphs <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Predicates_across_subgraphs>
- Top predicates also used in scholarly articles: Top_properties_used_in_other_subgraphs <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis#Top_properties_used_in_other_subgraphs>
Suggested analysis:
- Categorize the usage type of properties:
  - Similar distribution of use across subgraphs
  - Have X% usage in Y subgraphs
  - Used in lots of small subgraphs, or used in small quantity in all subgraphs
- Entropy over the power-law distribution of the property across subgraphs (spark udf entropy):
  - This will give us a single number to represent the distribution of a property
  - It will incorporate the distribution as well as the variability of property usage
  - The entropy distribution will tell us what kinds of properties we have on hand
The suggested analysis could be done through a new ticket if required later on.
TASK DETAIL https://phabricator.wikimedia.org/T291205
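The entropy idea above can be sketched in plain Python. This is not the Spark UDF the comment mentions. `usage_entropy` and `normalized_entropy` are hypothetical helpers, and the counts are made-up illustration data. Shannon entropy is 0 bits for a property used in a single subgraph and log2(n) bits for one spread evenly over n subgraphs. Dividing by log2(n) is an optional normalization (an assumption, not from the comment) so that properties seen in different numbers of subgraphs can be compared.

```python
import math
from collections.abc import Mapping


def usage_entropy(counts: Mapping[str, int]) -> float:
    """Shannon entropy (bits) of one property's usage across subgraphs.

    `counts` maps a subgraph name to how many times the property is used
    in that subgraph. Zero counts contribute nothing to the sum.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def normalized_entropy(counts: Mapping[str, int]) -> float:
    """Entropy divided by log2(number of subgraphs), giving a value in [0, 1]."""
    n = len(counts)
    return usage_entropy(counts) / math.log2(n) if n > 1 else 0.0


# Hypothetical counts, for illustration only.
concentrated = {"scholarly article": 1000, "human": 0, "taxon": 0, "gene": 0}
spread = {"scholarly article": 250, "human": 250, "taxon": 250, "gene": 250}

print(usage_entropy(concentrated))  # 0.0
print(usage_entropy(spread))        # 2.0
```

In a Spark job the same per-property calculation would run over grouped (property, subgraph, count) rows; how that grouping is produced is left open here.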
[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata
AKhatun_WMF added a comment. The analysis was completed and documented here: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis TASK DETAIL https://phabricator.wikimedia.org/T293632
[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure
AKhatun_WMF added a comment.
Sources:
- T275068 <https://phabricator.wikimedia.org/T275068>
- T293632 <https://phabricator.wikimedia.org/T293632> Wikidata_Subgraph_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis>
- T281854 <https://phabricator.wikimedia.org/T281854> Wikidata_Scholarly_Articles_Subgraph_Analysis <https://wikitech.wikimedia.org/w/index.php?title=User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis>
- T293636 <https://phabricator.wikimedia.org/T293636>
TODO: Query count analysis for the subgraphs
| Name | % of entities | % of triples | Number of days for Blazegraph to recover at current rate of growth | % of queries potentially affected (monthly) |
| --- | --- | --- | --- | --- |
| description | | 20 | 518 | 12 |
| external id | | 9 | 239 | 30 |
| label | | 4 | 104 | 48 |
| altLabel | | 0.8 | 21 | 16 |
| name | | 0.6 | 16 | 8 |
| lexicographical entities | 8 | | 10 | 0.09 |
| scholarly article | 40 | 50 | 1370 | 2 |
| astronomical object | 9 | 9 | 238 | |
| human | 10 | 7 | 200 | |
| Wikimedia category | 5 | 6 | 157 | |
| taxon | 3.4 | 3 | 77 | |
| family name | 0.5 | 1.4 | 40 | |
| Wikimedia disambiguation page | 1.5 | 1.4 | 37 | |
| gene | 1.3 | 0.9 | 25 | |
| Wikimedia template | 0.9 | 0.9 | 23 | |
| chemical compound | 1.3 | 0.7 | 19 | |
The numbers were rounded. Only the top 10 subgraphs were listed.
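In the rows that have a "% of triples" value, the "days to recover" column is close to proportional to it (roughly 26 to 29 days per percentage point). That suggests, though the comment does not state it, that the column was computed as the share of triples removed divided by a constant daily growth rate. A short sketch to check that reading, using a few rows copied from the table. `days_per_percent` and `estimate_days` are hypothetical helpers, and at least part of the spread in the ratios comes from the rounded percentages.

```python
from statistics import median

# (name, % of triples, days for Blazegraph to recover) copied from rows of the
# table above; the percentages there are rounded, so the ratios are approximate.
ROWS = [
    ("description", 20, 518),
    ("external id", 9, 239),
    ("label", 4, 104),
    ("scholarly article", 50, 1370),
    ("astronomical object", 9, 238),
    ("human", 7, 200),
]


def days_per_percent(rows):
    """Days of growth implied per percentage point of triples, for each row."""
    return {name: days / pct for name, pct, days in rows}


def estimate_days(pct_of_triples, rate):
    """Assumed model: days to recover = % of triples removed * days per percent."""
    return pct_of_triples * rate


ratios = days_per_percent(ROWS)
rate = median(ratios.values())
for name, r in ratios.items():
    print(f"{name:20s} {r:5.1f} days per 1% of triples")
print(f"median: {rate:.1f}")
```

If the proportional reading holds, a single rate is enough to estimate the recovery time for any other slice whose share of triples is known.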
More can be found here: Table_of_top_50_subgraph_information <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis#Table_of_top_50_subgraph_information> TASK DETAIL https://phabricator.wikimedia.org/T295188
[Wikidata-bugs] [Maniphest] T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure
AKhatun_WMF added a comment.
Sources:
- T275068 <https://phabricator.wikimedia.org/T275068>
- Wikidata_Subgraph_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis>
- Wikidata_Scholarly_Articles_Subgraph_Analysis <https://wikitech.wikimedia.org/w/index.php?title=User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis>
| number/% of entities | number/% of triples | number of days for Blazegraph to recover at current rate of growth | number/% of queries potentially affected |
| --- | --- | --- | --- |
| ok | nai | ok | ai |
| ok | nai | ok | ai |
| ok | nai | ok | ai |
| ok | nai | ok | ai |
TASK DETAIL https://phabricator.wikimedia.org/T295188
[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype
AKhatun_WMF added a comment.
> Basically Wikidata's Properties have a datatype.
Ah, datatype of properties.
> I am not seeing that in the analysis you linked but maybe I am overlooking something.
The one I listed is for datatype of objects, so you didn't miss anything. Thank you for clarifying! It should be fairly easy to find out as well :)
TASK DETAIL https://phabricator.wikimedia.org/T288264
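Once the question is "statements per property datatype", it reduces to a join: per-property statement counts grouped by each property's datatype. In the WDQS RDF model a property's datatype is given by `wikibase:propertyType`, so a query like the one in the string below could supply the mapping. The Python only does the aggregation step. `statements_by_datatype` is a hypothetical helper, and all counts are invented; P31, P214 and P1476 are used only as familiar examples of item, external-id and monolingual-text properties.

```python
from collections import Counter

# A query along these lines lists each property with its datatype on WDQS
# (shown for reference, not executed here).
PROPERTY_TYPES_QUERY = """
SELECT ?property ?type WHERE { ?property wikibase:propertyType ?type . }
"""


def statements_by_datatype(statement_counts, property_types):
    """Sum per-property statement counts into per-datatype totals.

    statement_counts: {property id: number of statements using it}
    property_types:   {property id: datatype}
    Properties missing from property_types are grouped under "unknown".
    """
    totals = Counter()
    for prop, n in statement_counts.items():
        totals[property_types.get(prop, "unknown")] += n
    return totals


# Hypothetical inputs for illustration only; "P0" stands for a property
# whose datatype was not fetched.
types = {"P31": "WikibaseItem", "P214": "ExternalId", "P1476": "Monolingualtext"}
counts = {"P31": 100, "P214": 40, "P1476": 25, "P0": 5}
print(statements_by_datatype(counts, types))
```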
[Wikidata-bugs] [Maniphest] T288264: Get estimates for all Wikidata statements of a specific datatype
AKhatun_WMF added a subscriber: Lydia_Pintscher. AKhatun_WMF added a comment. @Lydia_Pintscher Is this ticket asking for counts of the various datatypes used in Wikidata, both URIs and literals? Does wikitech:User:AKhatun/Wikidata_Basic_Analysis#Object <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Basic_Analysis#Object> help? TASK DETAIL https://phabricator.wikimedia.org/T288264
[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T293632 WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] T293636: Identify and analyze queries that touch on various large subgraphs
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.
TASK DESCRIPTION
As a Data Analyst for Wikidata and WDQS, I would like to know how often the large subgraphs in Wikidata are queried. The aim is to get an estimate of the gain (or loss) of splitting them from Wikidata.
Questions:
- How many queries touch on the large subgraphs of Wikidata
- Analysis of those queries in terms of query time, user agent, etc.
- How many queries span multiple subgraphs (to estimate how much query federation might be required)
TASK DETAIL https://phabricator.wikimedia.org/T293636
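Once each logged query has been mapped to the set of large subgraphs it touches (the hard part, which this sketch assumes has already been done), the first and third questions reduce to simple counting. Query IDs, subgraph names and the `subgraph_query_stats` helper are hypothetical.

```python
from collections import Counter

# Hypothetical input: for each query, the set of large subgraphs it touches.
queries = {
    "q1": {"scholarly article"},
    "q2": {"scholarly article", "human"},
    "q3": {"human"},
    "q4": set(),  # touches none of the large subgraphs
    "q5": {"taxon", "gene", "human"},
}


def subgraph_query_stats(query_subgraphs):
    """Count queries per subgraph and how many queries touch more than one."""
    per_subgraph = Counter()
    multi = 0
    for touched in query_subgraphs.values():
        per_subgraph.update(touched)  # each subgraph counted once per query
        if len(touched) > 1:
            multi += 1
    return per_subgraph, multi


per_subgraph, multi = subgraph_query_stats(queries)
print(per_subgraph.most_common())
print(f"{multi} of {len(queries)} queries span multiple subgraphs")
```

The multi-subgraph count is only a rough proxy for federation load: it says how many queries would need more than one backend after a split, not how expensive each of them would be.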
[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T293632
[Wikidata-bugs] [Maniphest] T293632: Analysis of large subgraphs in Wikidata
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.
TASK DESCRIPTION
As a Data Analyst for Wikidata and WDQS, I would like to know what the other large subgraphs in Wikidata are (besides scholarly articles and astronomical objects) and how they are connected to each other. The aim is to get an estimate of the gain (or loss) of splitting them from Wikidata.
Subgraphs in Wikidata
- What are the various large subgraphs (found using P31 (`instance of`), possibly merging obviously similar groups)
- What are their sizes, and how many items do they have
- Connectivity among these subgraphs
- What properties do these subgraphs commonly use, and which properties overlap among them
- Which items overlap
- How many triples connect multiple subgraphs (through items, e.g. `?item_of_subgraph1 Pxx ?item_of_subgraph2`)
TASK DETAIL https://phabricator.wikimedia.org/T293632
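The last question, counting triples of the form `?item_of_subgraph1 Pxx ?item_of_subgraph2`, becomes a simple tally once every item is assigned to a subgraph. A minimal sketch with toy data: the item names, the subgraph assignment and `cross_subgraph_edges` are hypothetical. Only P50 (author) and P2860 (cites work) are real property IDs; `Pxx` is a placeholder as in the task description. Objects with no known subgraph (literals, or items outside the large subgraphs) are skipped.

```python
from collections import Counter

# Toy data: each item's subgraph (e.g. derived from its P31 value).
subgraph_of = {
    "article_a": "scholarly article",
    "article_b": "scholarly article",
    "person_c": "human",
    "taxon_d": "taxon",
}

# Toy (subject, property, object) triples between items.
triples = [
    ("article_a", "P50", "person_c"),     # author: crosses subgraphs
    ("article_a", "P2860", "article_b"),  # cites work: same subgraph
    ("person_c", "Pxx", "taxon_d"),       # placeholder property: crosses
    ("article_b", "Pxx", "unknown_e"),    # object not in any subgraph: skipped
]


def cross_subgraph_edges(triples, subgraph_of):
    """Count triples whose subject and object belong to different subgraphs."""
    edges = Counter()
    for subj, _prop, obj in triples:
        a, b = subgraph_of.get(subj), subgraph_of.get(obj)
        if a is None or b is None or a == b:
            continue
        edges[(a, b)] += 1
    return edges


print(cross_subgraph_edges(triples, subgraph_of))
```

Pairs are directed (subject subgraph, object subgraph); adding `(a, b)` and `(b, a)` gives an undirected connectivity count. Items with several P31 values would need a rule for which subgraph they belong to, which this sketch does not address.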
[Wikidata-bugs] [Maniphest] T293631: Get estimates for splitting other large subgraphs from Wikidata
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service.
TASK DESCRIPTION
As a Data Analyst for Wikidata and WDQS, I would like to know what the other large subgraphs in Wikidata are (besides scholarly articles and astronomical objects) and how often they are queried. The aim is to get an estimate of the gain (or loss) of splitting them off from Wikidata.
This task has 2 parts:
- Identifying and analyzing the subgraphs themselves
- Analyzing the queries that touch on these subgraphs
TASK DETAIL https://phabricator.wikimedia.org/T293631
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure
AKhatun_WMF removed a subtask: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph. TASK DETAIL https://phabricator.wikimedia.org/T282790
[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure. TASK DETAIL https://phabricator.wikimedia.org/T288257
[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure
AKhatun_WMF removed a subtask: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T282790
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF added a subtask: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph. TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph
AKhatun_WMF added a parent task: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T288257
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF added a subtask: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF added a parent task: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF added a subtask: T291205: Analysis: Property usage by items' P31. TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31
AKhatun_WMF added a parent task: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T291205
[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure
AKhatun_WMF added a subtask: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T282790
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF added a parent task: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure. TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata, Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
As a Data Analyst for Wikidata and WDQS, I would like to know what the various large subgraphs in Wikidata are and what the benefits/losses of splitting them off from Wikidata would be. The aim is to identify large subgraphs besides those already known (scholarly articles, astronomical objects) and find out how often these subgraphs are queried. This can be estimated from:
- The subgraph sizes
- Connections of subgraphs to other subgraphs
- Number of queries that access this subgraph
- Number of queries that span multiple subgraphs (to estimate the federation load)
TASK DETAIL https://phabricator.wikimedia.org/T293628
[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T291205
[Wikidata-bugs] [Maniphest] T288257: Get estimates for size of astronomical objects and queries in Wikidata graph
AKhatun_WMF added a comment. Astronomical objects are structured hierarchically, so not every astronomical object is a direct `instance of` Q6999 <https://www.wikidata.org/wiki/Q6999> (unlike scholarly articles). Counting all subclasses of Q6999 <https://www.wikidata.org/wiki/Q6999>, astronomical objects form ~9% of all Wikidata entities. (sparql query <https://query.wikidata.org/#SELECT%20%28count%28%2a%29%20as%20%3Fcount%29%0AWHERE%0A%7B%0A%20%20%5B%5D%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ6999.%0A%7D>) The number of triples 'related to' these entities is approximately 7.5% (~1B) of all Wikidata triples, approximated from the top 10 subclasses (which account for 7% of all entities). TASK DETAIL https://phabricator.wikimedia.org/T288257
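The encoded query behind the "sparql query" link above can be decoded with Python's standard library to see why subclasses are included: the property path `wdt:P31/wdt:P279*` matches anything that is an instance of Q6999 or of any (transitive) subclass of it. `class_count_query` is a hypothetical helper that rebuilds the same query for another class.

```python
from urllib.parse import quote, unquote

# Query part of the query.wikidata.org link in the comment above
# (everything after "#").
ENCODED = (
    "SELECT%20%28count%28%2a%29%20as%20%3Fcount%29%0AWHERE%0A%7B%0A"
    "%20%20%5B%5D%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ6999.%0A%7D"
)


def class_count_query(qid):
    """Hypothetical helper: the same count query for any class item."""
    return f"SELECT (count(*) as ?count)\nWHERE\n{{\n  [] wdt:P31/wdt:P279* wd:{qid}.\n}}"


query = unquote(ENCODED)
print(query)

# Re-encode for a link. The hex-digit case may differ from the original
# link, but it decodes to the same query text.
link = "https://query.wikidata.org/#" + quote(class_count_query("Q6999"), safe="")
```

One caveat, offered as a reading of SPARQL property-path semantics rather than something checked against these numbers: the sequence path joins through each P31 value, so an item with two P31 classes that both fall under Q6999 can contribute two rows to `count(*)`. Binding the subject to a variable and using `COUNT(DISTINCT ?item)` would count each item once.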
[Wikidata-bugs] [Maniphest] T291205: Analysis: Property usage by items' P31
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T291205
[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS
AKhatun_WMF edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T291190
[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS
AKhatun_WMF added a project: Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T291190
[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS
AKhatun_WMF added a comment. Query analysis report for some vertical slices of Wikidata: Wikidata_Vertical_Analysis#Query_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#Query_Analysis> Summary: Wikidata_Vertical_Analysis#TL;DR <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis#TL;DR> TASK DETAIL https://phabricator.wikimedia.org/T291190
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF added a comment. Here is the analysis of scholarly articles in Wikidata and of the WDQS queries related to them: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure
AKhatun_WMF added a subtask: T291190: Determine cost-benefit of doing vertical data slicing on WDQS. TASK DETAIL https://phabricator.wikimedia.org/T282790
[Wikidata-bugs] [Maniphest] T291190: Determine cost-benefit of doing vertical data slicing on WDQS
AKhatun_WMF added a parent task: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure. TASK DETAIL https://phabricator.wikimedia.org/T291190
[Wikidata-bugs] [Maniphest] T289754: Triple level deduplication
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata. TASK DESCRIPTION Deduplication of the wikibase RDF dumps currently happens at the quad level, i.e. (context, subject, predicate, object). Therefore, even after deduplication, some triples were found to appear under different contexts, creating duplicates at the triple level (subject, predicate, object). Distinct triples may be required in the dump down the line, but since all of these duplicates relate to wiki pages, they do not matter for the analysis at present.
- Number of duplicate triples: ~170K (total triples: 12.9B)
- Number of distinct triples that have duplicates: 5K

A snippet of the duplicate triples:

| Subject | Predicate | Object | Number of different contexts |
| https://zh.wikiquote.org/ | http://wikiba.se/ontology#wikiGroup | "wikiquote" | 306 |
| https://ta.wikinews.org/ | http://wikiba.se/ontology#wikiGroup | "wikinews" | 302 |
| https://ps.wikipedia.org/ | http://wikiba.se/ontology#wikiGroup | "wikipedia" | 301 |
| https://fo.wikipedia.org/ | http://wikiba.se/ontology#wikiGroup | "wikipedia" | 301 |
| https://am.wiktionary.org/ | http://wikiba.se/ontology#wikiGroup | "wiktionary" | 37 |
| <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://schema.org/Article | 2 |
| <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://schema.org/name | "Sjabloon:Navigatie voetbal Nederland Derde di... | 3 |
| <https://nl.wikipedia.org/wiki/Sjabloon:Naviga... | http://schema.org/name | "Sjabloon:Navigatie voetbalclubs Nijkerk"@nl | 2 |

All predicates involved among the duplicate triples:

| Predicate | Number of occurrences |
| http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 1049 |
| http://schema.org/inLanguage | 1049 |
| http://schema.org/name | 1049 |
| http://schema.org/isPartOf | 1049 |
| http://wikiba.se/ontology#wikiGroup | 868 |

TASK DETAIL https://phabricator.wikimedia.org/T289754
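The gap between quad-level and triple-level deduplication can be sketched as follows: deduplicate full quads first, then group by (subject, predicate, object) and keep the triples that still occur under more than one context. The sample quads and the `triple_level_duplicates` helper are hypothetical, and this is plain Python rather than a Spark job.

```python
from collections import defaultdict

# Hypothetical sample quads: (context, subject, predicate, object).
quads = [
    ("ctx1", "https://zh.wikiquote.org/", "http://wikiba.se/ontology#wikiGroup", '"wikiquote"'),
    ("ctx2", "https://zh.wikiquote.org/", "http://wikiba.se/ontology#wikiGroup", '"wikiquote"'),
    ("ctx2", "https://zh.wikiquote.org/", "http://wikiba.se/ontology#wikiGroup", '"wikiquote"'),  # exact quad repeat
    ("ctx3", "http://www.wikidata.org/entity/Q42", "http://www.wikidata.org/prop/direct/P31", "http://www.wikidata.org/entity/Q5"),
]


def triple_level_duplicates(quads):
    """Map each (s, p, o) that survives quad-level dedup under >1 context to its context count."""
    distinct_quads = set(quads)  # quad-level deduplication, as described above
    contexts = defaultdict(set)
    for ctx, s, p, o in distinct_quads:
        contexts[(s, p, o)].add(ctx)
    return {triple: len(ctxs) for triple, ctxs in contexts.items() if len(ctxs) > 1}


dups = triple_level_duplicates(quads)
# len(dups) = distinct triples that have duplicates; surplus rows if only one copy were kept:
surplus_rows = sum(n - 1 for n in dups.values())
```

The exact quad repeat in `ctx2` is removed by the quad-level step, but the triple still appears under two contexts. Which of the two possible counts the ~170K figure above uses is not stated in the task.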
[Wikidata-bugs] [Maniphest] T289753: Optimize deduplication of triples when loading into wikibase RDF dumps
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata. TASK DESCRIPTION The deduplication of triples is not optimized yet: it takes ~3 hrs (the load previously took ~1 hr without deduplication), but it works nonetheless. @JAllemandou suggested that a few optimizations may be possible for the deduplication process. This task covers those possible optimizations. TASK DETAIL https://phabricator.wikimedia.org/T289753
[Wikidata-bugs] [Maniphest] T287225: Add all prefixes defined in Blazegraph
AKhatun_WMF added a comment. Hi, thanks for the deploy! Can we re-run the previous jobs? Preferably all of them, since the analysis will require the previous data. TASK DETAIL https://phabricator.wikimedia.org/T287225
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF added a comment. In T281854#7266495 <https://phabricator.wikimedia.org/T281854#7266495>, @EgonWillighagen wrote: > @AKhatun_WMF, when you write "authors connected to other subgraphs", do you mean subgraphs within Wikidata (so, excluding external identifiers), or also graphs from other resources part of, for example, the Linked Open Data Cloud? I mean within Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T286436: Deduplicate triples when loading the wikibase RDF dumps into hive
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T286436
[Wikidata-bugs] [Maniphest] T286436: Deduplicate triples when loading the wikibase RDF dumps into hive
AKhatun_WMF added a comment. Joseph will suggest an optimization for this task when he is back. For now, a simple `.distinct()` has been applied to the Spark dataframe to enable analysis of the Wikidata dumps. TASK DETAIL https://phabricator.wikimedia.org/T286436
[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata
AKhatun_WMF added a comment. In T281854#7062631 <https://phabricator.wikimedia.org/T281854#7062631>, @Fnielsen wrote: > Some of the statistics that is wanted are listed on Scholia, currently on the frontpage: https://scholia.toolforge.org/ (UPDATE: now here: https://scholia.toolforge.org/statistics) > > "percentage, number of Wikidata entities that are scholarly article": > 37.246.721 Scholarly articles, so 37/97 ~ 40% are scholarly articles. Could you give me an idea of what the 97 refers to, and where that number is listed? TASK DETAIL https://phabricator.wikimedia.org/T281854
[Wikidata-bugs] [Maniphest] T287225: Add all prefixes defined in Blazegraph
AKhatun_WMF created this task. AKhatun_WMF added projects: Wikidata-Query-Service, Wikidata, Discovery-Search (Current work). TASK DESCRIPTION Currently, the Jena parser fails if it cannot find some prefix definitions. We would like to include all prefixes defined in Blazegraph by reusing the declarations in other parts of the code, instead of listing them separately for the parser. TASK DETAIL https://phabricator.wikimedia.org/T287225
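One way to make parsing tolerant of queries that rely on implicit prefixes is to prepend a prologue of `PREFIX` declarations, built from a single shared map, before the query string is handed to the parser. The sketch below is a minimal, hypothetical illustration: `SHARED_PREFIXES` and `with_prefixes` are made-up names, the map holds only three well-known namespaces rather than the full Blazegraph list, and any prefix the query already declares is left alone.

```python
import re

# Hypothetical shared map; the real list would be reused from the existing Blazegraph
# prefix declarations. Only three well-known namespaces are shown here.
SHARED_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Matches "PREFIX name:" anywhere in the query (case-insensitive); "" for the empty prefix.
_DECLARED = re.compile(r"\bPREFIX\s+([A-Za-z][\w.-]*)?:", re.IGNORECASE)


def with_prefixes(query, prefixes=SHARED_PREFIXES):
    """Prepend PREFIX declarations for every shared prefix the query does not declare itself."""
    declared = set(_DECLARED.findall(query))
    prologue = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in sorted(prefixes.items())
        if name not in declared
    ]
    return "\n".join(prologue + [query])


query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"
prepared = with_prefixes(query)  # this string would then be passed to the SPARQL parser
```

Keeping one map as the single source of truth means the parser and the query service cannot drift apart, which is the goal the task describes.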
[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries
AKhatun_WMF added a comment. @dcausse: Yes, adding the prefix declarations to the Jena parser is what we want to do. @JAllemandou: Should I add the other prefixes as well? TASK DETAIL https://phabricator.wikimedia.org/T285465
[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries
AKhatun_WMF added a comment.
- For June, the average daily successful parsing rate was **~85%**, ranging from 75% to 90%. Note that this only includes queries with status 200 and 500.
- 11% of the distinct queries ran into errors related to prefixes. The number of distinct queries affected by each prefix is shown below. After adding the first 4 prefixes (mwapi, geof, foaf, gas) to the query processor's prefix list, the average daily successful parsing rate rose to >96%. A few prefixes were slightly off (`data` instead of `wdata`, `ref` instead of `wdref`). These account for very few queries, but I fixed them nevertheless.

| **prefix_name** | **count** |
| mwapi | 7419357 |
| geof | 54183 |
| foaf | 17198 |
| gas | 13753 |
| wds | 2761 |
| wdv | 216 |
| fn | 62 |
| dc | 50 |
| mediawiki | 23 |
| wdref | 22 |
| wdata | 3 |

Total distinct queries: 67467327

- Other errors included:
  - `Variable used when already in-scope`. This happens when the same variable is reused in a query. Testing such queries in WDQS returns results without problems. These form 2% of the errors in distinct queries.
  - The `WITH` clause. Although it runs fine in WDQS, the parser doesn't accept it. These form 2.5% of the distinct queries.

It seems that including the prefixes should fix most failures, but should we also consider fixing the other two errors (although they are small in number)? I am not sure why Jena cannot parse them.
TASK DETAIL https://phabricator.wikimedia.org/T285465
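The prefix table can be sanity-checked with a little arithmetic. The sketch below sums the per-prefix counts, compares the total with the 67,467,327 distinct queries, and measures how much of the prefix-error volume the first four prefixes cover. It assumes each distinct query is counted under only one missing prefix; a query missing two prefixes would be double-counted.

```python
# Counts copied from the table above (distinct queries per missing prefix).
prefix_errors = {
    "mwapi": 7419357, "geof": 54183, "foaf": 17198, "gas": 13753,
    "wds": 2761, "wdv": 216, "fn": 62, "dc": 50,
    "mediawiki": 23, "wdref": 22, "wdata": 3,
}
total_distinct_queries = 67467327

prefix_error_total = sum(prefix_errors.values())
share_of_all = prefix_error_total / total_distinct_queries

top4 = sum(prefix_errors[p] for p in ("mwapi", "geof", "foaf", "gas"))
top4_coverage = top4 / prefix_error_total

print(f"{share_of_all:.1%} of distinct queries hit a prefix error")
print(f"the top 4 prefixes cover {top4_coverage:.2%} of those")
```

Run as-is, it prints 11.1% (consistent with the ~11% above) and 99.96%, which is why adding just mwapi, geof, foaf and gas recovers almost all prefix failures.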
[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T285465 WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] T285465: Document and analyze the number of parsing errors for parsed WDQS queries
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T285465
[Wikidata-bugs] [Maniphest] T282790: Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure
AKhatun_WMF added a comment. Some of the vertical analyses were done as part of familiarizing myself with Wikidata. See the findings in Wikidata_Vertical_Analysis <https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Vertical_Analysis>. I will get back to this ticket when done with T282139 <https://phabricator.wikimedia.org/T282139>. TASK DETAIL https://phabricator.wikimedia.org/T282790
[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T282139
[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset
AKhatun_WMF moved this task from Analysis to Current work on the Wikidata-Query-Service board. AKhatun_WMF added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T282139 WORKBOARD https://phabricator.wikimedia.org/project/board/891/
[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries
AKhatun_WMF triaged this task as "Low" priority. TASK DETAIL https://phabricator.wikimedia.org/T283256
[Wikidata-bugs] [Maniphest] T273854: Automate regular WDQS query parsing and data-extraction
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T273854
[Wikidata-bugs] [Maniphest] T283255: Create CLI job extracting info from wdqs queries
AKhatun_WMF closed this task as "Resolved". AKhatun_WMF removed a project: Patch-For-Review. TASK DETAIL https://phabricator.wikimedia.org/T283255
[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis
AKhatun_WMF closed subtask T283255: Create CLI job extracting info from wdqs queries as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T280640
[Wikidata-bugs] [Maniphest] T282139: Provide a quantitative description of the Wikidata-triples dataset
AKhatun_WMF added a comment. Some of the suggested information to analyse or extract through this analysis:
- Top items
- Top properties
- Top subject and object types
- Top property types
- Top Wikidata vs. other predicates
- Number of S, P, O that don't involve Wikidata
  - The aim is to find the size of the subgraph not concerning Wikidata, i.e. the size of the leaves. They are leaves because once they point to something outside of Wikidata, they are not expanded within Wikidata. Some things, like literals, are not even expandable. If we have too many leaves, we may consider using property graphs (where leaves would be stored as properties of a node).

TASK DETAIL https://phabricator.wikimedia.org/T282139
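To estimate the size of the leaves, each triple can be classified by its object. The sketch below treats an object as a leaf if it is a literal or an IRI outside the `http://www.wikidata.org/` namespace. That rule, the plain-string encoding of terms (literals start with a double quote), and the sample triples, including the `http://example.org/externalLink` predicate, are simplifying assumptions; under this rule, ontology IRIs such as `wikibase:Item` would also count as leaves.

```python
# Simplifying assumption: every IRI under this namespace is "inside" Wikidata
# (entities, statements, values, references); anything else cannot be expanded further.
WIKIDATA_NS = "http://www.wikidata.org/"


def is_literal(term):
    """Terms are plain strings here; literals are written with a leading double quote."""
    return term.startswith('"')


def is_leaf(term):
    return is_literal(term) or not term.startswith(WIKIDATA_NS)


def leaf_triples(triples):
    """Triples whose object is a leaf, i.e. a literal or an IRI pointing outside Wikidata."""
    return [t for t in triples if is_leaf(t[2])]


# Hypothetical sample triples (subject, predicate, object).
triples = [
    ("http://www.wikidata.org/entity/Q42", "http://www.wikidata.org/prop/direct/P31", "http://www.wikidata.org/entity/Q5"),
    ("http://www.wikidata.org/entity/Q42", "http://schema.org/name", '"Douglas Adams"@en'),
    ("http://www.wikidata.org/entity/Q42", "http://example.org/externalLink", "http://example.org/douglas-adams"),
]
leaves = leaf_triples(triples)
leaf_share = len(leaves) / len(triples)  # fraction of triples that end in a leaf
```

If `leaf_share` turns out large on the real dump, that is the signal mentioned above for considering a property-graph representation.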
[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries
AKhatun_WMF added a comment. Update 1 June 2021: I had a chat with @JAllemandou and, based on the Wikidata Checkpoint Meeting of 27/5/2021, we will take up this ticket later as required. For now, we focus on productionizing the existing data extracted from SPARQL queries and getting the data flowing (T273854 <https://phabricator.wikimedia.org/T273854>). We will need more info on how to flatten the AST, but so far we have talked about making a simple list of tuples. The order of the list reflects how the AST was traversed, and each element of the list is a tuple of type and value, e.g. (operator, join), (filter, ?x+?y = ?z), (node_var, x), (extend, ?x+?y as ?z), etc. TASK DETAIL https://phabricator.wikimedia.org/T283256
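The "list of tuples" idea amounts to a pre-order traversal that emits one (type, value) pair per node, so the list order records how the tree was visited. The sketch below uses a hypothetical `Node` class in place of Jena's algebra objects; a real implementation would walk the parsed query's algebra tree instead, and how expressions are split into nodes remains open, as noted in the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node of the parsed query's AST."""
    kind: str                                     # e.g. "operator", "filter", "node_var", "extend"
    value: str                                    # e.g. "join", "?x+?y = ?z", "x"
    children: list = field(default_factory=list)


def flatten(root):
    """Pre-order traversal returning (type, value) tuples in visiting order."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append((node.kind, node.value))
        stack.extend(reversed(node.children))  # reversed so children are visited left to right
    return out


tree = Node("operator", "join", [
    Node("extend", "?x+?y as ?z", [Node("node_var", "x"), Node("node_var", "y")]),
    Node("filter", "?x+?y = ?z"),
])
flat = flatten(tree)
```

An explicit stack avoids Python's recursion limit on deeply nested queries, while producing the same order as a recursive pre-order walk.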
[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis
AKhatun_WMF removed a project: Patch-For-Review. TASK DETAIL https://phabricator.wikimedia.org/T280640
[Wikidata-bugs] [Maniphest] T283256: Extract operator/nodes/triples/paths/exprs list from queries
AKhatun_WMF added a comment. Idea on how to store the SPARQL query as a list:

Let's make a list of a `generic custom class` **QueryElem[T]**. QueryElem contains `elemType: String` and `elem: T`. A class needs to be created for each element type, e.g. `NodeClass extends QueryElem`. The class definitions of all elements are given below:

- For nodes: `elemType = "Node", elem: NodeInfo = some_node` (NodeInfo is a case class containing NodeType and NodeValue, like `("NODE_VAR", "x")`)
- For expressions: `elemType = "expression"`. Expressions can get quite convoluted, with 1, 2 or n variables, like BIND("AK" as ?x), (?x+?y as ?z) and (REGEX("[abc]*") as ?x) respectively. Moreover, they can be deeply nested, like FILTER(?x = 1 || ?y = 2 || ?z = 3). **I am not entirely sure how to represent expressions.**
- For BGPs: `elemType = "BGP", elem: List[TripleInfo] = List(triple1, triple2, triple3, triple4, ...)` (TripleInfo contains NodeInfo for Sub, Pred and Obj)
- For services: `elemType = "service", serviceName: "service_name", elem: BGP` (service_name like wikibase:label)
- For tables: `elemType = "table", elem: TableData`. TableData is `tableVars: List[NodeInfo], tableRow: List[Row]`, and Row is `List[NodeInfo]`.
- For paths (sub path obj): a path predicate is already identified as `PATH` in NodeType, so we can consider paths to be ordinary triples. Alternatively, we can create a special `pathTriple`: `elemType = "pathTriple", elem: TripleInfo`
- For filters: `elemType = "filter", elem: Expression` (Expression class as described above)
- For extends: `elemType = "extend", elem: Expression, expVar: NodeInfo` (Expression class as described above). E.g. in `(?x+?y as ?z)`, `?z` is the expVar and elem is `?x+?y`. elem can also be a single Node: `BIND ("AK" as ?x)`. Could it be anything else? **This requires more thinking; I am not sure what to put in `elem` for extends.**
- Op names: `elemType = "operations", elem = "join"` (elem can be join, optional, project etc. Sometimes elem will be redundant, as for BGP, path, table etc., which have their own classes.)

Let me know if and what I am missing. How else can we represent a query as a list? TASK DETAIL https://phabricator.wikimedia.org/T283256
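The following is a minimal Scala sketch of the proposed `QueryElem[T]` design. It follows the comment's class names, field names and `elemType` strings wherever the comment gives them. The following parts are assumptions:
- the concrete subclass names (`NodeElem`, `BgpElem`, ...);
- the `NODE_URI` node type;
- storing an `Expression` as its raw SPARQL text, since the expression representation is still open.

`QueryElem` is made covariant so that elements with different `T` fit in one `List[QueryElem[Any]]`.

```scala
// Sketch of the proposed QueryElem[T] list representation (subclass names are hypothetical).
final case class NodeInfo(nodeType: String, nodeValue: String)
final case class TripleInfo(sub: NodeInfo, pred: NodeInfo, obj: NodeInfo)
// Placeholder: an expression is kept as its SPARQL text until a structured form is chosen.
final case class Expression(text: String)
// Each row is a List[NodeInfo], one entry per variable in tableVars.
final case class TableData(tableVars: List[NodeInfo], tableRow: List[List[NodeInfo]])

// Covariant so that elements with different payload types share one List[QueryElem[Any]].
sealed trait QueryElem[+T] {
  def elemType: String
  def elem: T
}

final case class NodeElem(elem: NodeInfo) extends QueryElem[NodeInfo] { val elemType = "Node" }
final case class ExpressionElem(elem: Expression) extends QueryElem[Expression] { val elemType = "expression" }
final case class BgpElem(elem: List[TripleInfo]) extends QueryElem[List[TripleInfo]] { val elemType = "BGP" }
final case class ServiceElem(serviceName: String, elem: BgpElem) extends QueryElem[BgpElem] { val elemType = "service" }
final case class TableElem(elem: TableData) extends QueryElem[TableData] { val elemType = "table" }
final case class PathTripleElem(elem: TripleInfo) extends QueryElem[TripleInfo] { val elemType = "pathTriple" }
final case class FilterElem(elem: Expression) extends QueryElem[Expression] { val elemType = "filter" }
final case class ExtendElem(elem: Expression, expVar: NodeInfo) extends QueryElem[Expression] { val elemType = "extend" }
final case class OperationElem(elem: String) extends QueryElem[String] { val elemType = "operations" }

object QueryElemExample {
  private val x = NodeInfo("NODE_VAR", "x")

  // Roughly: a join of a BGP with a label service, plus an extend (?x + ?y AS ?z).
  val elems: List[QueryElem[Any]] = List(
    OperationElem("join"),
    BgpElem(List(TripleInfo(x, NodeInfo("NODE_URI", "wdt:P31"), NodeInfo("NODE_URI", "wd:Q5")))),
    ServiceElem("wikibase:label", BgpElem(Nil)),
    ExtendElem(Expression("?x + ?y"), NodeInfo("NODE_VAR", "z"))
  )

  def elemTypes(es: List[QueryElem[Any]]): List[String] = es.map(_.elemType)

  // Typed payloads are recovered by pattern matching on the sealed hierarchy.
  def tripleCount(es: List[QueryElem[Any]]): Int =
    es.collect {
      case BgpElem(ts)         => ts.size
      case ServiceElem(_, bgp) => bgp.elem.size
      case PathTripleElem(_)   => 1
    }.sum
}
```

With a sealed hierarchy, consumers can get typed payloads back by pattern matching, as `tripleCount` does. This is one argument for per-type classes over a single untyped `elem` field.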
[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format
AKhatun_WMF added a comment. In T282130#7100051 <https://phabricator.wikimedia.org/T282130#7100051>, @JAllemandou wrote:
> @AKhatun_WMF That's great! could you please provide some info on expected data-size in parquet (for daily data for instance)? Many thanks.

@JAllemandou Added an estimate of the daily data size. TASK DETAIL https://phabricator.wikimedia.org/T282130
[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format
AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T282130
[Wikidata-bugs] [Maniphest] T282130: Provide a way to save extracted query-information in parquet format
AKhatun_WMF claimed this task. AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T282130
[Wikidata-bugs] [Maniphest] T282129: Test triple-analysis functions over a large dataset with Spark
AKhatun_WMF claimed this task. AKhatun_WMF updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T282129
[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis
AKhatun_WMF closed subtask T282127: Add unit-tests to WDQS analysis toolkit as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T280640
[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit
AKhatun_WMF closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T282127
[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit
AKhatun_WMF added a comment. Unit tests done, patch merged!
- Created a file containing queries that pass and a file containing queries that don't pass; both are checked for correctness in the unit tests.
- Checked the correctness of the extracted nodes for 2 example queries written inline in the code.

TASK DETAIL https://phabricator.wikimedia.org/T282127
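The following sketches the pass/fail corpus check described above, assuming that "pass" means the toolkit accepts the query. `QueryCorpusCheck`, `mismatches` and `toyParses` are hypothetical names. `toyParses` only checks for balanced braces and a `SELECT` keyword, and it stands in for the real SPARQL parser. Loading the two lists from their files is omitted.

```scala
object QueryCorpusCheck {
  // Returns the queries whose outcome differs from what their corpus expects:
  // with shouldParse = true these are rejected "good" queries, otherwise accepted "bad" ones.
  def mismatches(queries: Seq[String], shouldParse: Boolean, parses: String => Boolean): Seq[String] =
    queries.filter(q => parses(q) != shouldParse)

  // Toy stand-in for the real parser: balanced braces plus a SELECT keyword.
  def toyParses(q: String): Boolean = {
    val depth = q.foldLeft(0) { (d, c) =>
      if (d < 0) d // an unmatched '}' already made the query invalid
      else if (c == '{') d + 1
      else if (c == '}') d - 1
      else d
    }
    depth == 0 && q.toUpperCase.contains("SELECT")
  }
}
```

In real tests, the two lists would be read from the resource files (for example with `scala.io.Source`). A non-empty result then names exactly the queries whose outcome changed.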
[Wikidata-bugs] [Maniphest] T282127: Add unit-tests to WDQS analysis toolkit
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T282127
[Wikidata-bugs] [Maniphest] T280640: Refine WDQS queries analysis
AKhatun_WMF claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T280640