[Wikidata-bugs] [Maniphest] T299460: Evaluate Apache Jena
Thadguidry added a comment.

Hi @AndySeaborne

What are the latest benchmarks for loading Wikidata all and truthy with the Jena 4.4.0 release and the new TDB2 xloader with the "--threads" argument? I noticed the release notes said this:

> == Improved bulk loader
>
> This release includes the version of the TDB2 xloader for very large
> datasets.
>
> It has been used to load 16.6B triples (WikiData all) into TDB2 and
> loading truthy (6B) on modest hardware. Thanks to Marco, Lorenz and
> Øyvind for running Wikidata load trails.
>
> The loader now now has "--threads=" which been reported to give improved
> load times (if the server has the hardware!).

TASK DETAIL https://phabricator.wikimedia.org/T299460
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Thadguidry
Cc: Osmasuominen, dcausse, Smalyshev, Aklapper, Lucas_Werkmeister_WMDE, Gehel, Andrawaag, Addshore, Susannaanas, Akuckartz, TomT0m, Jecummings4, Krabina, So9q, Salgo60, WMDE-leszek, GreenReaper, Ostrzyciel, Samantha_Alipio_WMDE, Tagishsimon, Lydia_Pintscher, DanBri, Jneubert, Ivanhercaz, TheKtk, Jerven, Justin0x2004, Afandian, Sj, TallTed, Tpt, Thadguidry, danshick-wmde, Hjfocs, Mohammed_Sadat_WMDE, MarioGom, karapayneWMDE, Daniel_Mietchen, KingsleyIdehen, Izno, RShigapov, Hannah_Bast, Kjauslin, toan, Michael, DD063520, AndreasKuczera, Versant.2612, namedgraph, Iamamz3, YULdigitalpreservation, BenAtOlive, nguyenm9, Fnielsen, accounting_data_logger, JohannesKalmbach, Dr.uesenfieber, Bovlb, AndySeaborne, BeautifulBold, Suran38, Invadibot, MPhamWMF, Jtm-lis, maantietaja, Peteosx1x, NavinRizwi, CBogen, Isaacandy, Demian, Olson.jared.m, Nandana, Namenlos314, Lahi, Gq86, Bryandamon, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Steko, Samwilson, PhotographerTom, suriyaa, Psychoslave, tosfos, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Darenwelsh, Dinoguy1000, Manybubbles, brion, Mbch331, MarkAHershberger
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph
Thadguidry added a comment. @BenAtOlive I think for bikeshedding or hand-waving discussions, you can just start a new discussion thread in Oxigraph's GitHub Discussions (not Issues). Here: https://github.com/oxigraph/oxigraph/discussions TASK DETAIL https://phabricator.wikimedia.org/T289760
[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph
Thadguidry added a comment. As someone who has "been there, done that" (even with Apache Geode)... I can tell you that **data locality** is very important when you want to maximize performance. If the data is kept distributed, then the only way to squeeze out more performance is to get that **data locality** temporarily, and that sometimes means temporary or ad hoc data replication, which has a cost of its own but isn't insurmountable. TASK DETAIL https://phabricator.wikimedia.org/T289760
[Wikidata-bugs] [Maniphest] T220823: Use ElasticSearch for bulk Wikidata entity term lookup
Thadguidry added a comment. @Addshore That's what I figured. :-) This issue did feel old and sort of in a dustbin. Agree it should be closed. TASK DETAIL https://phabricator.wikimedia.org/T220823
[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph
Thadguidry added a comment. @Tpt Looks great! The ROADMAP file was a suggested alternative to the Milestones; sorry I didn't make that clear. I much prefer grouping or tagging issues against Milestones as you have done! You have the right idea regarding a single source of truth, and you are following exactly the best practices! You're a natural. TASK DETAIL https://phabricator.wikimedia.org/T289760
[Wikidata-bugs] [Maniphest] T289760: Evaluate Oxigraph as alternative to Blazegraph
Thadguidry added a comment.

Hi @Tpt

Can you elaborate more in your Milestones, and create more Milestones as necessary for your future vision? For example, what do you mean by "no storage format stability for now", what does that really mean to users, and what are you thinking about in the long term towards solving it?

Maybe a ROADMAP.md file in the repo would be good to add as a quick high-level overview, which then links to the Milestones (and perhaps to more future-vision Milestones, even if they are 2 years away, or just a dream wrapped with practicality). https://github.com/oxigraph/oxigraph/milestones

GitHub Milestones are a great place to capture your future vision (even if some of it never happens!).

You did great on the .3 Milestone description <https://github.com/oxigraph/oxigraph/milestone/4>, but you describe the Sled <https://docs.rs/sled/0.34.6/sled/doc/index.html> backend without giving the actual problem description or context that you're thinking of solving with Sled (as you just did here on Phabricator).

Could I ask you to frame up the general problems in the Milestones and the solutions you're thinking of, and then add that detail as bullet points in the actual Milestone descriptions, to give folks an idea of the ideas or features you're thinking about?

Thanks again for all that you do!

TASK DETAIL https://phabricator.wikimedia.org/T289760
[Wikidata-bugs] [Maniphest] T289428: U+002C comma is not being excluded by default in simple search input box for CirrusSearch
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T289428
[Wikidata-bugs] [Maniphest] T289428: U+002C comma is not being excluded by default in simple search input box for CirrusSearch
Thadguidry created this task. Thadguidry added projects: Wikidata, CirrusSearch, Elasticsearch. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Discovery-Search.

TASK DESCRIPTION

(Lydia asked that I write this up, just in case)

I thought that the "," comma was already added to the Elasticsearch standard tokenizer and would be excluded from simple search? But it seems that there is some overriding decision to have the default config this way on Wikidata? Perhaps the word_delimiter filter is being used, and incorrectly?

> Avoid using the word_delimiter filter with tokenizers that remove punctuation, such as the standard tokenizer. This could prevent the word_delimiter filter from splitting tokens correctly. It can also interfere with the filter’s configurable parameters, such as catenate_all or preserve_original. We recommend using the keyword or whitespace tokenizer instead.

As seen in my screenshot below, I was looking for entities that contained all 3 words, but if I DID NOT include the comma, the entity was not found. The only way to get it displayed was to include the comma.

F34615713: search_dropdown_screenshot.png <https://phabricator.wikimedia.org/F34615713>

I noticed that the string "foot locker inc" will not show the entity in the dropdown, but "foot locker, inc.", which includes the comma, will.

Exact match should only happen by default if a user wraps the string in double quotes, such as "Foot Locker, Inc.". In my example screenshot I am not doing that, so my expectation was that any U+002C comma in the search string would not be included in the search query.

(On that entity, I have since added the full legal name to the alias field to help improve searchability, but I would still like to know why U+002C comma is not being excluded.)

Why was it decided to include U+002C comma in simple search? Must we use the Advanced Search on Wikidata or the API if we want to do simple searches that are not exact-match phrases? This would seem counter-intuitive and the reverse of most users' expectations.

TASK DETAIL https://phabricator.wikimedia.org/T289428
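The expectation in the task description above can be made concrete with a toy sketch. This is only an illustration of punctuation being dropped at tokenization time; it is not how CirrusSearch or Elasticsearch actually analyze or rank text, and the function names `tokenize` and `matches_all_words` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only runs of word characters.

    Punctuation such as U+002C (comma) and U+002E (full stop) is dropped.
    This is a deliberate simplification; real analyzers do far more.
    """
    return re.findall(r"\w+", text.lower())


def matches_all_words(query: str, label: str) -> bool:
    """Return True when every query token appears among the label tokens."""
    label_tokens = set(tokenize(label))
    return all(token in label_tokens for token in tokenize(query))


if __name__ == "__main__":
    label = "Foot Locker, Inc."
    for query in ("foot locker inc", "foot locker, inc.", "foot locker ltd"):
        print(query, "->", matches_all_words(query, label))
```

Under this simplified model the query with and without the comma behaves the same, which is the behavior the report expected from the simple search box.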
[Wikidata-bugs] [Maniphest] T220823: Use ElasticSearch for bulk Wikidata entity term lookup
Thadguidry added a comment. I'd suggest adding **replica shards** (copies of primary shards). They provide redundancy to protect against failure, and they also increase the capacity for read requests such as searching, like Adam's entity term lookup use case. You can change the number of replica shards at any time without affecting indexing or query operations. https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html TASK DETAIL https://phabricator.wikimedia.org/T220823
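As a rough sketch of what changing the replica count looks like, the snippet below builds (but does not send) a request for Elasticsearch's update index settings endpoint, which accepts `index.number_of_replicas`. The index name `wikidata_terms` and the helper `replica_settings_request` are hypothetical; the right replica count depends on the cluster and is not something this thread specifies.

```python
import json


def replica_settings_request(index: str, replicas: int) -> tuple[str, str, str]:
    """Build (method, path, body) for an update-index-settings call.

    Nothing is sent over the network; the caller decides which HTTP
    client to use and which cluster to talk to.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{index}/_settings", body


if __name__ == "__main__":
    # "wikidata_terms" is a made-up index name used only for illustration.
    method, path, body = replica_settings_request("wikidata_terms", 2)
    print(method, path)
    print(body)
```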
[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph
Thadguidry added subscribers: Tpt, Thadguidry. Thadguidry added a comment. +1 for Oxigraph. @Tpt has been putting in a ton of good effort on research, features, and stability. I am now sponsoring him on GitHub as well for his effort. Because it is developed in Rust, it can take advantage of data streaming in places that use CPU intrinsic functions (forwarded through LLVM compiler IR). Java 17 is just now getting into a better position with its new Vector API <https://openjdk.java.net/jeps/414>. On top of that, the RIO Parser, which he also graciously maintains in Rust, is one of the fastest RDF parsers I've seen run on my system. TASK DETAIL https://phabricator.wikimedia.org/T206560
[Wikidata-bugs] [Maniphest] T210961: Add a rank for outdated but correct data
Thadguidry added a comment. We'll also want to improve the Help:Ranking page <https://www.wikidata.org/wiki/Help:Ranking#Deprecated_rank> once this proposal task is implemented. TASK DETAIL https://phabricator.wikimedia.org/T210961
[Wikidata-bugs] [Maniphest] T210961: Add a rank for outdated but correct data
Thadguidry added a comment. Agree generally with this proposal's assertions. It makes sense from a data quality perspective, and since we are actively adding new tools to improve our data quality, having a new "outdated" rank to represent "once upon a time this was factual" would be very convenient and would make it easier to arrive at community consensus. In fact, some blockchains form consensus in a similar fashion; e.g. the Solana blockchain sort of does the same thing to gain speed and efficiency (otherwise getting consensus on details slows it down): it cares about the fact that something occurred, has changed, or is outdated, while the details of when, where, and how can be deduced or ascertained later. > One of Solana’s key distinguishing features is its proof of history consensus mechanism for adding new transactions to the blockchain, which shows that a certain event occurred at a specific time without requiring validators to talk to one another in order to agree on timing. According to the project’s founder and CEO, Anatoly Yakovlenko, this allows for greater speed and security. TASK DETAIL https://phabricator.wikimedia.org/T210961
[Wikidata-bugs] [Maniphest] T287164: Improve bulk import via API
Thadguidry added a comment. Hi @aidhog Aidan, in my opinion I would say "NO, not a good test-case for this need". The only reason is this: it's ASCII only (chars <128), so it doesn't let us ensure proper load handling for data in all languages, i.e. multilingual data with non-ASCII characters (code points above 127) encoded as UTF-8, etc. DBLP.xml is however a great test-case for any SAX parser, as I can see in its PDF https://dblp.uni-trier.de/xml/docu/dblpxml.pdf We ideally need to find a CC-0 public domain data set (or even create or generate one) in UTF-8 in both JSON and RDF/XML. I'm leaving out CSV for now, since pre-processing CSV files into JSON records or RDF/XML is best done in other tools that can handle those conversions more easily. Something like the British Library's British National Bibliography (BNB) Linked Open Data - Serials sample file https://www.bl.uk/bibliographic/downloads/BNBLODSerials_sample_rdf.zip (or the full file https://www.bl.uk/bibliographic/downloads/BNBLODSerials_202106_rdf.zip) available here https://www.bl.uk/collection-metadata/downloads# TASK DETAIL https://phabricator.wikimedia.org/T287164
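One quick way to judge whether a candidate data set exercises multilingual handling is to measure how many of its bytes fall outside the ASCII range. The sketch below is a minimal illustration; the helper names `is_ascii_only` and `non_ascii_ratio` are hypothetical, and a real check would read the file in chunks rather than hold it in memory.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True when every byte is below 128, i.e. plain ASCII."""
    return data.isascii()


def non_ascii_ratio(data: bytes) -> float:
    """Fraction of bytes with value >= 128 (parts of multi-byte UTF-8 sequences)."""
    if not data:
        return 0.0
    return sum(byte >= 128 for byte in data) / len(data)


if __name__ == "__main__":
    samples = {
        "ascii": "Foot Locker, Inc.".encode("utf-8"),
        "latin": "Øyvind".encode("utf-8"),
        "cjk": "維基數據".encode("utf-8"),
    }
    for name, payload in samples.items():
        print(name, is_ascii_only(payload), round(non_ascii_ratio(payload), 2))
```

A data set for which `is_ascii_only` is true on every record tells us nothing about how the loader copes with multi-byte characters, which is the concern raised in the comment above.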
[Wikidata-bugs] [Maniphest] T285795: Limit languages on EntityStub rdf builders
Thadguidry added a comment. Someone needs to add a Documentation task to this. I assume all the new options available and perhaps a reference link to this ticket would go somewhere in here? https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format TASK DETAIL https://phabricator.wikimedia.org/T285795
[Wikidata-bugs] [Maniphest] T219037: Display constraint clarifications in violation messages
Thadguidry added a comment. I'd like to see this made a bit higher priority. It seems it would be fairly trivial to implement, with a good impact. TASK DETAIL https://phabricator.wikimedia.org/T219037
[Wikidata-bugs] [Maniphest] T236493: Adding a new lexeme should constraint languages form to languages
Thadguidry added a comment.

To Reproduce:

1. Create a new Lexeme
2. **Lemma:** type `chevrette`
3. **Language of Lemma:** type `cajun` and look at the dropdown listing
4. Notice that `Louisiana French` Q3083213 is at the bottom of the dropdown list instead of the top.

TASK DETAIL https://phabricator.wikimedia.org/T236493
___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T266212: improve Wikidata autocomplete service
Thadguidry added a comment.

In Freebase, we offered word, phrase, and full (exact match). I think the wbsearchentities API could offer something similar, although with a slight cost of indexing. Besides `name` we also supported `alias{full}`. Using `alias` matched both name and aliases; using `name` matched only on name.

Old docs archived here: https://web.archive.org/web/20160731201411/http://wiki.freebase.com/wiki/Search_Cookbook

> In addition to specifying what text fields should be matched it is also possible to specify how the match should occur by inserting one of the following modifiers between the operand and the text field:
>
> {word} : require that the words in the string match words in the corresponding text field in the document. (default)
> {phrase} : require that the words occur next to each other in the same order in the corresponding text field in the document.
> {full} : like {phrase} but also require that the phrase exactly match the text field, not just words within the text field. Known as a "full match".
>
> For example, to find the musical single called Home by Marc Broussard, you would use a filter like this:
>
> filter: "(all type:/music/single name{full}:"home" /music/track/artist:"Marc Broussard")"

`word` is essentially what @VladimirAlexiev is asking for here, I think. These parameters to control the search were indeed among the most powerful search features that Andi Vajda incorporated in the Freebase Search service when it was operational.

TASK DETAIL https://phabricator.wikimedia.org/T266212
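The three modifiers quoted above can be sketched as follows. This is a simplified reading of the archived Freebase documentation, not Freebase's actual implementation: the function name `match` is hypothetical, and tokenization here is just lowercasing and splitting on whitespace.

```python
def _words(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def match(query: str, field: str, mode: str = "word") -> bool:
    """Apply one of the quoted match modes to a single text field.

    word   : every query word occurs somewhere in the field
    phrase : the query words occur next to each other, in order
    full   : like phrase, but the phrase must be the whole field
    """
    query_words, field_words = _words(query), _words(field)
    if mode == "word":
        return all(word in field_words for word in query_words)
    if mode == "phrase":
        size = len(query_words)
        return any(
            field_words[start:start + size] == query_words
            for start in range(len(field_words) - size + 1)
        )
    if mode == "full":
        return query_words == field_words
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("word", "phrase", "full"):
        print(mode, match("home", "Home Sweet Home", mode))
```

With these definitions, `name{full}:"home"` would match a single titled "Home" but not one titled "Home Sweet Home", while the default `word` mode would match both.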
[Wikidata-bugs] [Maniphest] T238362: Blazegraph write performance tuning
Thadguidry added a comment. @Gehel Hi Guillaume, isn't the streaming updater work by @dcausse done now? Is it time for your tuning engineers to revisit some of this, or not really? TASK DETAIL https://phabricator.wikimedia.org/T238362
[Wikidata-bugs] [Maniphest] T258687: The streaming updater should read its events from multiple DC streams
Thadguidry added a comment. @dcausse Dunno if this might help, but could a simple window help, or using a KeyedProcessFunction <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html> on a KeyedStream? If the stream is unkeyed (or initially so), then the other option might be just finding the patterns in the stream, where CEP <https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/cep.html> would help. TASK DETAIL https://phabricator.wikimedia.org/T258687
[Wikidata-bugs] [Maniphest] T244590: EPIC: Rework the WDQS updater as an event driven application
Thadguidry added a comment.

> - the output of this is a simple event without any data saying: do a diff between rev X and Y, fully delete entity QXYZ, ...

Is that supposed to be "data saving"?

> rdf diff generation: materialize the command and fetch the data from wikibase and send it over a RDF stream

Will that be an uncompressed RDF stream? When mentioning streams in any design, it's always best to say compressed or uncompressed for accuracy. You might add a note on streams to mention that and any other nuances about them.

TASK DETAIL https://phabricator.wikimedia.org/T244590
[Wikidata-bugs] [Maniphest] T244590: EPIC: Rework the WDQS updater as an event driven application
Thadguidry updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T244590
[Wikidata-bugs] [Maniphest] T203643: Sometimes Special:MergeLexemes gives summary on target lexeme, and sometimes not
Thadguidry added a parent task: T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match. TASK DETAIL https://phabricator.wikimedia.org/T203643
[Wikidata-bugs] [Maniphest] T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match.
Thadguidry added a subtask: T203643: Sometimes Special:MergeLexemes gives summary on target lexeme, and sometimes not. TASK DETAIL https://phabricator.wikimedia.org/T261049
[Wikidata-bugs] [Maniphest] T261049: Propagate the error to UX for merge failure when Lemma's do not exactly match.
Thadguidry created this task. Thadguidry added a project: Wikidata Lexicographical data. Restricted Application added a project: Wikidata.

TASK DESCRIPTION

**BUG:** The merge dialog shows "continuing..." but no error is given to the user when trying to merge lemmas that do not match exactly.

Try to merge these 2 Lexemes:

eye to eye https://www.wikidata.org/wiki/Lexeme:L190266
eye-to-eye https://www.wikidata.org/wiki/Lexeme:L190270

No errors are displayed in the UX merge dialog. No JS errors found in the Firefox browser console.

**SUPPORTING EVIDENCE:** On the mailing list, Nicolas Vigneron responded that:

> for a given language the lemma has to exactly match (because otherwise, you would have two lemma with the same languages which is not possible).
> So before merging, you can either change the lemma to be the same or change the language of one of the lemma (en-gb ?).

**BROWSER:** Firefox latest

**TASK:** It would be useful to give the user a simple warning or error message such as the following:

> Lemma do not match exactly, so merging is not possible.
> For details and workarounds see

TASK DETAIL https://phabricator.wikimedia.org/T261049
[Wikidata-bugs] [Maniphest] [Commented On] T249868: take into account additional Properties for mapping to schema.org
Thadguidry added a comment.

If it helps or is needed, the query that you can use is here:

    SELECT ?wd ?wdLabel ?corrName ?schema {
      values (?corr ?corrName) {
        (wdt:P2235 "superProp")
        (wdt:P2236 "subProp")
        (wdt:P1628 "equivProp")
        (wdt:P1709 "equivClass")
        (wdt:P2888 "exactMatch")
      }
      ?wd ?corr ?schema
      filter(regex(str(?schema), "schema.org"))
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    order by ?corrName ?schema

TASK DETAIL https://phabricator.wikimedia.org/T249868
[Wikidata-bugs] [Maniphest] [Updated] T249868: take into account additional Properties for mapping to schema.org
Thadguidry added a comment.

@Lydia_Pintscher Oops! You forgot to include the main one also!!! Equivalent Property P1628 :-)

TASK DETAIL
  https://phabricator.wikimedia.org/T249868
[Wikidata-bugs] [Maniphest] [Commented On] T214884: linking Schemas in statements
Thadguidry added a comment.

Is there anything inherently wrong, technically infeasible, or undesirable about an ID using a two-letter prefix, e.g. ES45 versus E45?

TASK DETAIL
  https://phabricator.wikimedia.org/T214884
[Wikidata-bugs] [Maniphest] [Edited] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.
Thadguidry updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T237645
[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.
Thadguidry added a comment.

Thanks, updated ticket.

TASK DETAIL
  https://phabricator.wikimedia.org/T237645
[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.
Thadguidry added a comment.

@dcausse Yes, I mean running a full-text search. Full-text searches are cheap when you index terms in multiple ways. Why would you not want to index terms in multiple ways? Freebase was able to leverage this quite easily with Lucene/Solr indexes and provided great results in its search box on each character typed. Are you short on RAM to store the cached inverted indexes, or is it something else with the infra?

TASK DETAIL
  https://phabricator.wikimedia.org/T237645
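To illustrate what "indexing terms in multiple ways" can mean for search-as-you-type, here is a toy in-memory inverted index that stores each label under its whole tokens and under every prefix of those tokens. It is a sketch under stated assumptions, not how Lucene, Solr, or the Wikidata search backend work internally; the function names and the sample labels in the usage note are made up for the example.

```python
from collections import defaultdict


def index_keys(label):
    """Yield every key a label is indexed under: each token and all its prefixes."""
    for token in label.lower().split():
        for end in range(1, len(token) + 1):
            yield token[:end]


def build_index(labels):
    """Map each key to the set of document ids whose label produced it."""
    index = defaultdict(set)
    for doc_id, label in labels.items():
        for key in index_keys(label):
            index[key].add(doc_id)
    return index


def search(index, typed):
    """Return ids whose label has, for every typed word, a token starting with it."""
    words = typed.lower().split()
    if not words:
        return set()
    # Each typed word is a single dictionary lookup; results are intersected.
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)
```

For example, with `build_index({"doc1": "Douglas Adams", "doc2": "Douglas fir"})`, typing `doug` matches both documents and `adams d` matches only `doc1`. The lookups stay cheap, but every extra way of indexing a term enlarges the index, which is the memory trade-off the question about RAM points at.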
[Wikidata-bugs] [Maniphest] [Commented On] T214884: linking Schemas in statements
Thadguidry added a comment.

TODO: Just wanted to highlight that once decisions are made, please make sure to update the Glossary item <https://www.wikidata.org/wiki/Wikidata:Glossary>! Currently it reads:

> EntitySchema is a special type of Wikidata page containing a document in ShEx format, and related metadata. Although it may have labels, descriptions and aliases similar to items, it is not a type of entity, nor powered by Wikibase. Entities may be validated against an EntitySchema using a tool.

As a Data Architect in real life working with databases & entities, I actually appreciate and like the fact that EntitySchema is not a type of entity. As @alaa_wmde states, it decouples concepts and allows flexibility with multiple viewpoints from around the world. It also allows external publishers to express their own views and later link them and validate them. (Not every entity/thing has to be stored in Wikibase, but allowing conceptual linking helps the world, so a canonical URI is a "good thing", and I agree with @Jheald.)

TASK DETAIL
  https://phabricator.wikimedia.org/T214884
[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata
Thadguidry added a comment.

@dbarratt in the Wikibase ontology I could not find those properties in the OWL document returned. Sorry, I'm getting caught up with your schema layouts as fast as I can :-) I expected my parser to retrieve information about their description, range, and domain. I do see the class "Statement", however.

TASK DETAIL
  https://phabricator.wikimedia.org/T207168
[Wikidata-bugs] [Maniphest] [Commented On] T207168: Provide JSON-LD support for Wikidata
Thadguidry added a comment.

Something is amiss with these...not found.

  "wikibase": "http://wikiba.se/ontology#",
  "statements": {
    "@id": "wikibase:statements"
  },
  "identifiers": {
    "@id": "wikibase:identifiers"
  },
  "sitelinks": {
    "@id": "wikibase:sitelinks"
  },

TASK DETAIL
  https://phabricator.wikimedia.org/T207168
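For readers following along, the context fragment above maps each term to a compact IRI under the `wikibase:` prefix, so the full IRIs a parser would look up in the ontology can be derived by prefix substitution. The sketch below is a simplified illustration of that expansion, not a complete JSON-LD processor; `expand_term` is a hypothetical helper, and whether the resulting IRIs are actually defined in the ontology is exactly the open question in this comment.

```python
# Context fragment as quoted in the comment above.
CONTEXT = {
    "wikibase": "http://wikiba.se/ontology#",
    "statements": {"@id": "wikibase:statements"},
    "identifiers": {"@id": "wikibase:identifiers"},
    "sitelinks": {"@id": "wikibase:sitelinks"},
}


def expand_term(context, term):
    """Expand a context term to a full IRI using simple prefix substitution."""
    definition = context[term]
    compact = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = compact.partition(":")
    base = context.get(prefix)
    # Only substitute when the part before ":" is a prefix defined in the
    # context; absolute IRIs such as "http://..." are returned unchanged.
    if sep and isinstance(base, str) and not suffix.startswith("//"):
        return base + suffix
    return compact
```

Under these assumptions, `expand_term(CONTEXT, "statements")` gives `http://wikiba.se/ontology#statements`, which is the IRI one would then search for in the OWL document.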