[Wikidata-bugs] [Maniphest] T229655: bad interaction of lang() with wikibase:label
Igorkim78 removed Igorkim78 as the assignee of this task. TASK DETAIL https://phabricator.wikimedia.org/T229655 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78 Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T233204: Mixup of unicode characters in Query Service
Igorkim78 added a comment.

If you consider changing the collator configuration, note that the collator type should NOT be changed from its default value, ICU:

  com.bigdata.btree.keys.KeyBuilder.collator=ICU

The other collator type options are JDK and ASCII, but neither is usable here: JDK yields essentially the same comparison as ICU but generates much larger keys, and ASCII simply assumes the source text is ASCII and drops Unicode support entirely.

As Stas mentioned, Blazegraph uses the ICU default collator strength. It depends on the locale of the literal, but is Tertiary in most cases (that is why it might behave differently if a lang tag is specified):

  com.ibm.icu.text.Collator#getInstance(java.util.Locale)

There are 4 strength options besides the default Tertiary.
Ref: http://userguide.icu-project.org/collation/concepts#TOC-Comparison-Levels

Primary Level: Typically, this is used to denote differences between base characters (for example, "a" < "b"). It is the strongest difference. For example, dictionaries are divided into different sections by base character. This is also called the level-1 strength.

Secondary Level: Accents in the characters are considered secondary differences (for example, "as" < "às" < "at"). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings. This is also called the level-2 strength. Note: In some languages (such as Danish), certain accented letters are considered to be separate base characters. In most languages, however, an accented letter only has a secondary difference from the unaccented version of that letter.

Tertiary Level (default in most cases): Upper and lower case differences in characters are distinguished at the tertiary level (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs from the base form on the tertiary level (such as "A" and "Ⓐ"). Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings. This is also called the level-3 strength.

Quaternary Level: When punctuation is ignored (see Ignoring Punctuations (§)) at level 1-3, an additional level can be used to distinguish words with and without punctuation (for example, "ab" < "a-b" < "aB"). This difference is ignored when there is a primary, secondary or tertiary difference. This is also known as the level-4 strength. The quaternary level should only be used if ignoring punctuation is required or when processing Japanese text (see Hiragana processing (§)).

Identical Level: When all other levels are equal, the identical level is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared at this level, just in case there is no difference at levels 1-4. For example, Hebrew cantillation marks are only distinguished at this level. This level should be used sparingly, as only code point values differences between two strings is an extremely rare occurrence. Using this level substantially decreases the performance for both incremental comparison and sort key generation (as well as increasing the sort key length). It is also known as level 5 strength.

The Quaternary level might be sufficient for 'Abeŀlio' if the dot counts as punctuation here, but given the need to distinguish between ⑫ and ⓬, the only option to consider is Identical.

The strength can be adjusted by specifying this parameter in RWStore.properties:

  com.bigdata.btree.keys.KeyBuilder.collator.strength=Identical

This will not update the configuration of existing journals; a full reload is needed. Watch out for the size of the resulting journal: it will be larger, but it is hard to estimate by how much.
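To make the strength levels above concrete, here is a toy sketch in Python. It is not ICU and not what Blazegraph runs: `toy_sort_key`, `same_key` and the level table are hypothetical names, built only on the standard `unicodedata` module. It only illustrates how cutting a multi-level sort key off at a weaker strength makes distinct strings compare as equal.

```python
import unicodedata

# Hypothetical level table for this sketch; the names mirror the ICU strength
# names, but this is NOT the ICU collation algorithm.
LEVELS = {"primary": 1, "secondary": 2, "tertiary": 3, "identical": 4}


def toy_sort_key(text, strength="tertiary"):
    """Build a simplified multi-level sort key and cut it off at `strength`."""
    nfd = unicodedata.normalize("NFD", text)
    base = [c for c in nfd if not unicodedata.combining(c)]
    primary = tuple(c.lower() for c in base)                       # base characters
    secondary = tuple(c for c in nfd if unicodedata.combining(c))  # accents
    tertiary = tuple(c.isupper() for c in base)                    # case
    identical = tuple(ord(c) for c in nfd)                         # code points
    return (primary, secondary, tertiary, identical)[:LEVELS[strength]]


def same_key(a, b, strength):
    """True when both strings produce the same key at the given strength."""
    return toy_sort_key(a, strength) == toy_sort_key(b, strength)


if __name__ == "__main__":
    for strength in LEVELS:
        print(strength, same_key("as", "às", strength), same_key("ao", "Ao", strength))
```

The toy key does not reproduce the ⑫ / ⓬ collision itself, because that comes from ICU's collation data rather than from accent or case handling; it only shows why a stronger setting such as Identical separates strings that a weaker one merges.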
TASK DETAIL https://phabricator.wikimedia.org/T233204
[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph
Igorkim78 added a comment. @Aklapper, thank you! Fixed the commit message. TASK DETAIL https://phabricator.wikimedia.org/T236663
[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph
Igorkim78 added a comment. Changeset is https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/532373/ TASK DETAIL https://phabricator.wikimedia.org/T236663
[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label
Igorkim78 added a comment.

The issue is caused by the Service node producing the variable ?coDescription, which is not explicitly defined in the main query, so the optimizers assume this variable is not bound and do not bother with the proper order of the lang function evaluation.

Fixing it might require reordering the optimizers to make the variables produced by wikibase:label visible to the other optimizers, but that is tricky because wikibase:label itself depends on the results of other optimizers being applied in the proper order (wikibase:label takes its list of variables to process from the main query).

TASK DETAIL https://phabricator.wikimedia.org/T229655
[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph
Igorkim78 added a comment.

Performance measured on the dump from 20191202: https://dumps.wikimedia.org/wikidatawiki/entities/20191202/

Baseline time to load: 4264m29.914s, 714218864640 bytes

Improvements proposed:

1. One-path loading (data is loaded into the SPO index only, and the POS and OSP indices are recreated in parallel afterwards).
   One-path time to load: 1755m57.082s (41.2% of baseline), 402815582208 bytes (56.4% of baseline)
   Indices recreation: in progress.
2. Data to be loaded is parsed in parallel, creating StatementBuffer instances, which are then queued for loading into the DB. To be done.

TASK DETAIL https://phabricator.wikimedia.org/T236663
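The percentages quoted in the comment above can be re-derived from the raw figures. The following is a small Python sketch of that arithmetic; the helper names `parse_duration` and `percent_of` are made up for illustration, and the duration format is assumed to be the shell `time` style shown in the comment.

```python
import re


def parse_duration(text):
    """Convert a `time`-style string such as '4264m29.914s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_of(value, baseline):
    """Express `value` as a percentage of `baseline`."""
    return 100.0 * value / baseline


# Figures quoted in the comment above.
BASELINE_TIME, BASELINE_BYTES = "4264m29.914s", 714218864640
ONE_PATH_TIME, ONE_PATH_BYTES = "1755m57.082s", 402815582208

time_pct = percent_of(parse_duration(ONE_PATH_TIME), parse_duration(BASELINE_TIME))
size_pct = percent_of(ONE_PATH_BYTES, BASELINE_BYTES)

if __name__ == "__main__":
    print(f"time: {time_pct:.1f}% of baseline, size: {size_pct:.1f}% of baseline")
```

Note that the one-path figures do not yet include the time or space needed to recreate the POS and OSP indices, which the comment lists as still in progress.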
[Wikidata-bugs] [Maniphest] [Updated] T237089: Create CQS puppet configs by applying query_service module
Igorkim78 added a comment.

The configuration changes for SDC data are as follows (note that the namespace 'sdc' is used to store RDF data in the Blazegraph journal; it can be changed as needed):

- Blazegraph journal config (RWStore.properties): the settings below replace the corresponding WDQS configuration (search for the com.bigdata.namespace.wdq prefix to find the parameters to replace):

  # Bump up the branching factor for the lexicon indices on the default kb.
  com.bigdata.namespace.sdc.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
  com.bigdata.namespace.sdc.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
  com.bigdata.namespace.sdc.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
  # Bump up the branching factor for the statement indices on the default kb.
  com.bigdata.namespace.sdc.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
  com.bigdata.namespace.sdc.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
  com.bigdata.namespace.sdc.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
  com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934

  Note that the final configuration should be adjusted for the real production data according to the instructions in T232768 <https://phabricator.wikimedia.org/T232768>.

- Scripts that run the Updater should be called with the proper namespace:

  On data load, replace
    ./loadRestAPI.sh -n wdq -d `pwd`/data/split
  by
    ./loadRestAPI.sh -n sdc -d `pwd`/data/split

  On single file load, replace
    ./loadRestAPI.sh -n wdq -d `pwd`/data/split/wikidump-1.ttl.gz
  by
    ./loadRestAPI.sh -n sdc -d `pwd`/data/split/wikidump-1.ttl.gz

  On running the updater, replace
    ./runUpdate.sh -n wdq
  by
    ./runUpdate.sh -n sdc

  On any calls to Blazegraph REST, instead of
    http://localhost:/bigdata/namespace/wdq/sparql
  use
    http://localhost:/bigdata/namespace/sdc/sparql

The categories store might need similar changes, but whether separate categories are needed for production SDC data still has to be discussed.
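The key renaming described above (searching for the com.bigdata.namespace.wdq prefix and switching it to sdc) can be sketched as a small Python helper. This is only an illustration: `retarget_namespace` is a hypothetical function, and the sample input lines are made up rather than copied from a real WDQS RWStore.properties. It renames keys only; the values still have to be tuned as noted in the comment.

```python
OLD_PREFIX = "com.bigdata.namespace.wdq."
NEW_PREFIX = "com.bigdata.namespace.sdc."


def retarget_namespace(lines, old=OLD_PREFIX, new=NEW_PREFIX):
    """Rewrite property keys that start with `old`; leave other lines as-is.

    Leading whitespace is dropped on rewritten lines only.
    """
    result = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(old):
            line = new + stripped[len(old):]
        result.append(line)
    return result


# Made-up sample input for illustration.
sample = [
    "# Bump up the branching factor for the statement indices on the default kb.",
    "com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934",
    "com.bigdata.btree.keys.KeyBuilder.collator=ICU",
]

if __name__ == "__main__":
    for line in retarget_namespace(sample):
        print(line)
```

Matching on the full dotted prefix, rather than replacing every occurrence of "wdq", keeps comments and unrelated keys untouched.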
TASK DETAIL https://phabricator.wikimedia.org/T237089
[Wikidata-bugs] [Maniphest] [Commented On] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs
Igorkim78 added a comment.

We need statistics on how many triples use a bnode as an object:

{code}
# Triples whose object is a blank node, counted per predicate.
select ?p (count(*) as ?cnt) {
  ?s ?p ?o .
  filter (isBlank(?o))
}
group by ?p
{code}

and as a subject (if any):

{code}
# Triples whose subject is a blank node, counted per predicate.
select ?p (count(*) as ?cnt) {
  ?s ?p ?o .
  filter (isBlank(?s))
}
group by ?p
{code}

TASK DETAIL https://phabricator.wikimedia.org/T239414
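For a quick check on a small sample, the same per-predicate counts as in the two queries above can be computed offline. This is a Python sketch only, assuming triples are already available as (subject, predicate, object) string tuples with blank nodes written in the N-Triples `_:label` form; the sample data and function names are invented for illustration.

```python
from collections import Counter


def is_blank(term):
    """Treat terms written as '_:label' as blank nodes (N-Triples style)."""
    return term.startswith("_:")


def count_bnodes_by_predicate(triples, position):
    """Count triples per predicate whose subject ('s') or object ('o') is a bnode."""
    index = {"s": 0, "o": 2}[position]
    return Counter(t[1] for t in triples if is_blank(t[index]))


# Invented sample triples, not real Wikidata content.
triples = [
    ("<s1>", "<p1>", "_:b0"),
    ("<s2>", "<p1>", "_:b1"),
    ("<s3>", "<p2>", "<o3>"),
    ("_:b2", "<p3>", "<o4>"),
]

if __name__ == "__main__":
    print("as object:", dict(count_bnodes_by_predicate(triples, "o")))
    print("as subject:", dict(count_bnodes_by_predicate(triples, "s")))
```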
[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service
Igorkim78 added a comment. Could you share the output of iostat -x 1 and sudo iotop? TASK DETAIL https://phabricator.wikimedia.org/T231411
[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service
Igorkim78 added a comment. Are there thread dumps from Blazegraph available? What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs? How many per minute/hour? TASK DETAIL https://phabricator.wikimedia.org/T231411
[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service
Igorkim78 added a subtask: T238555: Create endpoint to extract low level data for a list of entity IDs.. TASK DETAIL https://phabricator.wikimedia.org/T231411
[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.
Igorkim78 added a parent task: T231411: Test new Updater service. TASK DETAIL https://phabricator.wikimedia.org/T238555
[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service
Igorkim78 added a subtask: T238557: Allow for logging recently updated entities. TASK DETAIL https://phabricator.wikimedia.org/T231411
[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities
Igorkim78 added a parent task: T231411: Test new Updater service. TASK DETAIL https://phabricator.wikimedia.org/T238557
[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities
Igorkim78 added a project: Wikidata-Query-Service. Igorkim78 added a comment. Restricted Application added a project: Wikidata. Thanks! Yes, it is Wikidata-Query-Service. TASK DETAIL https://phabricator.wikimedia.org/T238557
[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.
Igorkim78 added a project: Wikidata-Query-Service. Igorkim78 added a comment. Restricted Application added a project: Wikidata. Thanks, yes, it is Wikidata-Query-Service. TASK DETAIL https://phabricator.wikimedia.org/T238555
[Wikidata-bugs] [Maniphest] [Commented On] T238232: blazegraph journal on wdqs1005 is oversized
Igorkim78 added a comment.

Wdqs1006 reports that 574.6GiB are reserved for the journal and 544.3GiB are actually used (~5% of space unused), while Wdqs1005 reports that 1037.7GiB are reserved and only 543.5GiB are actually used (~47% of space unused).

Most of the %FileWaste is reserved for 8K allocators, but %SlotWaste is also higher than usual for the 4k (10 times higher than usual), 2k and 64 (3 times), and 320 and 768 (2 times) allocators.

The number of slots allocated using 8k allocators is similar on both servers (less than 5% difference), about 5,179M vs 5,431M, and only ~1% of them (~63M) remain in use. This happens because of updates: for each update, the parts of the indices related to the changing data have to be copied to a new allocator with the changes applied. The old allocator can then be marked as unused and reused for later updates, once all connections that refer to the commit point linked to that allocator are closed. But if a commit point cannot be released, its allocators also remain locked.

Analyzing the Grafana reports, I assume most of the allocators were consumed gradually from Nov 1, 6:00 to Nov 4, 18:00.

Given all the above, the conclusion is that something (most probably an intentionally or unintentionally unclosed connection) was blocking the release of allocators for 3.5 days and preventing their reuse, so updates had to allocate new allocators. When the commit point was finally released, the locked allocators were released as well, but they could not be removed from the file, which only increased the sparse space.
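The unused-space percentages above follow directly from the reserved and used sizes. A minimal Python sketch of that arithmetic, using only the numbers quoted in the comment (the function name `unused_percent` is made up for illustration):

```python
def unused_percent(reserved_gib, used_gib):
    """Share of the reserved journal space that is not actually used."""
    return 100.0 * (reserved_gib - used_gib) / reserved_gib


# Figures quoted in the comment above.
wdqs1006_unused = unused_percent(574.6, 544.3)
wdqs1005_unused = unused_percent(1037.7, 543.5)

# 8k allocator slots: roughly 63M still in use out of 5,179M or 5,431M allocated.
in_use_low = 100.0 * 63 / 5431
in_use_high = 100.0 * 63 / 5179

if __name__ == "__main__":
    print(f"wdqs1006 unused: {wdqs1006_unused:.1f}%")
    print(f"wdqs1005 unused: {wdqs1005_unused:.1f}%")
    print(f"8k slots in use: {in_use_low:.2f}% to {in_use_high:.2f}%")
```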
TASK DETAIL https://phabricator.wikimedia.org/T238232
[Wikidata-bugs] [Maniphest] [Edited] T234968: Measure performance impact of code optimization and/or blazegraph settings on real traffic data
Igorkim78 updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T234968
[Wikidata-bugs] [Maniphest] [Updated] T101013: Log Wikidata Query Service queries to the event gate infrastructure
Igorkim78 added a comment.

Added a link to the task T236251 <https://phabricator.wikimedia.org/T236251>: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.

The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, the cause might be one of the following:

1. The total result is very large, and much time was spent on serialization/deserialization (which is basically fine if the number of results is large).
2. Connectivity issues, over the network and/or between processes. In this case the X-FIRST-SOLUTION-MILLIS metric will be the same for subsequent calls, but the total query time will vary.
3. The query might be very unselective while additional constraints filter out many potential solutions, so the first solution is computed fast but collecting all the requested results takes a long time. Such queries are subject to analysis and might need fixing in the Blazegraph code or the data layout.

The header X-FIRST-SOLUTION-MILLIS returns the number of milliseconds spent before the first solution is available to be written to the response payload (that is the last moment at which headers can be added to the result). The value shall not exceed the query timeouts set at the Jetty and query levels.
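A client-side sketch of how the header described above might be used, in Python. Only the header name comes from the comment; the assumption that its value is a plain integer string, the helper names, and the 50 ms tolerance are all illustrative choices, not part of Blazegraph or WDQS.

```python
HEADER = "X-FIRST-SOLUTION-MILLIS"


def time_after_first_solution(headers, total_millis):
    """Return (first_millis, remaining_millis), or None if the header is absent.

    `headers` is a plain dict of response headers; the value is assumed to be
    an integer number of milliseconds.
    """
    lookup = {name.lower(): value for name, value in headers.items()}
    raw = lookup.get(HEADER.lower())
    if raw is None:
        return None
    first = int(raw)
    return first, max(total_millis - first, 0)


def looks_like_connectivity_issue(samples, tolerance_millis=50):
    """Case 2 above: first-solution time is stable while total time varies.

    `samples` is a list of (first_millis, total_millis) pairs from repeated
    calls; `tolerance_millis` is an arbitrary threshold for this sketch.
    """
    firsts = [first for first, _ in samples]
    totals = [total for _, total in samples]
    stable_first = max(firsts) - min(firsts) <= tolerance_millis
    varying_total = max(totals) - min(totals) > tolerance_millis
    return stable_first and varying_total


if __name__ == "__main__":
    print(time_after_first_solution({"x-first-solution-millis": "120"}, 4500))
    print(looks_like_connectivity_issue([(120, 900), (118, 4500), (125, 2100)]))
```

Telling case 1 from case 3 would additionally need the result size, which this sketch does not look at.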
TASK DETAIL https://phabricator.wikimedia.org/T101013
[Wikidata-bugs] [Maniphest] [Commented On] T235540: SPARQL query causes StackOverflowError and fails to execute
Igorkim78 added a comment.

The LabelService optimizer was fixed this August (so it no longer throws NPEs) by reusing the Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to introspect the variables used in filters and other clauses, so that the placement of the LabelService call can be properly adjusted. This introspection seems to get into an infinite loop over the AST tree.

Reusing a variable name to label an aggregation after the original variable is common practice, so yes, it should be fixed. Looking for a workaround that extracts the referenced vars without getting caught in the infinite loop.

TASK DETAIL https://phabricator.wikimedia.org/T235540
[Wikidata-bugs] [Maniphest] [Claimed] T231411: Test new Updater service
Igorkim78 claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T231411
[Wikidata-bugs] [Maniphest] [Commented On] T227365: WDQS/Blazegraph data loading has timeout
Igorkim78 added a comment.

There is a context param queryTimeout, set to 10 minutes in web.xml, which applies to all Blazegraph servlets. Stas prepared a patch extending it 10 times: https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/

You might apply it locally (or just edit the web.xml file) to resolve your issue. The change has not been applied to the WDQS master because this timeout is system-wide, and extending it might result in unexpected resource consumption (the timeout also applies to queries, including very heavy ones, so they would be allowed to run much longer before timing out).

TASK DETAIL https://phabricator.wikimedia.org/T227365
[Wikidata-bugs] [Maniphest] [Commented On] T233204: Mixup of unicode characters in Query Service
Igorkim78 added a comment.

These characters are indeed mapped to the same term in the DB.

  SELECT ( ConstantNode(TermId(1415304733L)[⓬]) AS VarNode(negativeCircled) ) ( ConstantNode(TermId(1415304733L)[⑫]) AS VarNode(circled) )

Blazegraph uses ICU collation as the default key builder implementation. The characters do indeed seem very similar, so ICU might decide to mix them up. The behavior could be changed, but that might cause many side effects, especially with complex Unicode sequences such as diacritics, and should be carefully considered.

TASK DETAIL https://phabricator.wikimedia.org/T233204
[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service
Igorkim78 added a comment.

Differences in bnodes can be tolerated with an additional replacement. The cleanup stage can be merged with the initial sed+sort:

  zcat wikidata.jnl.1.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v http://wikiba.se/ontology#timestamp | sort | gzip > wikidata.jnl.1.sorted.gz
  zcat wikidata.jnl.2.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v http://wikiba.se/ontology#timestamp | sort | gzip > wikidata.jnl.2.sorted.gz

Then the compare is just:

  comm -3 <(zcat wikidata.jnl.1.sorted.gz) <(zcat wikidata.jnl.2.sorted.gz) > wikidata.jnl.diff

TASK DETAIL https://phabricator.wikimedia.org/T231411
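The normalization done by the sed and grep stages above can be mirrored in Python for experimenting on a handful of lines before running the full pipeline. This is a sketch under assumptions: the sample lines are invented (the real dump line format is not shown in the comment), `normalize` and `diff` are hypothetical names, duplicates are collapsed because sets are used, and sorting is by code point rather than by the shell locale.

```python
import re

TIMESTAMP = "http://wikiba.se/ontology#timestamp"


def normalize(lines):
    """Mirror the sed | grep -v | sort stage for an iterable of text lines."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        line = re.sub(r" : .*$", "", line)          # drop everything from ' : ' on
        line = re.sub(r"_:t[^,>]*", "bnode", line)  # unify blank node labels
        if TIMESTAMP not in line:                   # same role as grep -v
            cleaned.append(line)
    return sorted(cleaned)


def diff(lines_a, lines_b):
    """Lines present on only one side, like `comm -3` without its column indent."""
    a, b = set(normalize(lines_a)), set(normalize(lines_b))
    return sorted(a - b), sorted(b - a)


# Invented sample lines for illustration.
dump_1 = [
    "<s1> <p1> _:t1a : x",
    "<s1> <http://wikiba.se/ontology#timestamp> 1",
    "<s2> <p2> <o2>",
]
dump_2 = [
    "<s1> <p1> _:t9z : y",
    "<s3> <p3> <o3>",
]

if __name__ == "__main__":
    only_1, only_2 = diff(dump_1, dump_2)
    print("only in dump 1:", only_1)
    print("only in dump 2:", only_2)
```

The two bnode lines compare as equal after normalization, which is the tolerance the comment describes.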
[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label
Igorkim78 added a comment.

Looking at the query execution plans, the ProjectionOp for the query with lang() for coDescription is arranged before the materialization of coDescription, so the value (along with its lang) does not reach the projection. The reason for this behavior needs more research; I will update on that.

TASK DETAIL https://phabricator.wikimedia.org/T229655
[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message
Igorkim78 added a comment.

Fixed optional support and added a test case for that code path. Service projectedVars actually include both inbound and outbound variables (those which are parameters for the service and those which are produced by the label lookup). But to check whether the service node can be reordered before any clauses placed at the bottom of the query, we need to consider only the inbound variables, so that they are available for the service call and all outbound variables are available for the later filters and other clauses.

TASK DETAIL https://phabricator.wikimedia.org/T175840
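The inbound-only check described above can be illustrated with a small Python model. This is a sketch under assumptions and not the Blazegraph or WDQS code: `Clause`, `bound_before` and `can_place_service` are hypothetical names, and clauses are reduced to the set of variables they bind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    name: str
    binds: frozenset = frozenset()  # variables this clause binds


def bound_before(clauses, index):
    """Variables bound by all clauses placed before position `index`."""
    bound = set()
    for clause in clauses[:index]:
        bound |= clause.binds
    return bound


def can_place_service(clauses, index, required_vars):
    """True if every required variable is already bound before `index`."""
    return set(required_vars) <= bound_before(clauses, index)


clauses = [
    Clause("?item wdt:P31 ?class", frozenset({"item", "class"})),
    Clause("FILTER on ?itemLabel"),
]
inbound = {"item"}        # parameters of the service call
outbound = {"itemLabel"}  # produced by the label lookup

ok_inbound_only = can_place_service(clauses, 1, inbound)
ok_all_projected = can_place_service(clauses, 1, inbound | outbound)
print(ok_inbound_only, ok_all_projected)
```

Checking all projected variables rejects the placement because the outbound variable is not bound yet, even though the service itself is what binds it; checking only the inbound ones accepts it.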
[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message
Igorkim78 added a comment.

The idea of the change is to replace the runLast hint with more complicated logic. There are 3 steps:
- first, a 'most probably optimal' placement, to allow EmptyLabelServiceOptimizer to see the variables to process;
- then EmptyLabelServiceOptimizer adds statement patterns for resolutions;
- and then an additional optimizer step moves LabelService to the latest possible position before any clauses which might use the variables bound by LabelService.

All tests in LabelServiceUnitTest (including the new test case specific to this bug) are passing, but I think it might take some additional tuning to properly support all 'real-life' usage scenarios, for example FILTER clauses, including those written above service calls and binds. These might also need additional rearrangement. I have not applied that yet, as it might become a waterfall which rearranges the clauses too much.

TASK DETAIL https://phabricator.wikimedia.org/T175840
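The third step above (moving the service to the latest position that is still before any clause using its outbound variables) can be sketched as follows. This is a toy model under assumptions, not the optimizer from the changeset: `move_service_last` is a hypothetical name and each clause is reduced to a (name, used variables) pair.

```python
def move_service_last(clauses, service, outbound):
    """Insert `service` at the latest position that is still before
    every clause using one of its outbound variables."""
    position = len(clauses)  # default: run last
    for i, (_, used_vars) in enumerate(clauses):
        if used_vars & outbound:
            position = i     # first consumer of a label variable
            break
    return clauses[:position] + [service] + clauses[position:]


clauses = [
    ("pattern", {"item"}),
    ("optional", {"item", "date"}),
    ("filter", {"itemLabel"}),
    ("order", {"date"}),
]
service = ("label-service", {"item"})

plan = move_service_last(clauses, service, outbound={"itemLabel"})
names = [name for name, _ in plan]
print(names)
```

When no clause uses an outbound variable, the sketch falls back to placing the service at the very end, which corresponds to the old runLast behavior.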
[Wikidata-bugs] [Maniphest] [Updated] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected
Igorkim78 added a comment.

EmptyLabelServiceOptimizer, running optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode), currently takes the projection from StaticAnalysis.getQueryRoot(), because the parent of the JoinGroupNode wrapping the statement pattern of the LabelService clause is unavailable. The parent is defined in GroupMemberNodeBase as an IGroupNode, so it cannot give us a reference to ServiceNode to propagate to SubqueryBase, as neither of them is an IGroupNode descendant.

So, to get back-references from nested statement patterns, clauses etc. to the SubqueryBase whose projection we need to extract, we would need to introduce proper annotations and propagate them through the different types of nesting nodes, as the actual service clause might be enclosed in, for example, a UnionNode.

Assignment of the annotation can be done in com.bigdata.rdf.sail.sparql.BigdataExprBuilder.handleWhereClause(ASTQuery, QueryBase) as:

    queryRoot.setWhereClause(ret);
    if (queryRoot instanceof SubqueryBase) {
        ret.annotations().put(QueryBase.Annotations.PROJECTION, queryRoot.getProjection());
    }

Then we could use it in org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) as:

    if (!foundArg) {
        EmptyLabelServiceOptimizer.this.addResolutions(ctx, g,
            (ProjectionNode) service.getParent().annotations().get(QueryBase.Annotations.PROJECTION));
    }

But this would require changing the Blazegraph core, and additional handling would be needed to properly propagate the annotation to nested clauses.

There is another option though: org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) is already traversing the whole tree to process LabelService clauses, so we could handle the projection annotation at this point:

    join.getChildren(SubqueryBase.class).stream().forEach(node -> {
        SubqueryBase subqueryBase = (SubqueryBase) node;
        JoinGroupNode whereClause = (JoinGroupNode) subqueryBase.getWhereClause();
        whereClause.setProperty(QueryBase.Annotations.PROJECTION, subqueryBase.getProjection());
    });

Though here we might also need some additional handling for LabelService inside nested clauses.

Created changeset https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/508725/

I had to apply the same changes to the pom as in T213375 <https://phabricator.wikimedia.org/T213375> to link with bigdata-rdf-test.

TASK DETAIL https://phabricator.wikimedia.org/T153353
[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs
Igorkim78 added a comment.

Additionally tested the configuration option with only raw records disabled. Compared to the original baseline, it takes 1.7% more time and produces a journal 9.2% smaller, with 77% fewer allocations whose overall size is 38.9% smaller, though there are 2.9% more blob allocations with 7.5% more size. This might be an option to consider, even without value and reference URI inlining.

TASK DETAIL https://phabricator.wikimedia.org/T213375
[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs
Igorkim78 added a comment.

Configuration options are assigned in RWStore.properties. The particular options are:
- Inlined value and reference URIs:
> com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V001
- Raw records support disabled:
> com.bigdata.rdf.store.AbstractTripleStore.enableRawRecordsSupport=false
- Inlining of short text literals; a max length has to be assigned as the threshold for inlining literals:
> com.bigdata.rdf.store.AbstractTripleStore.inlineTextLiterals=true
> com.bigdata.rdf.store.AbstractTripleStore.maxInlineTextLength=40

A combination of parameters may be applied. The most promising combination is inlining of value and reference URIs together with disabling raw records support:
> com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V001
> com.bigdata.rdf.store.AbstractTripleStore.enableRawRecordsSupport=false

TASK DETAIL https://phabricator.wikimedia.org/T213375
[Wikidata-bugs] [Maniphest] [Commented On] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected
Igorkim78 added a comment.

This seems to be an optimizer ordering problem. CompareBOp executes several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is unbound on every occasion:
- while running **//ASTDeferredIVResolution//**
- while running **//com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer//**
- then while running **//ConditionalRoutingOp for ChunkedRunningQuery//**

So, finally, the solution is discarded in com.bigdata.rdf.internal.constraints.SPARQLConstraint.accept(IBindingSet), and LabelService is not called at all.

On the other hand, if langLabel is uncommented in the outer projection, LabelService is called and langLabel is already bound when SPARQLConstraint.accept is called. The difference between the query execution plans is that in the successful one an additional statement is added to the LabelService clause:

> SERVICE http://wikiba.se/ontology#label])> {
>   JoinGroupNode {
>     StatementPatternNode(ConstantNode(TermId(0U)[http://www.bigdata.com/rdf#serviceParam]), ConstantNode(TermId(0U)[http://wikiba.se/ontology#language]), ConstantNode(TermId(0L)[en])) [scope=DEFAULT_CONTEXTS]
>     StatementPatternNode(VarNode(lang), ConstantNode(Vocab(74)[http://www.w3.org/2000/01/rdf-schema#label]), VarNode(langLabel)) [scope=DEFAULT_CONTEXTS] # <<< Missing statement pattern
>   }
> }

If it is added manually, the query succeeds:

> SELECT ?lang #?langLabel
> WHERE {
>   {
>     SELECT ?lang ?langLabel WHERE {
>       BIND(wd:Q154755 AS ?lang)
>       SERVICE wikibase:label {
>         bd:serviceParam wikibase:language "en" .
>         ?lang rdfs:label ?langLabel .
>       }
>     }
>   }
>   FILTER("Ada"@en = ?langLabel) .
> }

TASK DETAIL https://phabricator.wikimedia.org/T153353
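Why an unbound ?langLabel makes the row disappear can be shown with a toy Python model. This is a simplified sketch under assumptions, not Blazegraph's evaluation code: the `LABELS` table and both function names are hypothetical, and the only fact relied on is that a SPARQL FILTER whose expression errors (here, on an unbound variable) eliminates the solution.

```python
LABELS = {"Q154755": "Ada"}  # toy label table, not real service output


def label_service(solutions):
    """Bind langLabel for every solution, as the label lookup would."""
    return [dict(s, langLabel=LABELS[s["lang"]]) for s in solutions]


def filter_label(solutions, expected):
    """FILTER(expected = ?langLabel): an unbound ?langLabel is an error,
    and a filter error drops the solution."""
    return [s for s in solutions
            if "langLabel" in s and s["langLabel"] == expected]


start = [{"lang": "Q154755"}]

# Filter evaluated before the label is bound: the row is lost.
filter_first = label_service(filter_label(start, "Ada"))
# Label bound first: the row survives the filter.
service_first = filter_label(label_service(start), "Ada")
print(filter_first, service_first)
```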
[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs
Igorkim78 added a comment.

Complete test logs attached: F28854747: logs.zip <https://phabricator.wikimedia.org/F28854747>

TASK DETAIL https://phabricator.wikimedia.org/T213375
[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs
Igorkim78 added a comment.

Load performance for the tested configurations in an isolated environment (i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO): F28854691: Load performance.png <https://phabricator.wikimedia.org/F28854691>

Query performance on simple queries (select * from {?s ?p ?o .} with ?s bound to a random subject URI) does not show any significant changes across the tested journal configurations. A more complex query mix should probably be applied to the journals to see a difference.

TASK DETAIL https://phabricator.wikimedia.org/T213375
[Wikidata-bugs] [Maniphest] [Updated] T213375: Inline value and reference URIs
Igorkim78 added a comment.

Attached are the results of loading 100 ttl.gz files with different configurations: F28854613: Results.xls <https://phabricator.wikimedia.org/F28854613>
- original baseline (commit blazegraph 895a4f3bd003ddb4b1f31257f642ca3616bca79b <https://phabricator.wikimedia.org/rWDQR895a4f3bd003ddb4b1f31257f642ca3616bca79b>, rdf 4245b2a5bc0c7d4b369a43ba512b5e537dac07a4 <https://phabricator.wikimedia.org/rWDQR4245b2a5bc0c7d4b369a43ba512b5e537dac07a4>)
- reference URIs inlining,
- reference URIs inlining, raw records disabled per T213210 <https://phabricator.wikimedia.org/T213210>
- reference URIs inlining, raw records disabled, INLINE_TEXT_LITERALS for short strings per T213210 <https://phabricator.wikimedia.org/T213210>

Conclusions, compared to the original baseline:
- Inlining of reference and value URIs takes 22% more time and produces a journal 10% larger, with 1.7% fewer allocations but an overall allocation size 8.6% larger, though there are 58% more blob allocations with 66% more size.
- Inlining of reference and value URIs with raw records disabled takes 20% more time and produces a journal of the same size, with 77% fewer allocations and an overall size 29% smaller, though there are 61% more blob allocations with 73% more size.
- Inlining of reference and value URIs and literals (shorter than 40 chars) with raw records disabled takes 66% more time and produces a journal 21% larger, with 75% fewer allocations and an overall size 2% smaller, though there are 234% more blob allocations with 382% more size.

Result: the configuration option of inlining reference and value URIs with raw records disabled might be considered to reduce the allocation count, but all tested configurations result in more allocations for BLOBs.

TASK DETAIL https://phabricator.wikimedia.org/T213375
[Wikidata-bugs] [Maniphest] [Claimed] T213375: Inline value and reference URIs
Igorkim78 claimed this task.
Igorkim78 added a comment.

Changeset created to support reference URIs inlining: https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642

Baseline collected for the performance test:
- Data files loaded: 100 ttl gz files into an empty journal
- Total size of ttl.gz files: 7.9GB
- Number of triples in the journal: 1,114,293,494
- Size of the journal: 116,122,910,720 bytes (about 108 GiB)
- Subjects in the journal: 135,711,041
- Reference subjects in the journal: 7,206,942 (5.3% of all subjects)

Load performance:
- Load time: 156278 seconds (43 hours)
- Load performance, average: 7122 mutations per second
- Load performance, stabilized (last 10 files): 4170 mutations per second

Query performance was measured for the simple query select * {?s ?p ?o } with ?s bound to a random subject from two sets, reference URIs and all other URIs except statement URIs.

For reference URIs:
- Stabilized query performance after ~150K queries: 80 qps with an average of 4 rows per result set
- Normalized query performance: 320 rows per second

For other URIs:
- Stabilized query performance after ~150K queries: 70 qps with an average of 5 rows per result set
- Normalized query performance: 350 rows per second

In progress: reload the journal with the following configurations and compare the results with the baseline:
- reference URIs inlining,
- reference URIs inlining, raw records disabled per T213210 <https://phabricator.wikimedia.org/T213210>
- reference URIs inlining, INLINE_TEXT_LITERALS for short strings per T213210 <https://phabricator.wikimedia.org/T213210>

TASK DETAIL https://phabricator.wikimedia.org/T213375
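The derived baseline figures above follow from simple arithmetic on the raw numbers, which the short Python sketch below reproduces. The variable and function names are illustrative only; "normalized" performance is taken here to mean queries per second multiplied by the average rows per result set, which is an assumption that matches the quoted values.

```python
# Raw baseline numbers quoted in the comment.
load_seconds = 156278
journal_bytes = 116_122_910_720
subjects = 135_711_041
reference_subjects = 7_206_942

load_hours = load_seconds / 3600                       # load time in hours
journal_gib = journal_bytes / 2**30                    # journal size in GiB
reference_share = 100 * reference_subjects / subjects  # percent of subjects


def rows_per_second(qps, rows_per_result):
    """Normalized query performance: queries/s times average rows per result."""
    return qps * rows_per_result


reference_rate = rows_per_second(80, 4)  # reference URIs
other_rate = rows_per_second(70, 5)      # other URIs

print(round(load_hours, 1), round(reference_share, 1),
      reference_rate, other_rate)
```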