[Wikidata-bugs] [Maniphest] T229655: bad interaction of lang() with wikibase:label

2022-02-07 Thread Igorkim78
Igorkim78 removed Igorkim78 as the assignee of this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T229655

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Igorkim78
Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, Invadibot, MPhamWMF, 
maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T233204: Mixup of unicode characters in Query Service

2020-11-03 Thread Igorkim78
Igorkim78 added a comment.


  If you consider changing the collator configuration, note that the collator type should NOT be changed from its default value, ICU:
  com.bigdata.btree.keys.KeyBuilder.collator=ICU
  There are also the collator types JDK and ASCII, but neither is usable: JDK results in basically the same comparison as ICU but generates much larger keys, and ASCII assumes the source text is ASCII and drops Unicode support entirely.
  
  As Stas mentioned, Blazegraph uses the ICU default collator strength, which depends on the locale of the literal but is Tertiary in most cases (that's why the behavior might differ when a language tag is specified):
  com.ibm.icu.text.Collator#getInstance(java.util.Locale)
  
  Besides the default, Tertiary, there are 4 strength options:
  Ref: http://userguide.icu-project.org/collation/concepts#TOC-Comparison-Levels
  
  Primary Level: Typically, this is used to denote differences between base 
characters (for example, "a" < "b"). It is the strongest difference. For 
example, dictionaries are divided into different sections by base character. 
This is also called the level-1 strength.
  
  Secondary Level: Accents in the characters are considered secondary 
differences (for example, "as" < "às" < "at"). Other differences between 
letters can also be considered secondary differences, depending on the 
language. A secondary difference is ignored when there is a primary difference 
anywhere in the strings. This is also called the level-2 strength.
  Note: In some languages (such as Danish), certain accented letters are 
considered to be separate base characters. In most languages, however, an 
accented letter only has a secondary difference from the unaccented version of 
that letter.
  
  Tertiary Level (Default in most cases): Upper and lower case differences in 
characters are distinguished at the tertiary level (for example, "ao" < "Ao" < 
"aò"). In addition, a variant of a letter differs from the base form on the 
tertiary level (such as "A" and "Ⓐ"). Another example is the difference between 
large and small Kana. A tertiary difference is ignored when there is a primary 
or secondary difference anywhere in the strings. This is also called the 
level-3 strength.
  
  Quaternary Level: When punctuation is ignored (see Ignoring Punctuations) at level 1-3, an additional level can be used to distinguish words with and without punctuation (for example, "ab" < "a-b" < "aB"). This difference is ignored when there is a primary, secondary or tertiary difference. This is also known as the level-4 strength. The quaternary level should only be used if ignoring punctuation is required or when processing Japanese text (see Hiragana processing).
  
  Identical Level: When all other levels are equal, the identical level is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared at this level, just in case there is no difference at levels 1-4. For example, Hebrew cantillation marks are only distinguished at this level. This level should be used sparingly, as only code point value differences between two strings is an extremely rare occurrence. Using this level substantially decreases the performance for both incremental comparison and sort key generation (as well as increasing the sort key length). It is also known as level 5 strength.
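  
  To make the level rules concrete, here is a toy sketch in Python. It is only an illustration, not ICU's algorithm: the name toy_sort_key and the simplified level definitions (base letters, combining marks, an uppercase flag, raw NFD code points) are assumptions made for this example, and the quaternary level is omitted. Each key is a tuple of per-level components, so a lower level only matters when all higher levels tie, and choosing a weaker strength truncates the tuple, which makes more strings compare equal.

```python
import unicodedata

# Toy approximation of collation levels; this is NOT ICU's algorithm.
LEVELS = ("primary", "secondary", "tertiary", "identical")


def _units(text):
    """Split the NFD form of text into (base character, combining marks) pairs."""
    units = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and units:
            base, marks = units[-1]
            units[-1] = (base, marks + ch)
        else:
            units.append((ch, ""))
    return units


def toy_sort_key(text, strength="tertiary"):
    """Build a key whose components are compared level by level."""
    units = _units(text)
    components = {
        # Level 1: base letters only, ignoring accents and case.
        "primary": tuple(base.casefold() for base, _ in units),
        # Level 2: accents (combining marks) attached to each base letter.
        "secondary": tuple(marks for _, marks in units),
        # Level 3: case; False (lowercase) sorts before True (uppercase).
        "tertiary": tuple(base.isupper() for base, _ in units),
        # Identical level: raw NFD code points as the final tiebreaker.
        "identical": unicodedata.normalize("NFD", text),
    }
    depth = LEVELS.index(strength) + 1
    return tuple(components[name] for name in LEVELS[:depth])
```

  With this key, sorted(["a\u00f2", "Ao", "ao"], key=toy_sort_key) returns ["ao", "Ao", "aò"], matching the tertiary example quoted above, while at "primary" strength all three strings get the same key. Precomposed "à" and "a" + U+0300 still share a key even at "identical" strength, because that level compares NFD forms.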
  
  While the Quaternary level might be sufficient for 'Abeŀlio' if the dot counts as punctuation there, the need to distinguish between ⑫ and ⓬ leaves Identical as the only option to consider.
  
  The strength can be adjusted with the RWStore.properties parameter:
  com.bigdata.btree.keys.KeyBuilder.collator.strength=Identical
  
  This will not update the configuration of existing journals; you would need a full reload. Also watch the size of the resulting journal: it will be larger, but it's hard to estimate by how much.

TASK DETAIL
  https://phabricator.wikimedia.org/T233204

To: Igorkim78
Cc: Nikki, CamelCaseNick, Smalyshev, Aklapper, Lucas_Werkmeister_WMDE, 
Igorkim78, Gehel, Lea_Lacroix_WMDE, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-31 Thread Igorkim78
Igorkim78 added a comment.


  @Aklapper, thank you! Fixed the commit message.

TASK DETAIL
  https://phabricator.wikimedia.org/T236663

To: Igorkim78
Cc: Addshore, Aklapper, Igorkim78, Gehel, Un1tY, Hook696, Daryl-TTMG, 
RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, AramBakir, Meekrab2012, 
joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, 
Darkminds3113, Bsandipan, Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, 
WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-30 Thread Igorkim78
Igorkim78 added a comment.


  Changeset is https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/532373/

TASK DETAIL
  https://phabricator.wikimedia.org/T236663

To: Igorkim78
Cc: Addshore, Aklapper, Igorkim78, Gehel, darthmon_wmde, Nandana, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label

2020-01-30 Thread Igorkim78
Igorkim78 added a comment.


  The issue is caused by the Service node producing the variable ?coDescription, which is not explicitly defined in the main query. The optimizers therefore assume this variable is unbound and do not ensure that the lang() function is evaluated in the proper order. A fix might require reordering the optimizers so that variables produced by wikibase:label become visible to the other optimizers, but that is tricky, because wikibase:label itself depends on the results of other optimizers applied in the proper order (wikibase:label takes the list of variables to process from the main query).
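  
  The ordering problem can be illustrated with a toy model in Python (Step and place_filter are hypothetical names, not Blazegraph's optimizer API). A filter is placed after the first plan step that is declared to bind all of its variables. Because the label service does not declare ?coDescription as an output, no step qualifies, and the filter is placed before the service has produced a value.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One operator of a toy query plan (hypothetical, not Blazegraph's AST)."""
    name: str
    declared_outputs: set = field(default_factory=set)


def place_filter(steps, filter_vars):
    """Return the index of the step after which a FILTER over filter_vars runs.

    The filter goes after the first step by which every variable it uses is
    declared as bound. If no step declares them, the variables look unbound
    everywhere, and -1 (before all steps) is returned.
    """
    bound = set()
    for index, step in enumerate(steps):
        bound |= step.declared_outputs
        if filter_vars <= bound:
            return index
    return -1


plan = [Step("triple patterns", {"co"}), Step("wikibase:label")]
print(place_filter(plan, {"coDescription"}))  # -1: service output is invisible
plan[1].declared_outputs.add("coDescription")
print(place_filter(plan, {"coDescription"}))  # 1: filter now runs after the service
```

  Making the service's outputs visible is what moves the filter behind it; the hard part noted above is that, in the real optimizer chain, the service needs other optimizers to run first before it knows its variables.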

TASK DETAIL
  https://phabricator.wikimedia.org/T229655

To: Igorkim78
Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, darthmon_wmde, Nandana, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph

2020-01-16 Thread Igorkim78
Igorkim78 added a comment.


  Performance measured on dump from 20191202: 
https://dumps.wikimedia.org/wikidatawiki/entities/20191202/
  Baseline time to load: 4264m29.914s, 714218864640 bytes
  
  Improvements proposed:
  
  1. One-path loading (data is loaded into the SPO index only, and the POS and OSP indices are rebuilt in parallel afterwards).
  
  One-path time to load: 1755m57.082s (41.2% of baseline), 402815582208 bytes 
(56.4% of baseline)
  Indices recreation: In progress.
  
  2. The data to be loaded is parsed in parallel into StatementBuffer instances, which are then queued for loading into the DB.
  
  To be done.
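  
  Both proposed improvements can be sketched in Python. This is a structural illustration under stated assumptions, not the WDQS loader: triples are plain tuples, parse_chunk is a hypothetical stand-in for a real RDF parser, a Python list stands in for the journal, and Python threads show the shape of the pipeline rather than real CPU parallelism. rebuild_indices derives the POS and OSP orderings from SPO data concurrently; parallel_load parses chunks in a thread pool and hands the resulting buffers through a bounded queue to a single writer thread.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Key order of each derived index, as positions in an (s, p, o) tuple.
PERMUTATIONS = {"POS": (1, 2, 0), "OSP": (2, 0, 1)}
_DONE = object()  # sentinel telling the writer thread to stop


def rebuild_indices(spo_triples):
    """Improvement 1 (toy): build POS and OSP from loaded SPO data concurrently."""
    def build(order):
        return sorted(tuple(triple[i] for i in order) for triple in spo_triples)

    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(build, order) for name, order in PERMUTATIONS.items()}
        return {name: future.result() for name, future in futures.items()}


def parse_chunk(lines):
    """Hypothetical stand-in for an RDF parser that fills one statement buffer."""
    return [tuple(line.split()[:3]) for line in lines if line.strip()]


def parallel_load(chunks, store, workers=4, max_pending=8):
    """Improvement 2 (toy): parse chunks in parallel, load buffers with one writer."""
    buffers = queue.Queue(maxsize=max_pending)  # put() blocks if the writer lags

    def writer():
        while (buffer := buffers.get()) is not _DONE:
            store.extend(buffer)  # stand-in for writing one buffer into the DB

    loader = threading.Thread(target=writer)
    loader.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for buffer in pool.map(parse_chunk, chunks):
            buffers.put(buffer)
    buffers.put(_DONE)
    loader.join()
    return store
```

  Keeping a single writer mirrors the idea that only the parsing is parallelized, while buffers are still applied to the store one at a time and in input order.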

TASK DETAIL
  https://phabricator.wikimedia.org/T236663

To: Igorkim78
Cc: Addshore, Aklapper, Igorkim78, Gehel, darthmon_wmde, Nandana, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T237089: Create CQS puppet configs by applying query_service module

2019-12-23 Thread Igorkim78
Igorkim78 added a comment.


  The configuration changes for SDC data are as follows (note that the namespace 'sdc' is used to store the RDF data in the Blazegraph journal; it can be changed as needed):
  
  - Blazegraph journal config (RWStore.properties)
  
  Replace the corresponding WDQS configuration (search for parameters with the com.bigdata.namespace.wdq prefix):
  
# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.sdc.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.sdc.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
com.bigdata.namespace.sdc.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.sdc.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.sdc.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
com.bigdata.namespace.sdc.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934
  
  Note that the final configuration should be adjusted to the real production data according to the instructions in T232768 <https://phabricator.wikimedia.org/T232768>.
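  
  The prefix replacement can be automated. Below is a minimal Python sketch (rename_namespace is a hypothetical helper, not part of the WDQS tooling). It rewrites only keys that start with com.bigdata.namespace.wdq. and leaves comments and values alone, so the branching factors still have to be tuned separately as noted above.

```python
import re


def rename_namespace(properties_text, old="wdq", new="sdc"):
    """Rewrite com.bigdata.namespace.<old>. keys to com.bigdata.namespace.<new>.

    Comment lines, values and keys of other namespaces are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*)com\.bigdata\.namespace\." + re.escape(old) + r"\.", re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + f"com.bigdata.namespace.{new}.", properties_text)


wdqs = "com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934\n"
print(rename_namespace(wdqs), end="")
# com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934
```

  Anchoring the pattern at the start of a line means that an occurrence of the prefix inside a value is not rewritten.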
  
  - The scripts that run the Updater should be called with the proper namespace:
  
  When loading data:
  
./loadRestAPI.sh -n wdq -d `pwd`/data/split
  
  replace with
  
./loadRestAPI.sh -n sdc -d `pwd`/data/split
  
  When loading a single file:
  
./loadRestAPI.sh -n wdq -d `pwd`/data/split/wikidump-1.ttl.gz
  
  replace with
  
./loadRestAPI.sh -n sdc -d `pwd`/data/split/wikidump-1.ttl.gz
  
  When running the updater:
  
./runUpdate.sh -n wdq
  
  replace with
  
./runUpdate.sh -n sdc
  
  For any calls to the Blazegraph REST API, instead of
  
http://localhost:/bigdata/namespace/wdq/sparql
  
  use
  
http://localhost:/bigdata/namespace/sdc/sparql
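  
  For scripted REST calls, here is a small Python sketch that builds a SPARQL GET request for a given namespace (sparql_request is a hypothetical helper). The port is not shown in the URLs above; port=9999 here is an assumption, so substitute whatever port your Blazegraph instance listens on. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(namespace, query, host="localhost", port=9999):
    """Build (but do not send) a GET request to a Blazegraph namespace endpoint.

    port=9999 is an assumption; use the port your Blazegraph actually listens on.
    """
    endpoint = f"http://{host}:{port}/bigdata/namespace/{namespace}/sparql"
    return Request(
        endpoint + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )


req = sparql_request("sdc", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(req.full_url.split("?")[0])  # http://localhost:9999/bigdata/namespace/sdc/sparql
```

  Once host and port are right, the request can be sent with urllib.request.urlopen(req); switching between wdq and sdc is then just a matter of the namespace argument.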
  
  The categories store might need similar changes, but whether separate categories are needed for production SDC data still has to be discussed.

TASK DETAIL
  https://phabricator.wikimedia.org/T237089

To: Mathew.onipe, Igorkim78
Cc: Aklapper, Igorkim78, Gehel, Liuxinyu970226, Mathew.onipe, darthmon_wmde, 
Legado_Shulgin, Nandana, JKSTNK, Davinaclare77, Qtn1293, Techguru.pc, Lahi, 
PDrouin-WMF, Gq86, E1presidente, Ramsey-WMF, Cparle, Anooprao, SandraF_WMF, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, Tramullas, Acer, 
LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, Scott_WUaS, 
Susannaanas, Wong128hk, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, 
Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, 
Jdforrester-WMF, Steinsplitter, Mbch331, Rxy, Jay8g, fgiunchedi


[Wikidata-bugs] [Maniphest] [Commented On] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread Igorkim78
Igorkim78 added a comment.


  We need statistics on how many triples use a blank node as the object:
  {code}
  select ?p (count(*) as ?cnt) {
    ?s ?p ?o .
    filter (isBlank(?o))
  }
  group by ?p
  {code}
  and as the subject (if any):
  {code}
  select ?p (count(*) as ?cnt) {
    ?s ?p ?o .
    filter (isBlank(?s))
  }
  group by ?p
  {code}
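  
  For cross-checking these counts on a small sample outside Blazegraph, here is an in-memory Python equivalent of the two queries. It assumes triples are (subject, predicate, object) strings in which blank nodes are written with the _: prefix, as in N-Triples/Turtle; bnode_stats is a hypothetical helper.

```python
from collections import Counter


def is_blank(term):
    """Assume blank nodes are written with the _: prefix (N-Triples/Turtle style)."""
    return term.startswith("_:")


def bnode_stats(triples):
    """Per-predicate counts of triples with a blank-node object and subject."""
    as_object = Counter(p for s, p, o in triples if is_blank(o))
    as_subject = Counter(p for s, p, o in triples if is_blank(s))
    return as_object, as_subject
```

  Each Counter corresponds to one of the GROUP BY ?p results above.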

TASK DETAIL
  https://phabricator.wikimedia.org/T239414

To: Igorkim78
Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, 
darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-11-19 Thread Igorkim78
Igorkim78 added a comment.


  Could you share the output of the following commands?
  iostat -x 1
  sudo iotop

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, 
EgonWillighagen, Abbe98, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 
0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, DannyS712, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, 
Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, 
Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Vali.matei, WSH1906, 
Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-11-19 Thread Igorkim78
Igorkim78 added a comment.


  Are there thread dumps from Blazegraph available?
  What about the new UPDATED_ENTITY_IDS logger: does it track updated entity IDs? How many per minute/hour?

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, 
EgonWillighagen, Abbe98, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 
0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, DannyS712, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, 
Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, 
Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Vali.matei, WSH1906, 
Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service

2019-11-18 Thread Igorkim78
Igorkim78 added a subtask: T238555: Create endpoint to extract low level data 
for a list of entity IDs..

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, 
EgonWillighagen, Abbe98, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 
0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, DannyS712, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, 
Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, 
Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Vali.matei, WSH1906, 
Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.

2019-11-18 Thread Igorkim78
Igorkim78 added a parent task: T231411: Test new Updater service.

TASK DETAIL
  https://phabricator.wikimedia.org/T238555

To: Igorkim78
Cc: Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, DannyS712, Nandana, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T231411: Test new Updater service

2019-11-18 Thread Igorkim78
Igorkim78 added a subtask: T238557: Allow for logging recently updated entities.

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, 
EgonWillighagen, Abbe98, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 
0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, DannyS712, 
CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, 
Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, 
Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Vali.matei, WSH1906, 
Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities

2019-11-18 Thread Igorkim78
Igorkim78 added a parent task: T231411: Test new Updater service.

TASK DETAIL
  https://phabricator.wikimedia.org/T238557

To: Igorkim78
Cc: Mathew.onipe, Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T238557: Allow for logging recently updated entities

2019-11-18 Thread Igorkim78
Igorkim78 added a project: Wikidata-Query-Service.
Igorkim78 added a comment.
Restricted Application added a project: Wikidata.


  Thanks! Yes, it is Wikidata-Query-Service.

TASK DETAIL
  https://phabricator.wikimedia.org/T238557

To: Igorkim78
Cc: Mathew.onipe, Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T238555: Create endpoint to extract low level data for a list of entity IDs.

2019-11-18 Thread Igorkim78
Igorkim78 added a project: Wikidata-Query-Service.
Igorkim78 added a comment.
Restricted Application added a project: Wikidata.


  Thanks, yes, it is Wikidata-Query-Service.

TASK DETAIL
  https://phabricator.wikimedia.org/T238555

To: Igorkim78
Cc: Gehel, dcausse, Igorkim78, Aklapper, darthmon_wmde, DannyS712, Nandana, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T238232: blazegraph journal on wdqs1005 is oversized

2019-11-13 Thread Igorkim78
Igorkim78 added a comment.


  wdqs1006 reports 574.6 GiB reserved for the journal, of which 544.3 GiB are actually used (~5% of the space unused), whereas wdqs1005 reports 1037.7 GiB reserved and only 543.5 GiB actually used (~47% of the space unused).
  Most of the %FileWaste is reserved for 8K allocators, but %SlotWaste is also higher than usual for the 4K (10 times higher than usual), 2K and 64 (3 times), and 320 and 768 (2 times) allocators.
  
  The number of slots allocated using 8K allocators is similar on both servers (less than 5% difference): about 5,179M vs 5,431M, and only ~1% of them (~63M) remain in use. This is caused by updates: for each update, the parts of the indices related to the changed data have to be copied to a new allocator with the changes applied; the old allocator can then be marked as unused and reused for later updates, but only after all connections that refer to the commit point linked to that allocator are closed. If a commit point cannot be released, its allocators remain locked as well.
  
  Analyzing the Grafana reports, I assume most of the allocators were consumed gradually from Nov 1, 6:00 to Nov 4, 18:00.
  
  Given all the above, my conclusion is that something (most probably an intentionally or unintentionally unclosed connection) blocked the release of allocators for 3.5 days, preventing their reuse, so updates had to create new allocators. When the commit point was finally released, the locked allocators were released as well, but they could not be removed from the file and only added to its sparse (unused) space.
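  
  The mechanism can be reproduced with a deliberately simplified Python model. ToyJournal is hypothetical and ignores allocator sizes, slots and the real RWStore bookkeeping: every update writes fresh allocators and frees old ones, freed allocators become reusable only once no open reader pins an older commit point, and the file never shrinks.

```python
class ToyJournal:
    """Toy model of copy-on-write allocator recycling (not Blazegraph's RWStore)."""

    def __init__(self, initial):
        self.commit = 0
        self.total = initial  # allocators ever created, i.e. the file size
        self.free = 0         # allocators ready for reuse
        self.deferred = []    # (commit that freed them, count) still pinned
        self.readers = {}     # reader name -> commit point it can still see

    def open_reader(self, name):
        self.readers[name] = self.commit

    def close_reader(self, name):
        del self.readers[name]
        self._release()

    def update(self, n):
        # Copy-on-write: n allocators get new contents, n old ones become garbage.
        reused = min(n, self.free)
        self.free -= reused
        self.total += n - reused  # the file grows only when nothing is reusable
        self.commit += 1
        self.deferred.append((self.commit, n))
        self._release()

    def _release(self):
        oldest = min(self.readers.values(), default=float("inf"))
        still_pinned = []
        for freed_at, n in self.deferred:
            if oldest >= freed_at:  # no open reader can still see the old data
                self.free += n
            else:
                still_pinned.append((freed_at, n))
        self.deferred = still_pinned
```

  Starting from ToyJournal(initial=100) with updates of 10 allocators each, five updates keep the file at 110 because freed allocators are reused. After one reader is opened and left open, five more updates grow the file to 150. Once that reader closes, 50 of the 150 allocators become free but remain part of the file, which mirrors the wdqs1005 situation described above.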

TASK DETAIL
  https://phabricator.wikimedia.org/T238232

To: Igorkim78
Cc: Mathew.onipe, Igorkim78, Gehel, Aklapper, darthmon_wmde, DannyS712, 
Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Edited] T234968: Measure performance impact of code optimization and/or blazegraph settings on real traffic data

2019-10-23 Thread Igorkim78
Igorkim78 updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T234968

To: Igorkim78
Cc: JAllemandou, Mathew.onipe, dcausse, Igorkim78, Aklapper, darthmon_wmde, 
DannyS712, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Cirdan, Jonas, 
Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-10-23 Thread Igorkim78
Igorkim78 added a comment.


  Added a link to task T236251 <https://phabricator.wikimedia.org/T236251>: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
  The corresponding header X-FIRST-SOLUTION-MILLIS could be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, the difference might be caused by:
  
  1. A very large result that took much time to serialize/deserialize (basically an OK situation if the number of results is large).
  2. Connectivity issues, over the network and/or between processes. In this case X-FIRST-SOLUTION-MILLIS stays the same for subsequent calls, while the total query time varies.
  3. A very unselective query where additional constraints filter out many potential solutions, so the first solution is computed quickly but collecting all the requested results takes a long time. Such queries should be analyzed and might need fixes in the Blazegraph code or the data layout.
  
  The header X-FIRST-SOLUTION-MILLIS returns the number of milliseconds spent before the first solution is available to be written to the response payload (the last moment when headers can still be added to the response). The value shall not exceed the query timeouts set at the Jetty and query levels.
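  
  A small Python sketch of how the header could feed the diagnosis above. first_solution_millis reads the header case-insensitively from any mapping of response headers; looks_like_transport_issue encodes case 2 (stable time to first solution, fluctuating total time) over repeated runs of the same query. Both helpers are hypothetical, and the 10% tolerance is an arbitrary assumption, not a value from Blazegraph.

```python
from statistics import mean


def first_solution_millis(headers):
    """Return X-FIRST-SOLUTION-MILLIS as an int, or None if the header is absent."""
    for name, value in headers.items():
        if name.lower() == "x-first-solution-millis":  # header names are case-insensitive
            return int(value)
    return None


def relative_spread(values):
    """(max - min) / mean, a crude measure of how much repeated timings vary."""
    avg = mean(values)
    return (max(values) - min(values)) / avg if avg else 0.0


def looks_like_transport_issue(runs, tolerance=0.1):
    """Case 2 heuristic: stable time to first solution, fluctuating total time.

    runs holds (first_solution_ms, total_ms) pairs from repeated calls of one query.
    """
    firsts = [first for first, _ in runs]
    totals = [total for _, total in runs]
    return relative_spread(firsts) <= tolerance < relative_spread(totals)
```

  The headers of a urllib response (resp.headers) support .items(), so they can be passed to first_solution_millis directly.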

TASK DETAIL
  https://phabricator.wikimedia.org/T101013

To: dcausse, Igorkim78
Cc: Igorkim78, JAllemandou, Ottomata, Smalyshev, Deskana, Aklapper, 4748kitoko, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, 
holger.knust, Meekrab2012, joker88john, ET4Eva, DannyS712, CucyNoiD, Nandana, 
NebulousIris, Akovalyov, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Avner, Lewizho99, 
Maathavan, Gehel, _jensen, rosalieper, Cirdan, Jonas, FloNight, Xmlizer, 
mobrovac, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
GWicke, Manybubbles, Mbch331, jeremyb


[Wikidata-bugs] [Maniphest] [Commented On] T235540: SPARQL query causes StackOverflowError and fails to execute

2019-10-16 Thread Igorkim78
Igorkim78 added a comment.


  The LabelService optimizer was fixed this August (so that it no longer throws NPEs) by reusing the Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to introspect the variables used in filters and other clauses, so that the placement of the LabelService call could be adjusted properly. This introspection seems to run into an infinite loop over the AST. Reusing variables for label aggregation after the original variable is common practice, so yes, it should be fixed. I'm looking for a workaround that extracts the referenced variables without falling into the infinite loop.
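  
  The general technique for such a workaround can be sketched in Python, using a hypothetical Node type rather than Blazegraph's BOp or StaticAnalysis API: walk the tree with an explicit stack and a set of visited nodes, so shared or cyclic references are expanded only once instead of recursing until the stack overflows.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity semantics: no field-wise __eq__ that could loop on cycles
class Node:
    """Hypothetical AST node: a variable (var_name set) or an operator with args."""
    var_name: Optional[str] = None
    args: list = field(default_factory=list)


def referenced_vars(root):
    """Collect variable names reachable from root, tolerating shared or cyclic links."""
    found, seen, stack = set(), set(), [root]
    while stack:  # explicit stack instead of recursion: no stack overflow on deep trees
        node = stack.pop()
        if id(node) in seen:
            continue  # already expanded; following it again would loop forever
        seen.add(id(node))
        if node.var_name is not None:
            found.add(node.var_name)
        stack.extend(node.args)
    return found
```

  Even if an aggregate node ends up referring back to an enclosing expression, each node is visited once, so the walk terminates and still reports every variable.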

TASK DETAIL
  https://phabricator.wikimedia.org/T235540

To: Igorkim78
Cc: Mathew.onipe, Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, Aklapper, 
Evilricepuddin, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Claimed] T231411: Test new Updater service

2019-10-09 Thread Igorkim78
Igorkim78 claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, EgonWillighagen, 
Abbe98, Smalyshev, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Cirdan, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Commented On] T227365: WDQS/Blazegraph data loading has timeout

2019-10-07 Thread Igorkim78
Igorkim78 added a comment.


  There is a context parameter, queryTimeout, set to 10 minutes in web.xml, which applies to all Blazegraph servlets. Stas prepared a patch that extends it 10x (https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/). You can apply it locally (or just edit the web.xml file) to resolve your issue. The change has not been applied to the WDQS master because this timeout is system-wide, and extending it might result in unexpected resource consumption: the timeout also applies to queries, including very heavy ones, which would then be allowed to run much longer before timing out.
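  
  Editing web.xml locally can also be scripted. Here is a minimal Python sketch using the standard xml.etree.ElementTree module (scale_context_param is a hypothetical helper). It finds the <context-param> whose <param-name> is queryTimeout, matching tags by local name so that a namespaced web.xml also works, and multiplies its <param-value> by a factor, keeping whatever unit the file already uses.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop an XML namespace: '{uri}param-name' -> 'param-name'."""
    return tag.rsplit("}", 1)[-1]


def scale_context_param(root, name, factor):
    """Multiply the numeric <param-value> of the <context-param> called name.

    Returns the new value, or None if no such parameter exists. The value's
    unit is whatever the file already uses; it is not interpreted here.
    """
    for param in root.iter():
        if _local(param.tag) != "context-param":
            continue
        children = {_local(child.tag): child for child in param}
        name_el = children.get("param-name")
        value_el = children.get("param-value")
        if name_el is None or value_el is None:
            continue
        if (name_el.text or "").strip() == name:
            new_value = int(value_el.text.strip()) * factor
            value_el.text = str(new_value)
            return new_value
    return None
```

  To save the change, write the tree back with ET.ElementTree(root).write(path). If the file declares a default XML namespace, call ET.register_namespace("", uri) first; otherwise ElementTree writes generated prefixes such as ns0:.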

TASK DETAIL
  https://phabricator.wikimedia.org/T227365

To: Igorkim78
Cc: Doqume, Aklapper, Igorkim78, Smalyshev, darthmon_wmde, DannyS712, Nandana, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, Krenair


[Wikidata-bugs] [Maniphest] [Commented On] T233204: Mixup of unicode characters in Query Service

2019-09-30 Thread Igorkim78
Igorkim78 added a comment.


  These characters are indeed mapped to the same term in the DB.
  
SELECT ( ConstantNode(TermId(1415304733L)[⓬]) AS VarNode(negativeCircled) )
       ( ConstantNode(TermId(1415304733L)[⑫]) AS VarNode(circled) )
  
  Blazegraph uses ICU collation as its default key builder implementation.
  The characters do look very similar, so ICU may treat them as the same.
  The behavior could be changed, but that might cause many side effects, 
especially with complex Unicode sequences (for example, diacritics), so it 
should be carefully considered.
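  The general mechanism can be illustrated with the JDK's own java.text.Collator. 
This is only a stand-in: it is not the ICU collator Blazegraph uses, it is not 
configured the way Blazegraph's key builder is, and it does not reproduce the 
⓬/⑫ case. It only shows that strings differing at a level finer than the 
collator strength produce equal collation keys, so an index keyed on those keys 
would store them as a single term. CollationStrengthDemo is a hypothetical name.

```java
import java.text.Collator;
import java.util.Locale;

final class CollationStrengthDemo {
    // True if a and b get equal collation keys at the given strength
    // (Collator.PRIMARY, SECONDARY, TERTIARY or IDENTICAL).
    static boolean sameKey(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        // An index built on these keys cannot tell a and b apart when the keys compare equal.
        return collator.getCollationKey(a).compareTo(collator.getCollationKey(b)) == 0;
    }
}
```

With this collator, "a" and "A" share a key at PRIMARY strength (case is a tertiary difference) but not at TERTIARY, while "a" and "b" differ at every strength.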

TASK DETAIL
  https://phabricator.wikimedia.org/T233204

To: Igorkim78
Cc: Smalyshev, Aklapper, Lucas_Werkmeister_WMDE, Igorkim78, Gehel, 
Lea_Lacroix_WMDE, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T231411: Test new Updater service

2019-08-29 Thread Igorkim78
Igorkim78 added a comment.


  Differences in blank nodes can be tolerated with an additional replacement. The 
cleanup stage could be merged into the initial sed+sort:
  
zcat wikidata.jnl.1.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v http://wikiba.se/ontology#timestamp | sort | gzip > wikidata.jnl.1.sorted.gz
zcat wikidata.jnl.2.data.gz | sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v http://wikiba.se/ontology#timestamp | sort | gzip > wikidata.jnl.2.sorted.gz
  
  Then the comparison is just:
  
comm -3 <(zcat wikidata.jnl.1.sorted.gz) <(zcat wikidata.jnl.2.sorted.gz) > wikidata.jnl.diff
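  The semantics of that pipeline can be spelled out in Java: normalize() applies 
the two sed substitutions, drops lines mentioning the timestamp predicate (with a 
substring check instead of grep's regex match) and sorts, and comm3() reproduces 
comm -3 on two sorted lists, emitting lines unique to the second input with a 
leading tab. The class name and the in-memory lists are assumptions for 
illustration; dumps of journal size are better left to the streaming shell 
pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class JournalDiff {
    static final String TIMESTAMP = "http://wikiba.se/ontology#timestamp";

    // Mirrors: sed 's/ : .*$//;s/_:t[^,>]*/bnode/g' | grep -v <timestamp> | sort
    static List<String> normalize(List<String> lines) {
        return lines.stream()
                .map(line -> line.replaceFirst(" : .*$", ""))       // drop everything from the first " : "
                .map(line -> line.replaceAll("_:t[^,>]*", "bnode"))  // blank-node labels -> fixed placeholder
                .filter(line -> !line.contains(TIMESTAMP))           // substring test instead of grep's regex
                .sorted()                                            // same ordering for both inputs
                .collect(Collectors.toList());
    }

    // Mirrors comm -3 on two sorted lists: lines common to both are suppressed,
    // lines only in a are emitted as-is, lines only in b get a leading tab.
    static List<String> comm3(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {
                i++;
                j++;
            } else if (c < 0) {
                out.add(a.get(i++));
            } else {
                out.add("\t" + b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add("\t" + b.get(j++));
        }
        return out;
    }
}
```

The merge in comm3() only works because both lists went through the same sorted() call, which is the same requirement comm has for its two inputs.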

TASK DETAIL
  https://phabricator.wikimedia.org/T231411

To: Igorkim78
Cc: Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen, EgonWillighagen, 
Abbe98, Smalyshev, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, 
merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Cirdan, Jonas, Xmlizer, 
Volker_E, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, 
Dinoguy1000, Manybubbles, Mbch331, Jay8g


[Wikidata-bugs] [Maniphest] [Commented On] T229655: bad interaction of lang() with wikibase:label

2019-08-02 Thread Igorkim78
Igorkim78 added a comment.


  Looking at the query execution plans: for the query with lang() on 
coDescription, the ProjectionOp got placed before coDescription is materialized, 
so coDescription (along with its language tag) never makes it to the projection. 
The reason for this behavior needs more research; I will post an update.

TASK DETAIL
  https://phabricator.wikimedia.org/T229655

To: Igorkim78
Cc: Smalyshev, Igorkim78, Aklapper, VladimirAlexiev, darthmon_wmde, DannyS712, 
Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message

2019-07-01 Thread Igorkim78
Igorkim78 added a comment.


  Fixed optional support and added a test case for that code path.
  The service's projectedVars actually include both inbound and outbound 
variables (those which are parameters of the service and those which are 
produced by the label lookup). But when checking whether the service node can be 
moved ahead of clauses placed at the bottom of the query, we need to consider 
only the inbound variables, so that they are bound before the service call, 
while all outbound variables become available to later filters and other 
clauses.
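  The difference between the two checks can be shown with a toy model in which 
each preceding clause is reduced to the set of variables it binds (a hypothetical 
simplification, not Blazegraph's AST). Requiring the full projectedVars before 
the service slot always fails, because the outbound label variables are only 
bound by the service itself; requiring just the inbound variables gives the 
intended answer. InboundCheck is an illustrative name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InboundCheck {
    // True if every variable in required is bound by some clause before position slot.
    static boolean boundBefore(List<Set<String>> bindsPerClause, int slot, Set<String> required) {
        Set<String> bound = new HashSet<>();
        for (int i = 0; i < slot; i++) {
            bound.addAll(bindsPerClause.get(i));
        }
        return bound.containsAll(required);
    }

    // projectedVars in this model: inbound service parameters plus outbound label variables.
    static Set<String> projectedVars(Set<String> inbound, Set<String> outbound) {
        Set<String> all = new HashSet<>(inbound);
        all.addAll(outbound);
        return all;
    }
}
```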

TASK DETAIL
  https://phabricator.wikimedia.org/T175840

To: Igorkim78
Cc: Smalyshev, Lucas_Werkmeister_WMDE, Aklapper, darthmon_wmde, ET4Eva, 
Nandana, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Avner, Gehel, _jensen, rosalieper, Cirdan, Jonas, FloNight, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T175840: Using label service twice in one query results in obscure error message

2019-06-25 Thread Igorkim78
Igorkim78 added a comment.


  The idea of the change is to replace the runLast hint with more elaborate 
logic, in 3 steps:
  
  - first, a 'most probably optimal' placement, so that 
EmptyLabelServiceOptimizer can see the variables to process;
  - then EmptyLabelServiceOptimizer adds statement patterns for the resolutions;
  - finally, an additional optimizer step moves LabelService to the latest 
possible position before any clause that might use the variables bound by 
LabelService.
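  The third step can be sketched as a position search. The model is hypothetical 
(clauses reduced to the variables they bind and read, not Blazegraph's AST) and 
assumes the clauses are already in evaluation order: the service goes just 
before the first clause that reads one of its outbound variables, provided all 
of its inbound variables are bound by then.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LabelServiceReorder {
    // Hypothetical clause model: the variables a clause binds and the variables it reads.
    static final class Clause {
        final Set<String> binds;
        final Set<String> uses;

        Clause(Set<String> binds, Set<String> uses) {
            this.binds = binds;
            this.uses = uses;
        }
    }

    // Latest index at which the label service can be inserted: just before the first
    // clause reading one of its outbound variables, or at the end if none does.
    // Returns -1 if an inbound variable would still be unbound at that point.
    static int latestPosition(List<Clause> clauses, Set<String> inbound, Set<String> outbound) {
        int pos = clauses.size();
        for (int i = 0; i < clauses.size(); i++) {
            if (!Collections.disjoint(clauses.get(i).uses, outbound)) {
                pos = i;
                break;
            }
        }
        Set<String> bound = new HashSet<>();
        for (int i = 0; i < pos; i++) {
            bound.addAll(clauses.get(i).binds);
        }
        return bound.containsAll(inbound) ? pos : -1;
    }
}
```

A -1 result corresponds to the case where no valid position exists and the earlier placement would have to be kept.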
  
  All tests in LabelServiceUnitTest (including a new test case for this bug) 
pass, but I think some additional tuning might be needed to properly support all 
'real-life' usage scenarios, for example FILTER clauses, including those written 
above service calls and BINDs. These might also need additional rearrangement.
  I have not implemented that yet, as it might cascade and rearrange the clauses 
too much.

TASK DETAIL
  https://phabricator.wikimedia.org/T175840

To: Igorkim78
Cc: Smalyshev, Lucas_Werkmeister_WMDE, Aklapper, darthmon_wmde, ET4Eva, 
Nandana, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, Avner, Gehel, _jensen, rosalieper, Cirdan, Jonas, FloNight, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected

2019-05-07 Thread Igorkim78
Igorkim78 added a comment.


  EmptyLabelServiceOptimizer, in optimizeJoinGroup(AST2BOpContext, 
StaticAnalysis, IBindingSet[], JoinGroupNode), currently takes the projection 
from StaticAnalysis.getQueryRoot(), because the parent of the JoinGroupNode 
wrapping the statement patterns of the LabelService clause is unavailable.
  
  The parent is declared in GroupMemberNodeBase as an IGroupNode, so it cannot 
lead us to the ServiceNode and on to the SubqueryBase, as neither of them is an 
IGroupNode descendant.
  
  So to get back-references from nested statement patterns, clauses, etc. to the 
SubqueryBase whose projection we need to extract, we would need to introduce 
proper annotations and propagate them through the different types of nesting 
nodes, as the actual service clause might be enclosed in, for example, a 
UnionNode.
  
  The annotation could be assigned in 
com.bigdata.rdf.sail.sparql.BigdataExprBuilder.handleWhereClause(ASTQuery, 
QueryBase) as follows:
  
queryRoot.setWhereClause(ret);
if (queryRoot instanceof SubqueryBase) {
    ret.annotations().put(QueryBase.Annotations.PROJECTION, queryRoot.getProjection());
}
  
  Then we could use it in 
org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext,
 StaticAnalysis, IBindingSet[], JoinGroupNode) as
  
if (!foundArg) {
    EmptyLabelServiceOptimizer.this.addResolutions(ctx, g,
            (ProjectionNode) service.getParent().annotations().get(QueryBase.Annotations.PROJECTION));
}
  
  But this would require changing Blazegraph core, and additional handling would 
be needed to propagate the annotation properly to nested clauses.
  
  There is another option, though: 
org.wikidata.query.rdf.blazegraph.label.EmptyLabelServiceOptimizer.optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode)
  already traverses the whole tree to process LabelService clauses, so we could 
handle the projection annotation at this point:
  
join.getChildren(SubqueryBase.class).stream().forEach(node -> {
    SubqueryBase subqueryBase = (SubqueryBase) node;
    JoinGroupNode whereClause = (JoinGroupNode) subqueryBase.getWhereClause();
    whereClause.setProperty(QueryBase.Annotations.PROJECTION, subqueryBase.getProjection());
});
  
  Here, too, additional handling might be needed for LabelService clauses inside 
nested clauses.
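  The traversal option amounts to copying each subquery's projection onto its 
WHERE clause, after which a label service nested arbitrarily deep (for example 
inside a UNION) can walk up ordinary group parents to find it. The sketch models 
this with hypothetical Group and Subquery classes, not Blazegraph's AST types: as 
with SubqueryBase, the WHERE clause has no parent pointer back to its subquery, 
so without the annotation the upward walk finds nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProjectionPropagation {
    static final String PROJECTION = "projection";

    // Hypothetical group node (join/union group): has a parent pointer and annotations.
    static final class Group {
        Group parent;
        final List<Group> groups = new ArrayList<>();
        final List<Subquery> subqueries = new ArrayList<>();
        final Map<String, Object> annotations = new HashMap<>();

        Group addGroup(Group child) {
            child.parent = this;
            groups.add(child);
            return this;
        }

        Group addSubquery(Subquery subquery) {
            subqueries.add(subquery); // no back-reference from subquery.where to this group
            return this;
        }
    }

    // Hypothetical subquery: owns a projection and a WHERE clause that cannot see it.
    static final class Subquery {
        final List<String> projection;
        final Group where;

        Subquery(List<String> projection, Group where) {
            this.projection = projection;
            this.where = where;
        }
    }

    // Copies each subquery's projection onto the root of its WHERE clause, recursively.
    static void annotate(Group group) {
        for (Subquery subquery : group.subqueries) {
            subquery.where.annotations.put(PROJECTION, subquery.projection);
            annotate(subquery.where);
        }
        for (Group child : group.groups) {
            annotate(child);
        }
    }

    // From any nested group, walks up parent pointers to the nearest projection annotation.
    @SuppressWarnings("unchecked")
    static List<String> nearestProjection(Group group) {
        for (Group n = group; n != null; n = n.parent) {
            Object projection = n.annotations.get(PROJECTION);
            if (projection != null) {
                return (List<String>) projection;
            }
        }
        return null;
    }
}
```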
  
  Created changeset 
https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/508725/
  I had to apply the same pom changes as in T213375 
<https://phabricator.wikimedia.org/T213375> to link against bigdata-rdf-test.

TASK DETAIL
  https://phabricator.wikimedia.org/T153353

To: Igorkim78
Cc: Igorkim78, Aklapper, Smalyshev, hoo, darthmon_wmde, alaa_wmde, joker88john, 
ET4Eva, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, 
Darkminds3113, Bsandipan, Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, 
WSH1906, Avner, Lewizho99, Maathavan, Gehel, _jensen, rosalieper, Jonas, 
FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, Krenair


[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-05-06 Thread Igorkim78
Igorkim78 added a comment.


  Additionally tested a configuration with only raw records disabled, compared to 
the original baseline:
  
  - it takes 1.7% more time and produces a 9.2% smaller journal, with 77% fewer 
allocations and 38.9% smaller overall allocation size, though there are 2.9% 
more blob allocations with 7.5% more size.
  
  This might be an option to consider, even without inlining of value and 
reference URIs.

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-05-06 Thread Igorkim78
Igorkim78 added a comment.


  The configuration options are set in RWStore.properties. The relevant options 
are:
  
  - Inlined value and reference URIs:
  
  > com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V001
  
  - Raw records support disabled:
  
  > com.bigdata.rdf.store.AbstractTripleStore.enableRawRecordsSupport=false
  
  - Inlining of short text literals; maxInlineTextLength has to be set as the 
length threshold for inlining literals:
  
  > com.bigdata.rdf.store.AbstractTripleStore.inlineTextLiterals=true
  > com.bigdata.rdf.store.AbstractTripleStore.maxInlineTextLength=40
  
  The parameters can be combined. The most promising combination is inlining of 
value and reference URIs with raw records support disabled:
  
  > com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory$V001
  > com.bigdata.rdf.store.AbstractTripleStore.enableRawRecordsSupport=false
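  Since RWStore.properties uses the standard Java properties format, the options 
above can be read back with java.util.Properties, for example to check which 
combination a test journal was built with. This is a minimal sketch; 
RWStoreOptions and its helpers are hypothetical, and reporting a threshold of 0 
when inlineTextLiterals is unset or false is a convention of this sketch, not 
documented Blazegraph behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class RWStoreOptions {
    static final String PREFIX = "com.bigdata.rdf.store.AbstractTripleStore.";

    // Parses text in java.util.Properties format, as used by RWStore.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True only if enableRawRecordsSupport is explicitly set to false.
    static boolean rawRecordsExplicitlyDisabled(Properties props) {
        String value = props.getProperty(PREFIX + "enableRawRecordsSupport");
        return value != null && value.trim().equalsIgnoreCase("false");
    }

    // Inline text literal threshold; 0 when inlineTextLiterals is not enabled (sketch convention).
    static int inlineTextThreshold(Properties props) {
        if (!Boolean.parseBoolean(props.getProperty(PREFIX + "inlineTextLiterals", "false").trim())) {
            return 0;
        }
        return Integer.parseInt(props.getProperty(PREFIX + "maxInlineTextLength", "0").trim());
    }
}
```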

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected

2019-05-06 Thread Igorkim78
Igorkim78 added a comment.


  This seems to be an optimizer ordering problem.
  CompareBOp is evaluated several times to check whether "Ada"@en equals 
?langLabel, but ?langLabel is not bound on any of these occasions:
  - while running **//ASTDeferredIVResolution//**
  - while running 
**//com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer//**
  - then while running **//ConditionalRoutingOp for ChunkedRunningQuery//**
  
  So, finally, the solution is discarded in 
  com.bigdata.rdf.internal.constraints.SPARQLConstraint.accept(IBindingSet)
  and LabelService is never called at all.
  
  On the other hand, if ?langLabel is uncommented in the outer projection, 
LabelService is called and ?langLabel is already bound when 
SPARQLConstraint.accept is called.
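  The ordering effect follows from SPARQL's FILTER semantics: comparing against 
an unbound variable is an evaluation error, and a FILTER whose expression errors 
rejects the solution. A toy sketch of the two operator orders (solutions as plain 
maps, and a hypothetical labelService stand-in that only knows the one item from 
this query) shows that evaluating the filter first discards the solution before 
the label could ever be bound.

```java
import java.util.HashMap;
import java.util.Map;

final class FilterOrderDemo {
    static final String ADA = "\"Ada\"@en";

    // FILTER("Ada"@en = ?langLabel): an unbound ?langLabel makes the comparison an error,
    // and an erroring FILTER rejects the solution, so unbound is treated as "reject".
    static boolean filterAccepts(Map<String, String> solution) {
        String label = solution.get("langLabel");
        return label != null && label.equals(ADA);
    }

    // Hypothetical stand-in for the label service: binds ?langLabel for the one item in this example.
    static Map<String, String> labelService(Map<String, String> solution) {
        Map<String, String> out = new HashMap<>(solution);
        if ("wd:Q154755".equals(solution.get("lang"))) {
            out.put("langLabel", ADA);
        }
        return out;
    }

    // Returns the number of surviving solutions (0 or 1) for the chosen operator order.
    static int run(boolean serviceBeforeFilter) {
        Map<String, String> solution = new HashMap<>();
        solution.put("lang", "wd:Q154755"); // BIND(wd:Q154755 AS ?lang)
        if (serviceBeforeFilter) {
            solution = labelService(solution);
        }
        if (!filterAccepts(solution)) {
            return 0; // discarded here, so the label service is never reached
        }
        if (!serviceBeforeFilter) {
            solution = labelService(solution);
        }
        return 1;
    }
}
```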
  
  The difference between the query execution plans is that in the successful one 
an additional statement pattern is added to the LabelService clause:
  
  >   SERVICE http://wikiba.se/ontology#label])> {
  >JoinGroupNode {
  >  
StatementPatternNode(ConstantNode(TermId(0U)[http://www.bigdata.com/rdf#serviceParam]),
 ConstantNode(TermId(0U)[http://wikiba.se/ontology#language]), 
ConstantNode(TermId(0L)[en])) [scope=DEFAULT_CONTEXTS]
  >  StatementPatternNode(VarNode(lang), 
ConstantNode(Vocab(74)[http://www.w3.org/2000/01/rdf-schema#label]), 
VarNode(langLabel)) [scope=DEFAULT_CONTEXTS] # <<< Missing statement pattern
  >}
  >  }
  >
  
  If it is added manually, the query succeeds:
  
  >   SELECT ?lang #?langLabel
  >   WHERE {
  > {
  > SELECT ?lang ?langLabel WHERE {
  > BIND(wd:Q154755 AS ?lang)
  > SERVICE wikibase:label {
  > bd:serviceParam wikibase:language "en" .
  > ?lang rdfs:label ?langLabel .
  >   }
  > }
  > }
  > FILTER("Ada"@en = ?langLabel) .
  > }

TASK DETAIL
  https://phabricator.wikimedia.org/T153353

To: Igorkim78
Cc: Igorkim78, Aklapper, Smalyshev, hoo, alaa_wmde, ET4Eva, Nandana, Lahi, 
Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, Avner, Gehel, _jensen, rosalieper, Jonas, 
FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, Krenair


[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment.


  Complete test logs attached F28854747: logs.zip 
<https://phabricator.wikimedia.org/F28854747>

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment.


  Load performance for the tested configurations in an isolated environment 
(i7-7700HQ, 4 cores / 8 threads at 2.8GHz, 32GB RAM, Samsung 960 PRO SSD):
  F28854691: Load performance.png <https://phabricator.wikimedia.org/F28854691>
  
  Query performance on simple queries (SELECT * WHERE { ?s ?p ?o . } with ?s 
bound to a random subject URI) does not show any significant changes for the 
tested journal configurations.
  A more complex query mix is probably needed to reveal differences between the 
journals.

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Updated] T213375: Inline value and reference URIs

2019-04-29 Thread Igorkim78
Igorkim78 added a comment.


  Attached are the results of loading 100 ttl.gz files with different 
configurations: F28854613: Results.xls <https://phabricator.wikimedia.org/F28854613>
  
  - original baseline (commit blazegraph 
895a4f3bd003ddb4b1f31257f642ca3616bca79b 
<https://phabricator.wikimedia.org/rWDQR895a4f3bd003ddb4b1f31257f642ca3616bca79b>,
 rdf 4245b2a5bc0c7d4b369a43ba512b5e537dac07a4 
<https://phabricator.wikimedia.org/rWDQR4245b2a5bc0c7d4b369a43ba512b5e537dac07a4>)
  - reference URIs inlining,
  - reference URIs inlining, raw records disabled per T213210 
<https://phabricator.wikimedia.org/T213210>
  - reference URIs inlining, raw records disabled, INLINE_TEXT_LITERALS for 
short strings per T213210 <https://phabricator.wikimedia.org/T213210>
  
  Conclusions, compared to the original baseline:
  
  - Inlining of reference and value URIs takes 22% more time and produces a 10% 
larger journal, with 1.7% fewer allocations but 8.6% larger overall allocation 
size; there are also 58% more blob allocations with 66% more size.
  - Inlining of reference and value URIs with raw records disabled takes 20% 
more time and produces a journal of the same size, with 77% fewer allocations 
and 29% smaller overall allocation size; there are also 61% more blob 
allocations with 73% more size.
  - Inlining of reference and value URIs and literals (less than 40 chars) with 
raw records disabled takes 66% more time and produces a 21% larger journal, with 
75% fewer allocations and 2% smaller overall allocation size; there are also 
234% more blob allocations with 382% more size.
  
  Result: 
  The configuration option of inlining reference and value URIs with raw records 
disabled might be considered to reduce the allocation count, but all tested 
configurations result in more allocations for BLOBs.

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] [Claimed] T213375: Inline value and reference URIs

2019-04-22 Thread Igorkim78
Igorkim78 claimed this task.
Igorkim78 added a comment.


  Changeset created to support reference URIs inlining:
  https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642
  
  Baseline collected for performance test:
  Data files loaded: 100 ttl gz files into an empty journal
  Total size of ttl.gz files: 7.9GB
  Number of triples in the journal: 1,114,293,494
  Size of the journal: 116,122,910,720 bytes (about 108 GiB)
  Subjects in the journal: 135,711,041
  Reference subjects in the journal: 7,206,942 (5.3% of all subjects)
  
  Load performance:
  Load time: 156278 seconds (43 hours)
  Load performance Average: 7122 mutations per second
  Load performance Stabilized (last 10 files): 4170 mutations per second
  
  Query performance was measured with the simple query SELECT * { ?s ?p ?o } 
with ?s bound to a random subject from two sets: reference URIs, and all other 
URIs except statement URIs.
  For reference URIs:
  Stabilized query performance after ~150K queries: 80 qps with an average of 4 
rows per result set
  Normalized query performance: 320 rows per second.
  For other URIs:
  Stabilized query performance after ~150K queries: 70 qps with an average of 5 
rows per result set
  Normalized query performance: 350 rows per second.
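  The derived figures in the baseline follow directly from the raw counts; a 
small helper with hypothetical names makes the arithmetic explicit. 
"Normalized query performance" is taken here as queries per second multiplied by 
the average rows per result set, which matches both reported values.

```java
final class BaselineFigures {
    // Seconds to hours.
    static double hours(long seconds) {
        return seconds / 3600.0;
    }

    // Bytes to GiB (2^30 bytes).
    static double gib(long bytes) {
        return bytes / (double) (1L << 30);
    }

    // part as a percentage of whole.
    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    // Queries per second times average rows per result set.
    static long rowsPerSecond(long queriesPerSecond, long avgRowsPerResult) {
        return queriesPerSecond * avgRowsPerResult;
    }
}
```

hours(156278) is about 43.4, gib(116122910720L) about 108.1 (116.1 in decimal GB), percent(7206942, 135711041) about 5.31, and rowsPerSecond gives 320 for (80, 4) and 350 for (70, 5).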
  
  In progress: reloading the journal with the following configurations:
  
  - reference URIs inlining,
  - reference URIs inlining, raw records disabled per T213210 
<https://phabricator.wikimedia.org/T213210>
  - reference URIs inlining, INLINE_TEXT_LITERALS for short strings per T213210 
<https://phabricator.wikimedia.org/T213210>
  
  and comparing the results with the baseline.

TASK DETAIL
  https://phabricator.wikimedia.org/T213375

To: Igorkim78
Cc: Igorkim78, Aklapper, Gehel, Smalyshev, alaa_wmde, joker88john, CucyNoiD, 
Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, 
Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, 
_jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331