[Wikidata-bugs] [Maniphest] [Commented On] T197447: Default Blazegraph configuration confuses strings with and without RTL mark

2018-06-27 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-06-27T20:02:06Z] applied fix for T197447 to eqiad wdqs cluster, which involved restart of the services. Task detail: https://phabricator.wikimedia.org/T197447

2018-06-25 Thread Smalyshev
Smalyshev added a comment. Applied the temporary fix to wdqs2001 and wdqs2002. Seems to be working. I'll let them run with it for a bit; if I don't see anything weird, I'll apply it to the rest of the servers.

2018-06-25 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-06-26T05:28:57Z] testing fix for T197447 on wdqs1009

2018-06-22 Thread Smalyshev
Smalyshev added a comment. Values affected: 4698 056X‏ 3227 156X‏ 5154 1895‏ 5328 9611‏ 7896 3086‏ 0003 6772 0443‏ 5661 6438‏ 8043 5485‏ 0003 7884 5356‏ 0003 9447 4903‏

2018-06-19 Thread Smalyshev
Smalyshev added a comment. Looks like setting the option -Dcom.bigdata.btree.keys.KeyBuilder.collator.strength=Identical fixes the issue, but this requires a full reindex and almost doubles the size of the keys for strings, which may have an impact on the space consumed. I'll see if there's a way to fix the imm
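
The trade-off in this comment (the fix works, but string keys get almost twice as large) can be illustrated with a small Python sketch. This is a toy model only, not ICU or Blazegraph code: `toy_key` is a hypothetical helper, and it assumes that an "identical" strength behaves roughly like appending the full normalized string to the key as a final tie-breaker.

```python
import unicodedata

def toy_key(text: str, identical: bool = False) -> bytes:
    """Hypothetical key builder; a toy model, NOT ICU's real key format."""
    # Base part: format characters (Unicode category Cf, which includes
    # U+200F) are dropped, mimicking a collator that ignores them.
    base = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    key = base.encode("utf-8")
    if identical:
        # Assumed tie-breaker: append the whole NFD-normalized string
        # after a separator byte, so ignored characters count again.
        key += b"\x00" + unicodedata.normalize("NFD", text).encode("utf-8")
    return key

plain = " 4698 056X"
marked = plain + "\u200f"  # same string with a trailing RTL mark

default_equal = toy_key(plain) == toy_key(marked)
identical_equal = toy_key(plain, identical=True) == toy_key(marked, identical=True)
default_len = len(toy_key(plain))
identical_len = len(toy_key(plain, identical=True))

print(default_equal, identical_equal)  # True False
print(default_len, identical_len)      # 10 21
```

In this model the two strings stop colliding once the tie-breaker is added, at the cost of a key roughly twice as long, which matches the direction of the size concern raised above. The real key sizes in Blazegraph depend on ICU's encoding and are not reproduced here.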

2018-06-19 Thread Smalyshev
Smalyshev added a comment. Test case: insert data INSERT { " 4698 056X\u200F" . " 4698 056X" . } WHERE {} Then query: SELECT * WHERE { ?x ?y " 4698 056X" } It should only produce one result, but it produces two now.

2018-06-15 Thread Smalyshev
Smalyshev added a comment. The reason seems to be that Blazegraph is using ICU collation keys, and the ICU collator seems to ignore U+200F by default. We may need a patch to change that. Relevant code is in: https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata/src/java/com/bigd
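
To make the suspected cause concrete, here is a minimal Python sketch of how two different strings can end up with the same lookup key. It is a toy model, not the ICU algorithm or the Blazegraph code linked above: `toy_collation_key` is a hypothetical helper that mimics a collator treating format characters such as U+200F as ignorable by dropping them before the key is built.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

def toy_collation_key(text: str) -> bytes:
    """Hypothetical stand-in for a collation key; NOT the ICU algorithm.

    Characters in Unicode category Cf (format), which includes U+200F,
    are dropped to mimic a collator that treats them as ignorable.
    """
    kept = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return kept.encode("utf-8")

plain = " 4698 056X"
marked = plain + RLM

strings_equal = plain == marked
keys_collide = toy_collation_key(plain) == toy_collation_key(marked)

print(strings_equal)  # False: the strings differ by one code point
print(keys_collide)   # True: their toy keys are identical
```

If an index stores terms under such keys, a lookup for one string can also match the other, which is consistent with the behaviour described in this task.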