[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2018-03-12 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2018-03-12T15:51:30Z] restart blazegraph on wdqs2001 to validate new config - T175919. TASK DETAIL: https://phabricator.wikimedia.org/T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2018-03-12 Thread gerritbot
gerritbot added a comment. Change 388026 merged by Gehel: [operations/puppet@production] wdqs: cleanup JVM options for blazegraph https://gerrit.wikimedia.org/r/388026

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2018-01-22 Thread gerritbot
gerritbot added a comment. Change 405723 merged by jenkins-bot: [wikidata/query/rdf@master] Align GC parameters with puppet. https://gerrit.wikimedia.org/r/405723

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2018-01-22 Thread gerritbot
gerritbot added a comment. Change 405723 had a related patch set uploaded (by Gehel; owner: Gehel): [wikidata/query/rdf@master] Align GC parameters with puppet. https://gerrit.wikimedia.org/r/405723

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-12-04 Thread gerritbot
gerritbot added a comment. Change 388026 had a related patch set uploaded (by Smalyshev; owner: Gehel): [operations/puppet@production] wdqs: cleanup JVM options for blazegraph https://gerrit.wikimedia.org/r/388026

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-12-04 Thread Gehel
Gehel added a comment. I am mostly happy with the current GC options. It would make sense to move those back from puppet to the wdqs code base, so that they can be reused by other deployments of wdqs.

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-02 Thread gerritbot
gerritbot added a comment. Change 388023 merged by jenkins-bot: [wikidata/query/rdf@master] Move GC log options from puppet to the standard run script. https://gerrit.wikimedia.org/r/388023

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-02 Thread gerritbot
gerritbot added a comment. Change 388023 had a related patch set uploaded (by Gehel; owner: Gehel): [wikidata/query/rdf@master] Move GC log options from puppet to the standard run script. https://gerrit.wikimedia.org/r/388023

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-02 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-11-02T10:38:39Z] rolling restart of wdqs nodes for GC tuning - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-02 Thread gerritbot
gerritbot added a comment. Change 388017 merged by Gehel: [operations/puppet@production] wdqs: set G1 new size to 20% https://gerrit.wikimedia.org/r/388017

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-02 Thread gerritbot
gerritbot added a comment. Change 388017 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: set G1 new size to 20% https://gerrit.wikimedia.org/r/388017

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-11-01 Thread gerritbot
gerritbot added a comment. Change 384663 abandoned by Gehel: wdqs: cleanup JVM options https://gerrit.wikimedia.org/r/384663

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-30 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-30T08:32:23Z] rolling restart of wdqs for config reload - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-30 Thread gerritbot
gerritbot added a comment. Change 386791 merged by Gehel: [operations/puppet@production] wdqs: remove PrintPLAB from GC logging https://gerrit.wikimedia.org/r/386791

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-27 Thread gerritbot
gerritbot added a comment. Change 386791 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: remove PrintPLAB from GC logging https://gerrit.wikimedia.org/r/386791

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-25 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-25T09:07:04Z] rolling restart of all wdqs nodes for GC config change - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-25 Thread gerritbot
gerritbot added a comment. Change 386132 merged by Gehel: [operations/puppet@production] wdqs: add timestamp to GC logs https://gerrit.wikimedia.org/r/386132

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread gerritbot
gerritbot added a comment. Change 386172 merged by jenkins-bot: [wikidata/query/rdf@master] Remove minimal heap size. https://gerrit.wikimedia.org/r/386172

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-24T15:09:20Z] restarting blazegraph on all wdqs nodes for GC config change - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-24T14:24:30Z] restarting blazegraph on wdqs2001 for GC config change - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread gerritbot
gerritbot added a comment. Change 386170 merged by Gehel: [operations/puppet@production] wdqs: GC tuning https://gerrit.wikimedia.org/r/386170

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread gerritbot
gerritbot added a comment. Change 386172 had a related patch set uploaded (by Gehel; owner: Gehel): [wikidata/query/rdf@master] Remove minimal heap size. https://gerrit.wikimedia.org/r/386172

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread gerritbot
gerritbot added a comment. Change 386170 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: GC tuning https://gerrit.wikimedia.org/r/386170

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-24 Thread gerritbot
gerritbot added a comment. Change 386132 had a related patch set uploaded (by Gehel; owner: Guillaume Lederrey): [operations/puppet@production] wdqs: add timestamp to GC logs https://gerrit.wikimedia.org/r/386132

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-23 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-23T13:19:16Z] rolling restart of wdqs for GC tuning - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-23 Thread gerritbot
gerritbot added a comment. Change 385364 merged by Gehel: [operations/puppet@production] wdqs: garbage collection tuning https://gerrit.wikimedia.org/r/385364

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-20 Thread gerritbot
gerritbot added a comment. Change 385364 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: garbage collection tuning https://gerrit.wikimedia.org/r/385364

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-17 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-17T11:12:00Z] restarting wdqs-updater on wdqs1004 - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-17 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-17T08:45:55Z] restarting blazegraph on wdqs1004 for GC tuning (adding -XX:+G1PrintRegionLivenessInfo) - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-17 Thread gerritbot
gerritbot added a comment. Change 384663 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: cleanup JVM options https://gerrit.wikimedia.org/r/384663

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-16 Thread gerritbot
gerritbot added a comment. Change 380972 abandoned by Gehel: wdqs: reduce blazegraph heap size to 10GB. Reason: Seems that increasing our heap is a better solution than decreasing it. https://gerrit.wikimedia.org/r/380972

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-16 Thread gerritbot
gerritbot added a comment. Change 384557 merged by Gehel: [operations/puppet@production] wdqs: increase heap size to 16GB https://gerrit.wikimedia.org/r/384557

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-16 Thread gerritbot
gerritbot added a comment. Change 384557 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: increase heap size to 16GB https://gerrit.wikimedia.org/r/384557

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-11 Thread Gehel
Gehel added a comment. According to Kirk Pepperdine, we might have run into a G1 bug... I'll try to help the JVM guys debug it, and we might get a long term solution at some point...

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-10 Thread Smalyshev
Smalyshev added a comment. I wonder if we could correlate peak allocation rates with certain queries (which would probably be long) since we have all queries now in logstash.

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread Gehel
Gehel added a comment. Looking at Grafana, we can already see a decrease in overall GC time. Looking good!

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread Gehel
Gehel added a comment. New GC configuration deployed, all servers restarted. I'll wait a few hours and I'll have a look at GC logs to find out if we see improvements. For reference, the JVM options before: -XX:+UseG1GC -Xms12g -Xmx12g -Xloggc:/var/log/wdqs/wdqs-blazegraph_jvm_gc.%p.log -XX:+Print

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread gerritbot
gerritbot added a comment. Change 382401 merged by Gehel: [operations/puppet@production] wdqs: send GC logs to file https://gerrit.wikimedia.org/r/382401

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread gerritbot
gerritbot added a comment. Change 382401 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: send GC logs to file https://gerrit.wikimedia.org/r/382401

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread gerritbot
gerritbot added a comment. Change 382397 merged by Gehel: [operations/puppet@production] wdqs: fix typo in GC_LOGS https://gerrit.wikimedia.org/r/382397

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread gerritbot
gerritbot added a comment. Change 382397 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: fix typo in GC_LOGS https://gerrit.wikimedia.org/r/382397

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-05T08:53:18Z] rolling restart of wdqs to pick up new GC options - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread gerritbot
gerritbot added a comment. Change 382195 merged by Gehel: [operations/puppet@production] wdqs: GC tuning https://gerrit.wikimedia.org/r/382195

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Smalyshev
Smalyshev added a comment. The -XX:MaxTenuringThreshold flag may have been set too low. I'd try to see if this one helps, because BG work pattern would be allocating tons of stuff when query is processed, and then releasing it after it's done. But the query can be processed for a whole minute, so

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread gerritbot
gerritbot added a comment. Change 382195 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: GC tuning https://gerrit.wikimedia.org/r/382195

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. Transparent huge pages seem to be disabled: gehel@wdqs1004:~$ cat /sys/kernel/mm/transparent_hugepage/enabled always [madvise] never Patch coming up to test other options.
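
Editorial note: in that sysfs file the kernel marks the active mode with square brackets, so the output quoted above selects madvise (huge pages only for memory regions that ask for them) rather than never. A minimal Python sketch for extracting the active mode from such output; the helper name active_thp_mode is hypothetical and not part of any wdqs tooling:

```python
import re


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the content of
    /sys/kernel/mm/transparent_hugepage/enabled."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


# Output quoted in the comment above.
sample = "always [madvise] never"
print(active_thp_mode(sample))
```

On a live host the same function could be fed the file content directly, assuming the file is readable by the calling user.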

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. A thread on Friends of JClarity suggests: adding -XX:+ParallelRefProcEnabled (since we seem to have really high Reference processing times); turning off PrintTenuringThreshold, since it is rarely useful with G1; checking that transparent huge pages are disabled; activate PrintAdaptiveSi

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. Using a demo version of Jclarity Censum, I see the following: Problems Premature promotion: There are a number of possible causes for this problem: Survivor spaces are too small. Young gen is too small. The -XX:MaxTenuringThreshold flag may have been set too low. The

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. for reference: JVM: gehel@wdqs1004:~$ java -version openjdk version "1.8.0_141" OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-1~bpo8+1-b15) OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode) JVM options (full command line): java -server -XX:+UseG1GC -Xms12g

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. See F9995047: wdqs gc logs for an example of a problematic GC log.

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-03 Thread Gehel
Gehel added a comment. It would be interesting to see if allocation rate goes up when we see the JVM locking up. gceasy does not graph allocation rate over time, but I think that JClarity Censum might do that (closed source, but demo license available).

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-03 Thread Gehel
Gehel added a comment. Comments inline. Keep in mind that my understanding of GC is limited, I am most probably wrong in a lot of what I write below. And my understanding of G1 is even more limited... In T175919#3647588, @Smalyshev wrote: So I took a look at our logs from Sep 29 with http://gceas

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-29 Thread Smalyshev
Smalyshev added a comment. So I took a look at our logs from Sep 29 with http://gceasy.io/ and I got the following conclusions (could be totally misguided, please tell me if so): The log file covers about 5 to 6 hours. I think this should be enough but maybe if there are daily patterns it could b

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-29 Thread Gehel
Gehel added a comment. There are multiple issues which can come from an oversized heap. In particular: If heap is available, it will be used, garbage collection will just start later. Graph traversal is not linear (it is O(n) to O(n^2), depending on how cyclic the graph is), traversing smaller grap

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-28 Thread Smalyshev
Smalyshev added a comment. No I don't think 3G is nearly enough. We had 8G and I had to bump it because some queries killed the server. Also, in my experience, one of the failure modes specifically on low memory is not clean OOME but rather progressive performance degradation where the memory does

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-28 Thread Gehel
Gehel added a comment. Yes, we should be careful, but at the moment the logs indicate that a heap size of 3GB should be OK (looking at heap after GC). That's an oversizing by a factor of 3. Worst case, we have -XX:+ExitOnOutOfMemoryError configured, so we should exit and restart cleanly in most cas

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-27 Thread Gehel
Gehel added a comment. Doing some quick math: wdqs eqiad cluster has between 20 and 40 requests per second according to our varnish stats; 3 servers in that cluster; let's assume requests are routed equally to all servers, that's 10 req/second/server; 1.3 GB/sec allocation rate; 130MB allocated per
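
Editorial note: the quick math above can be reproduced with a short Python sketch. It assumes the midpoint of the quoted range (30 requests per second across the cluster), an even split over 3 servers, and decimal units (1 GB = 1000 MB); the function name is hypothetical.

```python
def alloc_per_request_mb(cluster_rps: float, servers: int,
                         alloc_gb_per_sec: float) -> float:
    """Average heap allocated per request on one server, in MB.

    Assumes requests are spread evenly over the servers and that
    1 GB = 1000 MB (both are simplifications, not measured facts).
    """
    per_server_rps = cluster_rps / servers
    return alloc_gb_per_sec * 1000 / per_server_rps


# 30 req/s over 3 servers is 10 req/s/server, as in the comment above.
print(alloc_per_request_mb(30, 3, 1.3))
```

With the same allocation rate, the two ends of the quoted range (20 and 40 requests per second) give roughly 195 MB and 98 MB per request, so the 130 MB figure is an order-of-magnitude estimate rather than a measurement.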

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-27 Thread Smalyshev
Smalyshev added a comment. I think we should be careful about cutting down heap size. The peaks are what I am worried about, and Java has a nasty habit of getting stuck once the heap is exhausted. I'd rather it not happen in the middle of the night because somebody ran some heavy query...

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-27 Thread Gehel
Gehel added a comment. For comparison, wdqs2001 (which is seeing much less user traffic, but the same amount of writes) is showing a GC overhead of ~3.2%, allocation rate of 200 MB/sec, max GC time of 1.7 seconds. Obvious solution to improve throughput: stop sending user traffic to our servers :) M

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-27 Thread gerritbot
gerritbot added a comment. Change 380972 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: reduce blazegraph heap size to 10GB https://gerrit.wikimedia.org/r/380972

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-27 Thread Gehel
Gehel added a comment. Looking at 21 hours of GC logs on wdqs1004, I still see 1.3GB/sec allocation rate (almost 100TB allocated over 21 hours!). GC overhead is still high at ~8%. It still looks like the heap is oversized, so I'll try to reduce more to see if that increases throughput (but I doubt
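
Editorial note: the "almost 100TB" figure follows directly from the quoted rate and duration. A minimal Python check, assuming a constant 1.3 GB/sec and decimal units (1 TB = 1000 GB); the function name is hypothetical.

```python
def total_allocated_tb(rate_gb_per_sec: float, hours: float) -> float:
    """Total allocation in TB for a constant allocation rate.

    Assumes 1 TB = 1000 GB and a rate that does not vary over time.
    """
    seconds = hours * 3600
    return rate_gb_per_sec * seconds / 1000


# 1.3 GB/sec sustained for 21 hours comes to a little over 98 TB.
print(round(total_allocated_tb(1.3, 21), 2))
```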

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread Gehel
Gehel added a comment. Comparing 6h of GC logs from before the reduction in heap size with 2h40m of GC logs after the reduction, here are a few observations: GC overhead increased from 6.74% to 8.12%; allocation rate increased from ~900 MB/sec to 1.2 GB/sec (allocation rate is the rate at which the

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread gerritbot
gerritbot added a comment. Change 380768 merged by Gehel: [operations/puppet@production] wdqs: reduce blazegraph heap size to 12GB https://gerrit.wikimedia.org/r/380768

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread gerritbot
gerritbot added a comment. Change 380768 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: reduce blazegraph heap size to 12GB https://gerrit.wikimedia.org/r/380768

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-09-26T15:06:23Z] restarting blazegraph on all wdqs nodes for heap resize - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-09-26T12:44:39Z] restarting blazegraph on wdqs1004 / wdqs2001 for heap resize - T175919

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-26 Thread gerritbot
gerritbot added a comment. Change 378227 merged by Gehel: [operations/puppet@production] wdqs: reduce heap size to 12GB https://gerrit.wikimedia.org/r/378227

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-15 Thread Smalyshev
Smalyshev added a comment. The heap was bumped from 8G because there were some OOMs with heavy queries (some of them still use a bit of heap even if most of the data uses Blazegraph's own allocator). So let's not be over-zealous in reducing it yet. 12G could still be fine.

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-15 Thread Gehel
Gehel added a comment. Looking at wdqs-updater GC logs on wdqs1004, in the last 7 days: heap before GC peaks at ~1.4GB (with a few higher peaks at ~2GB); heap after full GC is ~512MB; max heap size is configured at 2GB; allocation rate over that period is ~70MB/s (but probably peaks much higher, the

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-15 Thread gerritbot
gerritbot added a comment. Change 378227 had a related patch set uploaded (by Gehel; owner: Gehel): [operations/puppet@production] wdqs: reduce heap size to 12GB https://gerrit.wikimedia.org/r/378227