[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 Gabriel Wicke gwi...@wikimedia.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Gabriel Wicke gwi...@wikimedia.org --- node-mysql was not updated properly. Mea culpa. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #7 from Marc Ordinas i Llopis marc...@wikimedia.org --- Just a note on why the code seems different to https://github.com/felixge/node-mysql#transactions : The transactions support in node-mysql is very recent and (afaik) hasn't been released to npm yet. We're using node-mysql-queues for transactions, at https://github.com/bminer/node-mysql-queues . -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #8 from Marc Ordinas i Llopis marc...@wikimedia.org --- After following retained memory in gwicke's snapshot, it seems the biggest culprit is the _callSite member of Query. Lo and behold, then I found this commit for node-mysql: https://github.com/felixge/node-mysql/commit/fd984a71c715a9ffe54b9154d406a85e3711940f Can we check that the server is running the latest node-mysql version, 2.0.0-alpha9? -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #9 from Gabriel Wicke gwi...@wikimedia.org --- It was at 2.0.0-alpha8. It looks like upgrading to 2.0.0-alpha9 has helped. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #10 from Gabriel Wicke gwi...@wikimedia.org --- Looks better, but: [14:24] gwicke mysql on the rt server does not look too happy [14:24] gwicke the node portion of it is now at 380M [14:24] gwicke probably a lot of waiting / conflicting transactions Lets see how this develops. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #5 from Marc Ordinas i Llopis marc...@wikimedia.org --- I did detect some amount of memory leakage while profiling locally, if I recall correctly, ~90k per fetch/results cycle. As I mentioned in a previous comment, the leakage seemed to come from not-released arrays of code deoptimization_data. If I remember correctly V8's arch that could come from virtual method calls over many different implementations, debugging/profiling artifacts, or a few other things. And of course it could be our code or some library leaking closures, but I think those would show up differently, especially if the code is used only once. That's where profiling took me… I'm a little stumped of what to try next, but I think that I'll make a small test case to check it's not the mysql library. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #6 from Gabriel Wicke gwi...@wikimedia.org --- I took a heap snapshot in production and inspected it in chromium. It seems that a large number of closures are responsible for the memory leak. From browsing the data a bit it seems likely that the claimPage closure is responsible for much of the leaking, which is not surprising as it is called so often. I also see a large number of Error instances in the heap, which might indicate that an error code path (transaction rollback?) is responsible for the leak. The use of the mysql bindings seems to differ a bit from the documentation at https://github.com/felixge/node-mysql#transactions. Diffs I see are the execute() and the end and calling rollback on the return value of startTransaction rather than db. There are some other structures that look a bit cyclical like the transUpdateCB binding. Maybe some of this can be converted to use async (https://github.com/felixge/node-mysql#transactions)? -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #3 from ssas...@wikimedia.org --- Gabriel: are you sure this is the rt server? Here is latest top from which I extracted nodejs/node info. This seems like the parsoid web service clogged up. The RT server command is 'node' and parsoid web service is 'nodejs' below. - PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ GROUPCOMMAND -- 10527 nobody20 0 3555m 2.6g 3208 R 83 16.3 999:13.13 nogroup nodejs 5192 nobody20 0 2554m 1.6g 3288 S0 10.3 667:38.03 nogroup nodejs 29942 nobody20 0 2238m 1.3g 2408 R 60 8.6 830:00.08 nogroup nodejs 13788 nobody20 0 2149m 1.3g 3176 R 74 8.0 985:14.12 nogroup nodejs 29857 nobody20 0 1988m 1.1g 1880 R 76 7.1 2700:26 nogroup nodejs 3800 nobody20 0 1945m 1.1g 2448 R 72 6.8 741:21.81 nogroup nodejs 5973 nobody20 0 1905m 1.0g 1984 R 92 6.5 1201:01 nogroup nodejs 348 nobody20 0 1649m 795m 3324 R 87 5.0 309:30.79 nogroup nodejs 30270 nobody20 0 1580m 714m 3280 R 78 4.5 350:47.62 nogroup nodejs 12581 nobody20 0 1454m 603m 5408 R 86 3.8 86:18.80 nogroup nodejs 11284 nobody20 0 1372m 522m 5400 R 84 3.3 138:46.81 nogroup nodejs -- 19373 mysql 20 0 1313m 297m 2292 S6 1.9 5620:09 mysqlmysqld -- 22494 nobody20 0 858m 190m 2540 S0 1.2 104:52.24 nogroup node 10390 nobody20 0 662m 28m 1848 S0 0.2 0:03.79 nogroup node -- I tried restarting and the following processes didn't restart (I'll kill -9 them). So, I suspect it is not the RT server. 29942 nobody20 0 2224m 1.3g 2408 R 102 8.5 834:14.19 nodejs 13788 nobody20 0 2166m 1.3g 3176 R 99 8.1 989:05.14 nodejs 29857 nobody20 0 2022m 1.1g 1880 R 101 7.3 2704:39 nodejs 3800 nobody20 0 1955m 1.1g 2448 R 100 6.8 745:32.87 nodejs 5973 nobody20 0 1915m 1.0g 1984 R 101 6.6 1205:07 nodejs 11284 nobody20 0 1383m 535m 5400 R 102 3.3 142:59.75 nodejs -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #4 from ssas...@wikimedia.org --- I see it now as well ... 1.5G so far. t So, there are probably different sources -- some for workers, some for the RT server. Hmm ... I got confused because when I looked at top at that time, the test run had completed. So, I suppose it is either the title fetch or the save-results endpoint that is probably leaking memory (need not be our code necessarily, could also be something in the libraries). -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 --- Comment #2 from Gabriel Wicke gwi...@wikimedia.org --- Just happened again. Had to kill -9 it, normal restart did not help. top showed 1.4G rss. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 Marc Ordinas i Llopis marc...@wikimedia.org changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #1 from Marc Ordinas i Llopis marc...@wikimedia.org --- After some rounds of heap profiling, the major increases in size are in compiled code and in arrays of code deoptimization_data. Will keep on profiling. -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 53776] The round-trip server seems to leak memory in production
https://bugzilla.wikimedia.org/show_bug.cgi?id=53776 Gabriel Wicke gwi...@wikimedia.org changed: What|Removed |Added Priority|Unprioritized |High -- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l