I actually just managed to track down the root cause of the bug, and it's
quite surprising. It's not a heap overflow, it's a stack overflow, due to a
bug in the stdlib! Specifically, adding the same httptrace.ClientTrace twice
onto a request context causes a stack overflow.
https://github.com/mightygua
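A minimal sketch of the failure mode, reconstructed for illustration (the
hook body and URL are placeholders; the point is registering the same trace
twice):

package main

import (
	"context"
	"net/http"
	"net/http/httptrace"
)

func main() {
	trace := &httptrace.ClientTrace{
		GotConn: func(httptrace.GotConnInfo) {},
	}

	// The first call composes trace with the (empty) trace already on
	// the context; the second composes trace with *itself*, leaving
	// each hook defined in terms of the very slot it is stored in.
	ctx := httptrace.WithClientTrace(context.Background(), trace)
	ctx = httptrace.WithClientTrace(ctx, trace)

	req, _ := http.NewRequest("GET", "http://example.com", nil)
	req = req.WithContext(ctx)

	// The first time GotConn fires, the composed hook calls itself
	// until the goroutine stack overflows.
	_, _ = http.DefaultClient.Do(req)
}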
On Mon, Jul 1, 2019 at 12:42 PM 'Yunchi Luo' via golang-nuts <
golang-nuts@googlegroups.com> wrote:
> Hello, I'd like to solicit some help with a weird GC issue we are seeing.
>
> I'm trying to debug OOM on a service we are running in k8s. The service is
> just a CRUD server hitting a database (DynamoDB).
Yeah, I've been looking at the goroutine profiles, and there are some
strange stacks like the one below.
1 reflect.flag.mustBeExported /go/src/reflect/value.go:213
reflect.Value.call /go/src/reflect/value.go:424
reflect.Value.Call /go/src/reflect/value.go:308
OK, this is interesting:
reflect.MakeFunc: I've never done this before. What are the allocation
patterns for creating functions with reflect? I see a few crashes
related to these functions but no mention of severe memory
consumption.
In my opinion, trying to capture MakeFunc patterns from your
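To make the pattern concrete, here is a rough sketch of how hooks get
chained with reflect.MakeFunc (my own illustration, not the stdlib source);
every composition allocates a fresh func value plus the closure it wraps:

package main

import (
	"fmt"
	"reflect"
)

func main() {
	hook := reflect.ValueOf(func(s string) { fmt.Println("hook:", s) })

	// reflect.MakeFunc allocates a new function value, and the closure
	// passed to it captures the previous hook; composing repeatedly
	// builds a chain of wrappers, one allocation per composition.
	composed := reflect.MakeFunc(hook.Type(), func(args []reflect.Value) []reflect.Value {
		hook.Call(args)        // the previously registered hook...
		return hook.Call(args) // ...then the newly added one (same func here, for brevity)
	})

	composed.Interface().(func(string))("hello") // prints "hook: hello" twice
}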
Switching Go versions seems like a stab in the dark. If the OOM symptom does
show up, you have simply wasted time. If it doesn't show up, you still don't
know whether the bug exists and is simply hiding. Even if you think the bug is
in Go code generation (or GC) and not in your code, there is nothing the
I removed the httptrace call yesterday and there have been no OOMs yet.
Going to let it bake for another day. If OOMs show up again, I'll try
reverting to an older Go version tomorrow. Otherwise I'll point my finger
at httptrace, I guess.
On Tue, Jul 2, 2019 at 2:15 PM Yunchi Luo wrote:
> I did try to do that!
I did try to do that! I have 3 heap profiles captured from the ~3 seconds
before the crash. The only thing particularly suspicious is the httptrace call
I mentioned earlier in the thread.
Diffing 1 to 2
(pprof) cum
(pprof) top 50
Showing nodes accounting for 4604.15kB, 81.69% of 5636.17kB total
Did you try running on an older release of Go, like 1.10?
> On Jul 2, 2019, at 11:53 AM, 'Yunchi Luo' via golang-nuts
> wrote:
>
> I'm not so much pointing my finger at GC as I am hoping GC logs could help
> tell the story, and that someone with a strong understanding of GC in Go
> could weigh in here.
What I have found useful in the past is pprof's ability to diff profiles.
That means that if you capture heap profiles at regular intervals, you can
see a much smaller subset of changes and compare allocation patterns.
On Tue, Jul 2, 2019, 10:53 AM 'Yunchi Luo' via golang-nuts <
golang-nuts@googlegroups.com> wrote:
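A sketch of that capture loop (the file names and 30-second interval are
arbitrary choices); any two snapshots can then be diffed with:
go tool pprof -base heap-0.pb.gz heap-1.pb.gz

package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

func main() {
	for i := 0; ; i++ {
		f, err := os.Create(fmt.Sprintf("heap-%d.pb.gz", i))
		if err != nil {
			panic(err)
		}
		runtime.GC() // flush recent frees so the snapshot reflects live objects
		if err := pprof.WriteHeapProfile(f); err != nil {
			panic(err)
		}
		f.Close()
		time.Sleep(30 * time.Second)
	}
}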
I'm not so much pointing my finger at GC as I am hoping GC logs could help
tell the story, and that someone with a strong understanding of GC in Go
could weigh in here. In the last 4 seconds before the OOM, "TotalAlloc"
increased by only 80M, yet "HeapIdle" increased to 240M from 50M, and RSS
increased by
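Those names are runtime.MemStats fields, so for anyone reproducing this, a
logging loop along these lines (field selection and interval are my own)
lines the runtime's view up against RSS:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for {
		runtime.ReadMemStats(&m)
		fmt.Printf("TotalAlloc=%dM HeapIdle=%dM HeapReleased=%dM Sys=%dM NumGC=%d\n",
			m.TotalAlloc>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20, m.NumGC)
		time.Sleep(time.Second)
	}
}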
and 'statm' - if my theory is
> > > > > > > > > correct you will see growth here long before the process is
> > > > > > > > > killed. Since you are running under k8s and cgroups, you will
> > > > > > > > > n
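A sketch of that monitoring idea, assuming Linux procfs (the 5-second
interval is arbitrary); statm's second field is resident pages, so growth in
native memory shows up here even when pprof sees nothing:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strconv"
	"strings"
	"time"
)

func main() {
	pageSize := os.Getpagesize()
	for {
		// /proc/self/statm fields: size resident shared text lib data dt (in pages)
		data, err := ioutil.ReadFile("/proc/self/statm")
		if err != nil {
			return // procfs unavailable (not Linux)
		}
		resident, _ := strconv.Atoi(strings.Fields(string(data))[1])
		fmt.Printf("rss=%d MiB\n", resident*pageSize>>20)
		time.Sleep(5 * time.Second)
	}
}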
Before assuming it is the GC or something system-related, you may wish to
verify it is *not your own logic*. A larger RSS could also be due to your own
logic touching more and more memory due to some runaway effect. The probability
this has to do with GC is very low given the very widespread use of Go.
this alongside the Go process (unless you
> have root access to the server).
>
> I 'think', depending on kernel version, that kernel memory used goes
> against the process for OOM purposes, so this is a likely candidate if
> pprof is showing nothing.
>
> Do you by chance do any of your own memory management (via malloc/CGO)? If
> so, this is not going to show in pprof either.
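To illustrate that last point, a tiny cgo sketch (my own example): memory
obtained from the C allocator raises the process RSS but never appears in Go
heap profiles:

package main

/*
#include <stdlib.h>
#include <string.h>
*/
import "C"

import "fmt"

func main() {
	// 64 MiB from the C allocator, touched so the pages become resident:
	// this counts against the process (and the OOM killer) but is
	// invisible to pprof, which only tracks Go-runtime allocations.
	p := C.malloc(64 << 20)
	C.memset(p, 0, 64<<20)
	defer C.free(p)
	fmt.Printf("allocated %d bytes outside the Go heap at %p\n", 64<<20, p)
}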
reng...@ix.netcom.com>> wrote:
>>>>>> I don't think you are going to find it in the 'heap'; rather, it
>>>>>> would be in native memory.
>>>>>>
>>>>>> I would monitor the /proc/[pid] for the
>>>> Showing top 10 nodes out of 81
>>>>       flat  flat%   sum%        cum   cum%
>>>>          0     0%     0%     376842 90.92%  net/http/httptrace.(*ClientTrace).compose.func1
>>>>          0     0%     0%     376842 90.92%  inst
won't be on the heap (the reference to the TCP connection will be in the Go
heap, but is probably much smaller than the buffer allocation).

That would be my guess - but just a guess.

-Original Message-
From: 'Yunchi Luo' via golang-nuts
Sent: Jul 1, 2019 2:14 PM
To: golang-nuts@googlegroups.com
Cc: Alec Thomas
Subject: [go-nuts] OOM occurring with a small heap
Hello, I'd like to solicit some help with a weird GC issue we are seeing.
I'm trying to debug OOM on a service we are running in k8s. The service is
just a CRUD server hitting a database (DynamoDB). Each replica serves about
300 qps of traffic. There are no memory leaks. On occasion (seemingly
cor