Public bug reported:

== Comment: #0 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-01 23:09:10 ==
The team has moved to bare-metal Ubuntu 16.04.  The problem still exists, so 
it is not related to virtualization. 

Since the bug is difficult to reproduce, could we use a set of tools
to collect data when it happens?


---Problem Description---
MongoDB has memory corruption issues which occur only on Ubuntu 16.04; they 
do not occur on Ubuntu 15.
 
Contact Information = Calvin Sze/Austin/IBM
 
---uname output---
Linux master 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:57 UTC 2016 
ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = Model: 2.1 (pvr 004b 0201), Model name: POWER8E (raw), 
altivec supported 
 
---System Hang---
 the system is still alive
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 Unfortunately, not very easily. I had a test case that I was running on 
ubuntu1604-ppc-dev.pic.build.10gen.cc and xxxx-ppc-dev.pic.build.10gen.cc. I 
understand these to be two VMs running on the same physical host.

About 3.5% of the test runs on ubuntu1604-ppc-dev.pic.build.10gen.cc
would fail, but all of the runs on the other machine passed. Originally,
this failure manifested as the GCC stack protector (from
-fstack-protector-strong) claiming stack corruption.
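
With a failure rate of about 3.5% per run, a handful of clean runs says little about whether a change made the problem go away. The sketch below estimates how many consecutive passing runs are needed before a pass streak is unlikely to be luck. It assumes that runs are independent and that the 3.5% rate is stable, neither of which is established in this report; `runsNeeded` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Smallest n such that the chance of n consecutive passes, given a per-run
// failure probability `failureRate`, is at most (1 - confidence).
// Assumption: runs are independent and the failure rate is constant.
long runsNeeded(double failureRate, double confidence) {
    assert(failureRate > 0.0 && failureRate < 1.0);
    assert(confidence > 0.0 && confidence < 1.0);
    const double n = std::log(1.0 - confidence) / std::log(1.0 - failureRate);
    return static_cast<long>(std::ceil(n));
}
```

For example, `runsNeeded(0.035, 0.95)` gives the number of clean runs needed before concluding, at roughly 95% confidence, that an experiment changed the outcome.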

Hoping to be able to see the data that was being written and corrupting
the stack, I manually injected a guard region into the stack of the
failing functions as follows:


+namespace {
+
+class Canary {
+public:
+
+    static constexpr size_t kSize = 1024;
+
+    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+        ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
+    }
+
+    ~Canary() {
+        _verify();
+    }
+
+private:
+    static constexpr uint8_t kBits = 0xCD;
+    static constexpr size_t kChecksum = kSize * size_t(kBits);
+
+    void _verify() const noexcept {
+        invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
+    }
+
+    const volatile unsigned char* const _t;
+};
+
+}  // namespace
+

 Status bsonExtractField(const BSONObj& object, StringData fieldName, BSONElement* outElement) {
+
+    volatile unsigned char* const cookie = static_cast<unsigned char*>(alloca(Canary::kSize));
+    const Canary c(cookie);
+ 

When running with this, the invariant would sometimes fire. Examining
the stack cookie under the debugger would show two consecutive bytes,
always at an offset ending 0x...e, written as either 0 0, or 0 1,
somewhere at random within the middle of the cookie.
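
The checksum in `_verify` only says that something changed. A byte-wise scan, as sketched below, reports where and to what, and it also catches changes that happen to leave the sum intact. `Mismatch` and `findMismatches` are hypothetical names for illustration and are not part of the MongoDB patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte that does not hold the expected fill value.
struct Mismatch {
    std::size_t offset;   // offset from the start of the canary buffer
    std::uint8_t value;   // value actually read back
};

// Scan the canary and collect every byte that differs from `expected`.
// The pointer is volatile so each byte is really re-read from memory.
std::vector<Mismatch> findMismatches(const volatile unsigned char* buf,
                                     std::size_t size,
                                     std::uint8_t expected) {
    assert(buf != nullptr);
    std::vector<Mismatch> out;
    for (std::size_t i = 0; i < size; ++i) {
        const std::uint8_t b = buf[i];
        if (b != expected) {
            out.push_back({i, b});
        }
    }
    return out;
}
```

Logging the result before calling `invariant` would record the offsets and values without needing a debugger session on the core.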

This indicated that it was not a conventional stack smash, where we were
writing past the end of a contiguous buffer. Instead it appeared that
either the currently running thread had reached up some arbitrary and
random amount on the stack and done either two one-byte writes, or an
unaligned 2-byte write. Another possibility was that a local variable
had been transferred to another thread, which had written to it.

However, while looking at the code to find such a thing, I realized that
there was another possibility, which was that the bytes had never been
written correctly in the first place. I changed the stack canary
constructor to be:


+    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+        ::memset(const_cast<unsigned char*>(_t), kBits, kSize);
+        _verify();
+    }  

So that immediately after writing the byte pattern to the stack buffer,
we verified the contents we wrote. Amazingly, this *failed*, with the
same corruption as seen before. This means that either between the time
we called memset to write the bytes and when we read them back,
something either overwrote the stack cookie region, or that the bytes
were never written correctly by memset, or that memset wrote the bytes,
but the underlying physical memory never took the write.
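
For reference, the fill-then-verify experiment can be lifted out of MongoDB into a small loop like the one below. This is only a sketch of what a standalone test could look like: nothing in this report shows that such a loop reproduces the corruption outside of mongod. `fillAndCheckOnce` and `countFailures` are hypothetical names, and `__attribute__((noinline))` assumes GCC or clang.

```cpp
#include <alloca.h>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kSize = 1024;
constexpr unsigned char kBits = 0xCD;

// Fill a fresh alloca buffer and read it back immediately.
// Returns the offset of the first wrong byte, or -1 if all bytes match.
// noinline gives every call its own frame, so the alloca space is released
// on return instead of accumulating in the caller.
__attribute__((noinline)) long fillAndCheckOnce() {
    volatile unsigned char* const buf =
        static_cast<unsigned char*>(alloca(kSize));
    assert(buf != nullptr);
    std::memset(const_cast<unsigned char*>(buf), kBits, kSize);
    for (std::size_t i = 0; i < kSize; ++i) {
        if (buf[i] != kBits) {
            return static_cast<long>(i);
        }
    }
    return -1;
}

// Run the experiment `iterations` times and count how often it failed.
std::size_t countFailures(std::size_t iterations) {
    std::size_t failures = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        if (fillAndCheckOnce() >= 0) {
            ++failures;
        }
    }
    return failures;
}
```

On a healthy system `countFailures` should return 0 for any iteration count. A non-zero result from a program this small would point away from MongoDB itself.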


 
Stack trace output:
 no
 
Oops output:
 no
 
Userspace tool common name: MongoDB 

Userspace rpm: mongod 
 
The userspace tool has the following bit modes: 64bit 
 
System Dump Info:
  The system is not configured to capture a system dump.

Userspace tool obtained from project website:  na 
 
*Additional Instructions for Lilian Romero/Austin/IBM: 
-Post a private note with access information to the machine that the bug is 
occurring on. 
-Attach sysctl -a output to the bug.
-Attach ltrace and strace of userspace application.

== Comment: #1 - Luciano Chavez <cha...@us.ibm.com> - 2016-11-02 08:41:47 ==
Normally for userspace memory corruption problems I would recommend 
Valgrind's memcheck tool. However, if this works on other versions of Linux, one 
would also want to compare the differences, such as whether you are using the 
same versions of mongodb, gcc, glibc and the kernel. 

Has a standalone testcase been produced that shows the issue without
mongodb?

== Comment: #2 - Steven J. Munroe <sjmun...@us.ibm.com> - 2016-11-02 10:27:40 ==
We really need that standalone test case.

Need to look at WHAT c++ is doing with memset. I suspect the compiler is
short circuiting the function and inlining. That is what you would want
for optimization, but we need to know so we can steer this to the
correct team.

== Comment: #3 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-02 13:17:30 ==
Hi Luciano and Steve, thanks for the advice.

They don't have a standalone test case without MongoDB; I imagine it
would take a while and probably not be easy to produce.  I am seeking
your advice on how to approach this.  The failure takes at least 24 - 48
hours of running to reproduce.  Steve, do you have what you need for the
C++ test, or is there something I need to ask the Mongo development team?

Thanks

== Comment: #4 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-02 16:29:26 ==
(In reply to comment #3)
> Hi Luciano and Steve, thanks for the advice.
> 
> They don't have a standalone test case without MongoDB; I imagine it
> would take a while and probably not be easy to produce.  I am seeking your
> advice on how to approach this.  The failure takes at least 24 - 48 hours
> of running to reproduce.  Steve, do you have what you need for the C++
> test, or is there something I need to ask the Mongo development team?
> 
> Thanks

It's unclear to me yet that we have evidence of this being a problem in
the toolchain.  Does the last experiment (revised Canary constructor)
ALWAYS fail, or does it also fail only every 24 - 48 hours?  If the
latter, then all we know is that stack corruption happens.  There's no
indication of where the wild pointer is coming from (application
problem, compiler problem, etc.).  If it does always fail, however, then
I question the assertion that they can't provide a standalone test case.

We need something more concrete to work with.

Bill

== Comment: #5 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-03 18:08:33 ==
Could this ticket be viewed by an external customer/ISV?
I am thinking about how to establish direct communication between the MongoDB 
development team and the experts/owner of the ticket, to bypass the middleman, 
me :-)

Here are the answers from Andrew, the MongoDB development director, to my 3
questions. He also added some comments.

Basically, there are 3 questions,

> 1. Is the MongoDB binary built with the gcc that came with the Linux
distribution or with the IBM Advance Toolchain gcc?


We build our own GCC, but we have reproduced the issue with both our custom 
GCC, and the builtin linux distribution GCC. We have also reproduced with clang 
3.9 built from source on the Ubuntu 16.04 POWER machine, so we do not think 
that this is a compiler issue (could still be a std library issue).


> 2. Does the last experiment (revised Canary constructor) ALWAYS fail, or does 
> it also fail only every 24 - 48 hours?

No, we have never been able to construct a deterministic repro. We are
only able to get it to fail after running the test a very large number
of times.


> 3. Is there any way we can have a standalone test case without
MongoDB?

We do not have such a repro at this time.

I do understand the position they are taking - it isn't a lot of
information to go on, and most of the time the correct response to a
mysterious software crash is to blame the software itself, not the
surrounding ecosystem. However, we have a lot of *indirect* evidence
that has made us skeptical that this is our bug. We would love to be
proved wrong!


- The stack corruption has not reproduced on any other systems. We are running 
these same tests on every commit across dozens of Linux variants, and across 
four cpu architectures (x86_64, POWER, zSeries, ARMv8).
- We don't see crashes on other POWER, but we do on Ubuntu POWER.
- We don't see crashes on Windows, Solaris, OS X
- We have run the tests under the clang address sanitizer, with no reports.
- We have enabled the clang address sanitizer use-after-return detector, and 
found no results.


If this were a wild pointer in the MongoDB server process that was writing to 
the stack of other threads, we would expect to see corruption show up 
elsewhere, but we simply do not. 

However, let's assume that this is a bug in our code, that for whatever
reason only reveals itself on POWER, and only on Ubuntu. We would still
be interested in learning from the kernel team if there are additional
POWER-specific debugging techniques that we might be able to apply. In
particular, the ability to programmatically set/unset hardware
watchpoints over the stack canary. Another possibility would be to
mprotect the stack canary, but it is not clear to us whether it is valid
to mprotect part of the stack, either in general, or on POWER.

We would be happy to hear any suggestions on how to proceed.
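
As an illustration of the mprotect idea raised above, the sketch below carves one page-aligned page out of an alloca region, fills it, makes it read-only for the duration of the code under test, and then restores write access. It assumes Linux and GCC or clang. Whether this is valid on POWER is exactly the open question in this comment, so the function reports failure rather than asserting that it works. `protectStackPageOnce` is a hypothetical name.

```cpp
#include <alloca.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if a whole page inside this frame could be made read-only,
// stayed intact, and could be made writable again.
__attribute__((noinline)) bool protectStackPageOnce() {
    const long pageLong = sysconf(_SC_PAGESIZE);
    assert(pageLong > 0);
    const std::size_t page = static_cast<std::size_t>(pageLong);

    // Over-allocate so that one fully aligned page lies inside the region
    // and shares no bytes with any other local of this frame.
    unsigned char* const raw = static_cast<unsigned char*>(alloca(2 * page));
    const std::uintptr_t base =
        (reinterpret_cast<std::uintptr_t>(raw) + page - 1) &
        ~(static_cast<std::uintptr_t>(page) - 1);
    unsigned char* const guard = reinterpret_cast<unsigned char*>(base);

    std::memset(guard, 0xCD, page);
    if (mprotect(guard, page, PROT_READ) != 0) {
        return false;  // the kernel refused; nothing was protected
    }

    // The code under test would run here. A stray write into the guard page
    // from this process would now raise SIGSEGV at the faulting instruction.

    bool intact = true;
    const volatile unsigned char* const check = guard;
    for (std::size_t i = 0; i < page; ++i) {
        if (check[i] != 0xCD) {
            intact = false;
        }
    }

    // Write access must be restored before the frame is reused.
    if (mprotect(guard, page, PROT_READ | PROT_WRITE) != 0) {
        return false;
    }
    return intact;
}
```

Note that with the 64K pages common on POWER systems this reserves 128K of stack per call, so it would only be practical in a few selected functions.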


Thanks,
Andrew

== Comment: #6 - Steven J. Munroe <sjmun...@us.ibm.com> - 2016-11-03 18:34:30 ==
You could tell us which specific GCC version you are based on, and its 
configure options.

You could also provide the disassembly of the canary code.

== Comment: #7 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-03 23:01:55 ==
It would be useful to see what the Canary is compiled into, as Steve suggested. 
 Let's make sure it's doing what we think it is.

Given we have multiple compilers producing the same results, we may want
to think more about the runtime environment -- are you using the same
glibc and libstdc++ in all cases?  Clang at least would pick up the
distro versions, as it doesn't provide its own.

One reason you see this on Ubuntu 16.04 and not on another linux distro
is likely because of glibc level.  The other linux's glibc is quite old
by comparison.  glibc 2.23, which appears on Ubuntu 16.04, is the first
version to be compiled with -fstack-protector-strong by default.  So
this doesn't necessarily mean that the bug doesn't exist elsewhere; it
just means that the stack protector code isn't enabled to spot the
problem.  If the stack corruption is benign, then it wouldn't be noticed
otherwise.

I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5
that ships with the system, in case that becomes relevant.

I don't personally have a lot of experience with trying to debug
something of this nature, in case we don't see something obvious from
the disassembly of the canary.  CCing Ulrich Weigand in case he has some
ideas of other approaches to try.

== Comment: #9 - Ulrich Weigand <ulrich.weig...@de.ibm.com> - 2016-11-04 12:21:48 ==
I don't really have any other great ideas either.   Just two comments:

- Even though the original reporter mentioned they already tried clang's
address sanitizer, I'd definitely still also try reproducing the problem
under valgrind -- the two are different in what exactly they detect, and
using both tools in a complex problem can only help.

- The Canary code sample above has strictly speaking undefined behavior,
I think: it is calling memset on a const *.  (The const_cast makes the
warning go away, but doesn't actually cure the undefined behavior.)  I
don't *think* this will cause codegen changes in this example, but it
cannot hurt to try to fix this and see if anything changes.

== Comment: #12 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 10:32:25 ==
Hi Bill, Thanks

I have asked Andrew, waiting for his confirmation.

== Comment: #14 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 10:56:49 ==
Hi Calvin -


I can provide the assembly of the function that contains the canary (the canary 
itself gets inlined), but I think it might just be easier if I uploaded a 
binary and an associated corefile? That way your engineers could disassemble 
the crashing function themselves in the debugger and see exactly what the state 
was at the time of the crash.


What is the best way for me to get that information to you?


Thanks,
Andrew

== Comment: #15 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 10:58:54 ==
Provided the binary and core information.

Note from Mongo:

I've uploaded a sample core file and the associated binary to your ftp 
server as detailed above.  The binary is named `mongod.power` and the core is 
named `mongod.power.core`.

You should expect to see a backtrace on the faulting thread which looks 
like this (for the first few frames):

(gdb) bt
#0  0x00003fff997be5d0 in __libc_signal_restore_set (set=0x3fff5814c1f0)
    at ../sysdeps/unix/sysv/linux/nptl-signals.h:79
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:55
#2  0x00003fff997c0c00 in __GI_abort () at abort.c:89
#3  0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>, 
    file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp", 
    line=<optimized out>) at src/mongo/util/assert_util.cpp:154
#4  0x00000000224bbc48 in mongo::(anonymous namespace)::Canary::_verify (
    this=<optimized out>) at src/mongo/bson/util/bson_extract.cpp:58


The "Canary::_verify" frame (number 4) has a local variable "_t" which is an 
on-the-stack array and filled with "0xcd" for a span of 1024 bytes.  Near the 
end of this block we see two bytes of poisoned memory which were altered:

0x3fff5814c858: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
0x3fff5814c860: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
0x3fff5814c868: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0x01    0x00
0x3fff5814c870: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd
0x3fff5814c878: 0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd    0xcd


Note the two bytes set to values "0x01" and "0x00".

At the time of core-dump all the other threads seemed to be paused on system 
calls such as "recv" or "__pthread_cond_wait".  The verify function is called 
when setting up our software canary, and checks the memory immediately after 
its setup.  We do not run any other functions on this thread between the 
memory poisoning and the verification of the poisoning.  All other threads 
appear to be paused at this time.

== Comment: #16 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 10:59:40 ==
A follow up message from Mongo

The function calling the canary code, which you may want to 
disassemble, is in frame 6:

#6  mongo::bsonExtractStringField (object=..., fieldName=..., 
    out=0x3fff5814caa8) at src/mongo/bson/util/bson_extract.cpp:138

The lower numbered frames deal with the canary code itself.

== Comment: #17 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 11:03:46 ==
From Andrew,

>Given we have multiple compilers producing the same results, we may want to
>think more about the runtime environment -- are you using the same glibc and
>libstdc++ in all cases? Clang at least would pick up the distro versions, as
>it doesn't provide its own.

We have repro'd with three compilers:

- The system GCC, using system libstdc++ and system glibc
- Our hand-rolled GCC, using its own libstdc++, and system glibc
- One off clang-3.9 build, using system libstdc++, and system glibc.


Coincidentally, both system and hand-rolled GCC are 5.4.0, so there may not be 
as much variation there as hoped. We could try building with clang and libc++ 
to at least rule out libstdc++ as a factor.
 

>One reason you see this on Ubuntu 16.04 and not on the other linux distro is
>likely because of glibc level. The other linux distro's glibc is quite old by
>comparison. glibc 2.23, which appears on Ubuntu 16.04, is the first version to
>be compiled with -fstack-protector-strong by default.

I'm not sure I follow. Our software has been built with 
-fstack-protector-strong on both platforms, whether or not glibc has been, and 
the invocation of the __stack_chk_fail function is always from our code, not 
from glibc, or libstdc++. So, I'd expect that if there were stack corruption 
taking place as a result of our code, we would see the stack protector trip on 
both platforms. Or are you saying that on platforms where glibc itself wasn't 
built with -fstack-protector-whatever that user code built with that same flag 
won't report errors?
 
>So this doesn't necessarily mean that the
>bug doesn't exist elsewhere; it just means that the stack protector code isn't
>enabled to spot the problem. If the stack corruption is benign, then it
>wouldn't be noticed otherwise.

Yeah, still confused. I can definitely make the other linux distro box
report a stack corruption:


[amor...@xxxx-ppc-dev.pic.build ~]$ cat > boom.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


struct no_chars {
    unsigned int len;
    unsigned int data;
};


int main(int argc, char * argv[])
{
    struct no_chars info = { };


    if (argc < 3) {
        fprintf(stderr, "Usage: %s LENGTH DATA...\n", argv[0]);
        return 1;
    }


    info.len = atoi(argv[1]);
    memcpy(&info.data, argv[2], info.len);


    return 0;
}
[amor...@rhel71-ppc-dev.pic.build ~]$ gcc -Wall -O2 -U_FORTIFY_SOURCE 
-fstack-protector-strong boom.c -o boom


[amor...@rhel71-ppc-dev.pic.build ~]$ ./boom 64 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
*** stack smashing detected ***: ./boom terminated
Segmentation fault


>I assume that glibc 2.23 was compiled with Ubuntu's version of gcc 5 that ships
>with the system, in case that becomes relevant.


Correct, we have not made any changes to glibc - we are using the stock version 
that ships on the system.

== Comment: #18 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 11:04:24 ==
From Andrew

Also, I want to re-iterate that while we have definitely observed cases
where the stack protector detects the stack corruption, we have also
observed stack corruption within our own hand-rolled stack buffer, per
the code posted earlier. The core dump that Adam provided is of this
latter sort. So to some extent, this is independent of
-fstack-protector-strong.


One thing that I have not yet ruled out is whether -fstack-protector-strong could 
itself be at fault, somehow, though I find that unlikely given that we have 
reproduced with clang as well.


Still, it sounds like a worthwhile experiment, so I will see if I can still 
detect corruption in our hand-rolled stack canary when building without any 
form of -fstack-protector enabled.

== Comment: #19 - Calvin L. Sze <calv...@us.ibm.com> - 2016-11-06 11:05:58 ==
From Andrew,


I've performed this experiment, replacing our use of -fstack-protector-strong 
with -fno-stack-protector when building MongoDB, and I can confirm that we 
still observe stack corruption in our hand-rolled canary, per the code posted 
earlier.


I have a core file and executable. Let me know if you would be interested in my 
providing those in addition to the files provided yesterday by Adam.

== Comment: #21 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-07 11:10:54 ==
Andrew, thanks for all the details, and for the binary and core file!  I'll 
start poking through them this morning.  I've just been absorbing all the notes 
that Calvin dumped into our bug tracking system yesterday.

You can ignore what I was saying about -fstack-protector-strong.  My
thought at the time was that *if* the flow of control entered glibc,
that whether or not the code *there* was compiled with
-fstack-protector-strong might prove to make a difference.  Reading back through
today, I see that was off base, so sorry for the distraction.

While I'm looking at the binary, there are a couple of other things you might 
want to try:
 - Replace ::memset with __builtin_memset with GCC to see whether that makes 
any difference;
 - Try Ulrich Weigand's suggestions from comment #9;
 - As you suggested, try clang + libc++ to try to rule libstdc++ in or out.

A couple of questions that may or may not prove relevant:  
 - You've mentioned you don't get the crashes on the other linux distro.  Have 
you tried your modified canary on the other linux distro anyway?  If we're 
certain the two systems behave differently with the canary that may help us in 
narrowing things down.
 - Which version of the C++ standard are you compiling against?  Is it just the 
default on all systems, or are you forcing a specific -std=...?

== Comment: #22 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-07 12:18:41 ==
I'm having some difficulties with core file compatibility.  I put your files on 
an Ubuntu 16.04.1 system, but I don't see quite the same results as you report 
under gdb, with libc and libgcc shared libs not at the correct address and a 
problem with the stack.  There's a transcript below.  I'm particularly 
concerned about the warning that the core file and executable may not match.  
Note also the report of stack corruption above frame #4, so I can't get to 
frame #6 to look at the register state.  The library frames at #0-#3 are 
reporting the wrong information, which I assume to be because the libraries are 
at the wrong address.

For debug purposes it would probably be best to use the system compiler,
just in case that wasn't the case here.

$ ls -l
total 1950688
-rw-r--r-- 1 wschmidt wschmidt  700141992 Nov  7 14:37 mongod.power
-rw-r--r-- 1 wschmidt wschmidt 1297350656 Nov  7 14:39 mongod.power.core
$ gdb mongod.power mongod.power.core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mongod.power...done.

warning: core file may not match specified executable file.
[New LWP 101461]
[New LWP 100045]
[New LWP 100062]
[New LWP 100056]
[New LWP 99983]
[New LWP 100052]
[New LWP 100054]
[New LWP 99892]
[New LWP 100051]
[New LWP 100048]
[New LWP 100007]
[New LWP 99868]
[New LWP 100059]
[New LWP 101459]
[New LWP 100001]
[New LWP 99986]
[New LWP 101403]
[New LWP 99980]
[New LWP 99882]
[New LWP 99893]
[New LWP 99877]
[New LWP 99872]
[New LWP 101462]
[New LWP 99874]
[New LWP 100058]
[New LWP 100231]
[New LWP 99994]
[New LWP 99873]
[New LWP 100003]
[New LWP 99993]
[New LWP 99879]
[New LWP 101398]
[New LWP 99891]
[New LWP 99880]
[New LWP 99910]
[New LWP 99895]
[New LWP 99901]
[New LWP 100011]
[New LWP 99974]
[New LWP 100049]
[New LWP 99898]
[New LWP 99875]
[New LWP 101460]
[New LWP 99878]
[New LWP 99871]
[New LWP 99896]
[New LWP 101954]
[New LWP 101406]
[New LWP 100015]
[New LWP 100068]
[New LWP 99984]
[New LWP 101519]
[New LWP 100053]
[New LWP 99996]
[New LWP 100050]
[New LWP 100055]
[New LWP 100057]
[New LWP 101807]
[New LWP 99890]
[New LWP 100004]
[New LWP 99884]
[New LWP 101437]
[New LWP 101455]
[New LWP 100013]
[New LWP 99894]
[New LWP 101411]
[New LWP 101457]
[New LWP 101431]
[New LWP 101458]
[New LWP 100443]
[New LWP 101438]
[New LWP 101414]
[New LWP 101433]
[New LWP 101784]
[New LWP 99979]
[New LWP 101397]
[New LWP 101402]
[New LWP 101401]
[New LWP 101435]
[New LWP 101405]
[New LWP 101423]
[New LWP 101425]
[New LWP 99897]
[New LWP 101419]
[New LWP 99989]
[New LWP 101409]
[New LWP 100008]
[New LWP 101410]
[New LWP 99998]
[New LWP 101413]
[New LWP 101469]
[New LWP 101418]
[New LWP 101427]
[New LWP 101399]
[New LWP 101235]
[New LWP 101396]
[New LWP 101421]
[New LWP 99990]
[New LWP 101407]
[New LWP 101480]
[New LWP 100060]
[New LWP 101499]
[New LWP 101506]
[New LWP 101395]
[New LWP 101415]
[New LWP 101400]
[New LWP 101412]
[New LWP 101408]
[New LWP 101420]
[New LWP 101416]
[New LWP 101492]
[New LWP 101513]
[New LWP 101782]
[New LWP 101404]
[New LWP 101481]
[New LWP 101417]
[New LWP 100067]
[New LWP 101429]
[New LWP 99883]
[New LWP 101430]
[New LWP 101436]
[New LWP 101454]
[New LWP 101428]
[New LWP 101422]
[New LWP 100108]
[New LWP 101434]
[New LWP 100064]
[New LWP 101453]
[New LWP 100061]
[New LWP 101426]
[New LWP 100066]
[New LWP 101452]
[New LWP 101439]
[New LWP 101456]
[New LWP 101451]
[New LWP 101450]
[New LWP 101432]
[New LWP 101449]
[New LWP 101424]
[New LWP 100065]
[New LWP 100063]
[New LWP 101448]
[New LWP 101447]
[New LWP 101446]
[New LWP 101445]
[New LWP 101444]
[New LWP 101443]
[New LWP 101442]
[New LWP 101441]
[New LWP 101440]

warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libgcc_s.so.1"
is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib/powerpc64le-linux-gnu/libc.so.6" is not at 
the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `/home/pic1user/proj/mongo-repro/mongod --oplogSize 1024 
--port 30012 --nopreall'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
    at ../sysdeps/generic/math_private.h:233
233     ../sysdeps/generic/math_private.h: No such file or directory.
[Current thread is 1 (Thread 0x3fff5814ec20 (LWP 101461))]
(gdb) bt
#0  0x00003fff997be5d0 in __copysign (y=<optimized out>, x=<optimized out>)
    at ../sysdeps/generic/math_private.h:233
#1  __modf_power5plus (x=-6.2774385622041925e+66, iptr=0x3fff5814c1f0)
    at ../sysdeps/powerpc/power5+/fpu/s_modf.c:44
#2  0x00003fff997be4f0 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#3  0x00003fff997c0c00 in ?? () at ../signal/allocrtsig.c:45
   from /lib/powerpc64le-linux-gnu/libc.so.6
#4  0x00000000223c33e8 in mongo::invariantFailed (expr=<optimized out>, 
    file=0x24131b38 "src/mongo/bson/util/bson_extract.cpp", 
    line=<optimized out>) at src/mongo/util/assert_util.cpp:154
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/powerpc64le-linux-gnu/5/lto-wrapper
Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/IBM 
5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs 
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-5 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-libquadmath --enable-plugin --with-system-zlib 
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el/jre --enable-java-home 
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-ppc64el 
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-ppc64el 
--with-arch-directory=ppc64le --with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--enable-objc-gc --enable-secureplt --with-cpu=power8 --enable-
 targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror 
--with-long-double-128 --enable-checking=release --build=powerpc64le-linux-gnu 
--host=powerpc64le-linux-gnu --target=powerpc64le-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) 
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial
$ 


I'll disassemble the binary and see if I can spot anything without the state 
information.

Oh, still waiting on permission to mirror the bug.

== Comment: #23 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-07 13:39:45 ==
A little more information:

I've been looking at bsonExtractStringField's disassembly.  It appears
that this binary inlines the call to the Canary constructor as well as
the call to _verify.  As evidence, I see the PLT call to glibc's memset:

  8ebb3c:       71 c9 06 48     bl      9584ac
<00000d72.plt_call.memset@@GLIBC_2.17>

And later I see the call to invariantFailed:

  8ebc44:       e9 75 f0 4b     bl      7f322c
<_ZN5mongo15invariantFailedEPKcS1_j+0x8>

So we've answered Steve's initial question about which memset we're
using.  This isn't being inlined by the compiler, but does an
out-of-line dynamic call to the GLIBC_2.17 version.

I'm not sure whether GCC would inline a 1024-byte memset using
__builtin_memset, or just end up calling out the same way, but it might
be worth trying out that replacement, and disassembling
bsonExtractStringField again to see if the PLT call has gone away.

== Comment: #24 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-07 13:50:04 ==
I forgot to mention that the ensuing code generation to accumulate the checksum 
and test it is completely straightforward and looks correct.  So this looks 
like pretty strong evidence that the problem is in the GLIBC memset 
implementation.

  8ebb3c:       71 c9 06 48     bl      9584ac 
<00000d72.plt_call.memset@@GLIBC_2.17>
  8ebb40:       18 00 41 e8     ld      r2,24(r1)
  8ebb44:       00 04 40 39     li      r10,1024
  8ebb48:       00 00 20 39     li      r9,0
  8ebb4c:       a6 03 49 7d     mtctr   r10
  8ebb50:       00 00 43 89     lbz     r10,0(r3)
  8ebb54:       01 00 63 38     addi    r3,r3,1
  8ebb58:       14 52 29 7d     add     r9,r9,r10
  8ebb5c:       f4 ff 00 42     bdnz    8ebb50 
<_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x80>
  8ebb60:       03 00 40 3d     lis     r10,3
  8ebb64:       00 34 4a 61     ori     r10,r10,13312
  8ebb68:       00 50 a9 7f     cmpd    cr7,r9,r10
  8ebb6c:       c4 00 9e 40     bne     cr7,8ebc30 
<_ZN5mongo22bsonExtractStringFieldERKNS_7BSONObjENS_10StringDataEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x160>

...

  8ebc30:       44 ff 82 3c     addis   r4,r2,-188
  8ebc34:       44 ff 62 3c     addis   r3,r2,-188
  8ebc38:       3a 00 a0 38     li      r5,58
  8ebc3c:       38 aa 84 38     addi    r4,r4,-21960
  8ebc40:       60 aa 63 38     addi    r3,r3,-21920
  8ebc44:       e9 75 f0 4b     bl      7f322c 
<_ZN5mongo15invariantFailedEPKcS1_j+0x8>
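
As a cross-check of the listing, the constant built by the `lis r10,3` / `ori r10,r10,13312` pair can be recomputed and compared against the canary's `kChecksum` for `kSize = 1024` and `kBits = 0xCD`. The sketch below does that arithmetic; `lisOri` and `expectedChecksum` are hypothetical helper names. The `li r5,58` before the call to invariantFailed also appears to match line 58 reported for the `Canary::_verify` frame in the backtrace.

```cpp
#include <cassert>
#include <cstdint>

// Value left in a register by `lis rX,hi` followed by `ori rX,rX,lo`:
// lis loads the sign-extended 16-bit immediate shifted left by 16 bits,
// and ori then ORs in a zero-extended 16-bit immediate.
constexpr std::uint64_t lisOri(std::int16_t hi, std::uint16_t lo) {
    return (static_cast<std::uint64_t>(static_cast<std::int64_t>(hi)) << 16) | lo;
}

// Checksum the canary expects: every one of `size` bytes holds `fill`.
constexpr std::uint64_t expectedChecksum(std::uint64_t size, std::uint8_t fill) {
    return size * fill;
}

// 3 * 65536 + 13312 = 209920 = 0x33400 = 1024 * 0xCD
static_assert(lisOri(3, 13312) == 0x33400, "constant from the disassembly");
static_assert(lisOri(3, 13312) == expectedChecksum(1024, 0xCD),
              "disassembly constant equals kSize * kBits");

// Runtime form of the same check, for use from a test.
bool constantMatchesCanary() {
    const std::uint64_t fromDisassembly = lisOri(3, 13312);
    assert(fromDisassembly == 0x33400);
    return fromDisassembly == expectedChecksum(1024, 0xCD);
}
```

The loop count loaded by `li r10,1024` / `mtctr r10` likewise matches the 1024 byte loads (`lbz`) that the accumulate needs.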

== Comment: #28 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-08 11:02:18 ==
Recording some information from email discussions.

(1) The customer is planning to attempt to use valgrind memcheck.
(2) The const cast problem with the canary has been fixed without changing the 
results.
(3) Prior to that fix, the canary was used on the RHEL system with no 
corruption detected, so this does seem to be Ubuntu-specific.
(4) -std=c++11 is used everywhere.
(5) The core and binary compatibility issues appear to be that they were 
generated on 16.10, not 16.04.  New ones coming.
(6) The canary code now looks like:

+namespace {
+
+class Canary {
+public:
+
+    static constexpr size_t kSize = 2048;
+
+    explicit Canary(volatile unsigned char* const t) noexcept : _t(t) {
+        __builtin_memset(const_cast<unsigned char*>(t), kBits, kSize);
+        _verify();
+    }
+
+    ~Canary() {
+        _verify();
+    }
+
+private:
+    static constexpr uint8_t kBits = 0xCD;
+    static constexpr size_t kChecksum = kSize * size_t(kBits);
+
+    void _verify() const noexcept {
+        invariant(std::accumulate(&_t[0], &_t[kSize], 0UL) == kChecksum);
+    }
+
+    const volatile unsigned char* const _t;
+};
+
+}  // namespace
+

And its application in bsonExtractTypedField looks like:

@@ -47,6 +82,10 @@ Status bsonExtractTypedField(const BSONObj& object,
                              StringData fieldName,
                              BSONType type,
                              BSONElement* outElement) {
+
+    volatile unsigned char* const cookie = static_cast<unsigned char*>(alloca(Canary::kSize));
+    const Canary c(cookie);
+
     Status status = bsonExtractField(object, fieldName, outElement);

(7) Steve Munroe investigated memset and he and Andrew are in agreement
that we can rule it out:

I looked at the memset_power8 code (memset is just an IFUNC resolve
stub), and I don't see how this problem is caused by memset_power8.

First some observations:

The canary is allocated with alloca for a large power of 2 (1024 bytes).
Alloca returns quadword aligned memory as required to maintain quadword stack 
alignment.
For this case memset_power8 will quickly jump to the vector store loop 
(quadword x 8) all from the same register (a vector splat of the fill char).

With this code the failure modes could only be:
Overwrite by N*quadwords,
Underwrite by N*quadwords,
A repeated pattern every quadword.

But we are not seeing this, so I think we are back to a clobber by some
other code.
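
The sum-based check in the canary only reports that something changed, not where or how much. To compare an actual corrupted canary against the three quadword failure modes listed above, one could scan the buffer and record each damaged run with its offset and length. A minimal sketch, assuming the same 0xCD fill and a quadword-aligned buffer; `Run`, `findCorruptRuns`, and `looksLikeQuadwordStores` are hypothetical names, not part of MongoDB or glibc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one contiguous damaged region in the canary.
struct Run {
    std::size_t offset;  // index of the first byte that differs from the fill
    std::size_t length;  // number of consecutive differing bytes
};

// Scan a canary buffer and collect every run of bytes that no longer holds
// the fill value. A stray write that happens to store the fill byte itself
// is invisible here and may split one write into several runs.
std::vector<Run> findCorruptRuns(const volatile unsigned char* buf,
                                 std::size_t size,
                                 std::uint8_t fill = 0xCD) {
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i < size) {
        if (buf[i] == fill) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < size && buf[i] != fill) {
            ++i;
        }
        runs.push_back({start, i - start});
    }
    return runs;
}

// True when every run starts on a 16-byte offset and spans whole quadwords.
// Offsets are relative to buf, so this only reflects address alignment if
// buf itself is quadword aligned (as the alloca'd canary is said to be).
bool looksLikeQuadwordStores(const std::vector<Run>& runs) {
    for (const Run& r : runs) {
        if (r.offset % 16 != 0 || r.length % 16 != 0) {
            return false;
        }
    }
    return true;
}
```

A run such as offset 100, length 2 would not fit any of the quadword patterns, which would point away from memset and toward a clobber by other code.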

== Comment: #29 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-08 11:03:33 ==
From Andrew, difficulties with Valgrind:

I did try the valgrind repro. However, I'm not able to make valgrind
work:

The first try resulted in lots of "mismatched free/delete" reports,
which is sort of odd, because they all seem to be from within the
standard library:

> valgrind --soname-synonyms=somalloc=NONE --track-origins=yes --leak-check=no 
> ./mongos
==17387== Memcheck, a memory error detector
==17387== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17387== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==17387== Command: ./mongos
==17387==
==17387== Mismatched free() / delete / delete []
==17387==    at 0x4895888: free (in 
/usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==17387==    by 0x59514F: deallocate (new_allocator.h:110)
==17387==    by 0x59514F: deallocate (alloc_traits.h:517)
==17387==    by 0x59514F: _M_deallocate_buckets (hashtable_policy.h:2010)
==17387==    by 0x59514F: _M_deallocate_buckets (hashtable.h:356)
==17387==    by 0x59514F: _M_deallocate_buckets (hashtable.h:361)
==17387==    by 0x59514F: _M_rehash_aux (hashtable.h:1999)
==17387==    by 0x59514F: std::_Hashtable<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true> >::_M_rehash(unsigned long, unsigned long const&) 
(hashtable.h:1953)
==17387==    by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true> >::_M_insert_unique_node(unsigned long, unsigned long, 
std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
==17387==    by 0x5954D3: 
std::__detail::_Map_base<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true>, true>::operator[](std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
==17387==    by 0x593693: operator[] (unordered_map.h:668)
==17387==    by 0x593693: 
mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::function<mongo::Status (mongo::InitializerContext*)> const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&) 
(initializer_dependency_graph.cpp:58)
==17387==    by 0x591057: 
mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::function<mongo::Status (mongo::InitializerContext*)> const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&) 
(global_initializer_registerer.cpp:44)
==17387==    by 0x52D46F: __static_initialization_and_destruction_0(int, int) 
[clone .constprop.34] (mongos_options_init.cpp:39)
==17387==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==17387==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==17387==    by 0x4F83337: (below main) (libc-start.c:116)
==17387==  Address 0x5151fb0 is 0 bytes inside a block of size 16 alloc'd
==17387==    at 0x48951D4: operator new[](unsigned long) (in 
/usr/lib/valgrind/vgpreload_memcheck-ppc64le-linux.so)
==17387==    by 0x59328F: allocate (new_allocator.h:104)
==17387==    by 0x59328F: allocate (alloc_traits.h:491)
==17387==    by 0x59328F: 
std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> >, true> > 
>::_M_allocate_buckets(unsigned long) [clone .isra.108] 
(hashtable_policy.h:1996)
==17387==    by 0x595093: _M_allocate_buckets (hashtable.h:347)
==17387==    by 0x595093: _M_rehash_aux (hashtable.h:1974)
==17387==    by 0x595093: std::_Hashtable<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true> >::_M_rehash(unsigned long, unsigned long const&) 
(hashtable.h:1953)
==17387==    by 0x595253: std::_Hashtable<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true> >::_M_insert_unique_node(unsigned long, unsigned long, 
std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData>, true>*) (hashtable.h:1600)
==17387==    by 0x5954D3: 
std::__detail::_Map_base<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >, 
std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const, mongo::InitializerDependencyGraph::NodeData>, 
std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, 
mongo::InitializerDependencyGraph::NodeData> >, std::__detail::_Select1st, 
std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, 
std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, 
false, true>, true>::operator[](std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const&) (hashtable_policy.h:600)
==17387==    by 0x59356B: operator[] (unordered_map.h:668)
==17387==    by 0x59356B: 
mongo::InitializerDependencyGraph::addInitializer(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::function<mongo::Status (mongo::InitializerContext*)> const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&) 
(initializer_dependency_graph.cpp:46)
==17387==    by 0x591057: 
mongo::GlobalInitializerRegisterer::GlobalInitializerRegisterer(std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, 
std::function<mongo::Status (mongo::InitializerContext*)> const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&) 
(global_initializer_registerer.cpp:44)
==17387==    by 0x52D46F: __static_initialization_and_destruction_0(int, int) 
[clone .constprop.34] (mongos_options_init.cpp:39)
==17387==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==17387==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==17387==    by 0x4F83337: (below main) (libc-start.c:116)


So, that is a puzzle. However, I can instruct valgrind to ignore that. But it 
still fails to start, now with something more odd:

$ valgrind --show-mismatched-frees=no --soname-synonyms=somalloc=NONE 
--track-origins=yes --leak-check=no ./mongos
==19834== Memcheck, a memory error detector
==19834== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19834== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==19834== Command: ./mongos
==19834==
MC_(get_otrack_shadow_offset)(ppc64)(off=1688,sz=8)

Memcheck: mc_machine.c:329 (get_otrack_shadow_offset_wrk): the
'impossible' happened.

host stacktrace:
==19834==    at 0x3808D9B8: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x3808DB5F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x3808DCDB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x38078CE3: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x38076FAB: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x380BAA2B: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x381B9BB7: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x380BE19F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x3810D04F: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x3810FFEF: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)
==19834==    by 0x3812BB97: ??? (in /usr/lib/valgrind/memcheck-ppc64le-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 19834)
==19834==    at 0x4F3AC14: __lll_lock_elision (elision-lock.c:60)
==19834==    by 0x4F2BBC7: pthread_mutex_lock (pthread_mutex_lock.c:92)
==19834==    by 0x602753: mongo::DBConnectionPool::DBConnectionPool() 
(connpool.cpp:196)
==19834==    by 0x5319EB: __static_initialization_and_destruction_0 
(global_conn_pool.cpp:35)
==19834==    by 0x5319EB: _GLOBAL__sub_I__ZN5mongo14globalConnPoolE 
(global_conn_pool.cpp:39)
==19834==    by 0x137FED3: __libc_csu_init (in /home/acm/opt/src/mongo/mongos)
==19834==    by 0x4F830A7: generic_start_main.isra.0 (libc-start.c:247)
==19834==    by 0x4F83337: (below main) (libc-start.c:116)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.


I'm not really sure what to make of that, except that I did see something die 
in the same place (__lll_lock_elision) once or twice when running with clang 
ASAN with stack-use-after-return checking enabled. I wasn't sure what to make 
of that either, but it is interesting that this has turned up twice. I 
presume this is related to hardware lock elision?

Anyway, it doesn't seem like I can get this running with valgrind. Happy
to try again if anyone is aware of a workaround.

== Comment: #30 - William J. Schmidt <wschm...@us.ibm.com> - 2016-11-08 11:06:00 ==
CCing Carl Love.  Carl, have you seen this sort of interaction between valgrind 
and lock elision before?  (Comment #29, you can ignore the rest of this 
bugzilla for now.)

** Affects: gcc-4.8 (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-148069 severity-critical 
targetmilestone-inin16045

** Tags added: architecture-ppc64le bugnameltc-148069 severity-critical
targetmilestone-inin16045

** Changed in: ubuntu
     Assignee: (unassigned) => Taco Screen team (taco-screen-team)

** Package changed: ubuntu => gcc-4.8 (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1640518

Title:
  MongoDB Memory corruption
