OK, I upgraded valgrind to 3.12 on the POWER machine and can now get
it to run meaningfully. We are seeing many error reports of the
following form:

[js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+0000 s40019| ==34604== Thread 50:
[js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+0000 s40019| ==34604== Invalid read of size 2
[js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+0000 s40019| ==34604==    at 0x4F2AD20: __lll_unlock_elision (elision-unlock.c:36)
[js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+0000 s40019| ==34604==    by 0x4F1DB07: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:64)
[js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+0000 s40019| ==34604==    by 0x4F1DB07: pthread_mutex_unlock (pthread_mutex_unlock.c:314)

Or

[js_test:fsm_all_sharded_replication] 2016-11-10T16:20:43.998+0000 s40019| ==34604== Invalid write of size 2
[js_test:fsm_all_sharded_replication] 2016-11-10T16:20:43.998+0000 s40019| ==34604==    at 0x4F2AD30: __lll_unlock_elision (elision-unlock.c:37)
[js_test:fsm_all_sharded_replication] 2016-11-10T16:20:43.998+0000 s40019| ==34604==    by 0x4F1DB07: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:64)
[js_test:fsm_all_sharded_replication] 2016-11-10T16:20:43.998+0000 s40019| ==34604==    by 0x4F1DB07: pthread_mutex_unlock (pthread_mutex_unlock.c:314)
[js_test:fsm_all_sharded_replication] 2016-11-10T16:20:43.999+0000 s40019| ==34604==    by 0xD803C7: operator()<const mongo::executor::TaskExecutor::RemoteCommandCallbackArgs&, long unsigned int&, void> (functional:600)


In all cases, the invalid write appears to be a write into a freed block.
Frequently, the address appears to be aligned, in the form 'Address 0x...e',
which is very interesting.

Another engineer and I took a close look at one of these instances, and
we do not believe there is any way that the mutex could be accessed
after it was deleted.
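
For illustration only, here is a minimal sketch of one common shape of
"unlock, then the other side frees the mutex". It is not the MongoDB
code, and the names Handoff and runHandoffs are hypothetical. The
assumption it encodes is that, from the program's point of view, the
signalling thread's last access to the object is its unlock, so the
waiter may free the object once it has acquired the lock and seen the
flag. A report like the ones above would mean the unlock implementation
itself touched the mutex memory after releasing it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical heap-allocated state shared by a waiter and a signaller.
struct Handoff {
    std::mutex mutex;
    std::condition_variable cv;
    bool done = false;
};

// Runs the handoff `iterations` times and returns how many completed.
int runHandoffs(int iterations) {
    int completed = 0;
    for (int i = 0; i < iterations; ++i) {
        Handoff* h = new Handoff;

        std::thread signaller([h] {
            std::lock_guard<std::mutex> lk(h->mutex);
            h->done = true;
            h->cv.notify_one();
            // The lock_guard destructor unlocks here. User code in this
            // thread never touches *h after that unlock.
        });

        {
            std::unique_lock<std::mutex> lk(h->mutex);
            // wait() returns only after re-acquiring the mutex, which
            // cannot happen until the signaller has released it.
            h->cv.wait(lk, [h] { return h->done; });
            assert(h->done);
        }

        // The waiter owns the only remaining reference, so it frees the
        // object, mutex included, right after its own unlock.
        delete h;

        signaller.join();
        ++completed;
    }
    return completed;
}
```

Calling runHandoffs(1000) under valgrind on the affected machine would
be one way to check whether this pattern alone triggers the reports; we
have not confirmed that it does.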

Is there a way to disable the libc lock elision code, such as an
environment variable or a similar setting? We would like to check
whether these reports persist with lock elision disabled. If they do,
it is almost certainly a logic error in our code that we are missing.
If instead the valgrind reports go away, that would be evidence that
lock elision might be at fault for the stack corruption we are
observing, at which point I would re-try our original repro.

-- 
https://bugs.launchpad.net/bugs/1640518

Title:
  MongoDB Memory corruption
