Re: [zeromq-dev] libzmq and helgrind
On 20 Aug 2014, at 1:24, Thomas Rodgers wrote:

> Ignore the 4.1 stack smashing issue, that was a brain fart on my part...
>
> I ran this (again Ubuntu 14.04, gcc 4.8), and this time actually changed the len to > 29 bytes (sorry, it was late, I had the stupid), and got the msg_t::close() race reported by helgrind.
>
> I think these are bogus reports. There is a lock-free queue implementation, ypipe_t (it uses CAS, which, if it were truly broken, would have reared its head well before now), that sits between the context's thread (which is running the decoder, where the msg_t::init_size() call happens) and your receiving thread (where msg_t::close() happens). The queue imposes a strict 'happens before' relationship: the receiving side will not call close() before the decoder has fully placed the message on the pipe (it does this by shallow copy, then calls msg_t::init() to reset its local copy before handling the next message).

Hi Thomas,

Thanks for taking the time to look at this, I'm much obliged! I'll add the relevant suppression to my list.

I'll keep nibbling away at this as I can find the time. I suspect that what I come up with will be incomplete, but it'll at least be a start.

I think it would be awesome to have a solid helgrind suppression file for zmq in the distribution. I'm sure I'm not the only one who would appreciate it. I'll be happy to contribute what I can come up with but, as I said, I'm afraid it'll be incomplete because I'm using only a subset of zmq.

Cheers,

Michi.
___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Re: [zeromq-dev] libzmq and helgrind
Ignore the 4.1 stack smashing issue, that was a brain fart on my part...

I ran this (again Ubuntu 14.04, gcc 4.8), and this time actually changed the len to > 29 bytes (sorry, it was late, I had the stupid), and got the msg_t::close() race reported by helgrind.

I think these are bogus reports. There is a lock-free queue implementation, ypipe_t (it uses CAS, which, if it were truly broken, would have reared its head well before now), that sits between the context's thread (which is running the decoder, where the msg_t::init_size() call happens) and your receiving thread (where msg_t::close() happens). The queue imposes a strict 'happens before' relationship: the receiving side will not call close() before the decoder has fully placed the message on the pipe (it does this by shallow copy, then calls msg_t::init() to reset its local copy before handling the next message).

On Tue, Aug 19, 2014 at 2:03 AM, Michi Henning wrote:

> On 19/08/14 13:28, Thomas Rodgers wrote:
>
>> ... then, having realized that it linked against the Ubuntu installed default version (libzmq3), I re-ran this against libzmq4 and the current libzmq 'trunk'. I get no races/issues in libzmq3 or libzmq4, but 4.1 definitely seems to have a problem, in that this test program is triggering GCC's stack smashing detector, so it seems like a 4.1-specific issue maybe?
>>
>> If I get some time tomorrow, I will try to figure out why (it's not obvious from a libzmq call in the backtrace).
>
> Thanks heaps for that, I truly appreciate it!
>
> I'm running with version 4.0.4 here from the Utopic archives, with gcc 4.9.
>
> I've attached the output I'm getting from helgrind, just in case. I assume that yours will be pretty much identical. That was with a string length of 40. Note that I sometimes get a clean run, especially with shorter strings. I just tried with a length of 30 and, every now and then, helgrind reports zero errors (maybe one in ten tries).
>
> Cheers,
>
> Michi.
Re: [zeromq-dev] libzmq and helgrind
On 19/08/14 13:28, Thomas Rodgers wrote:

> ... then, having realized that it linked against the Ubuntu installed default version (libzmq3), I re-ran this against libzmq4 and the current libzmq 'trunk'. I get no races/issues in libzmq3 or libzmq4, but 4.1 definitely seems to have a problem, in that this test program is triggering GCC's stack smashing detector, so it seems like a 4.1-specific issue maybe?
>
> If I get some time tomorrow, I will try to figure out why (it's not obvious from a libzmq call in the backtrace).

Thanks heaps for that, I truly appreciate it!

I'm running with version 4.0.4 here from the Utopic archives, with gcc 4.9.

I've attached the output I'm getting from helgrind, just in case. I assume that yours will be pretty much identical. That was with a string length of 40. Note that I sometimes get a clean run, especially with shorter strings. I just tried with a length of 30 and, every now and then, helgrind reports zero errors (maybe one in ten tries).

Cheers,

Michi.

==10464== Helgrind, a thread error detector
==10464== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==10464== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==10464== Command: ./a.out
==10464==
==10464== ---Thread-Announcement--
==10464==
==10464== Thread #1 is the program's root thread
==10464==
==10464==
==10464==
==10464== Possible data race during read of size 8 at 0x6317620 by thread #1
==10464== Locks held: none
==10464==    at 0x4E58194: zmq::msg_t::data() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x401FC9: zmq::message_t::data() (zmq.hpp:220)
==10464==    by 0x401B7C: main (main.cpp:37)
==10464==
==10464== Address 0x6317620 is 0 bytes inside a block of size 80 alloc'd
==10464==    at 0x4C2BF00: malloc (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x4E57E70: zmq::msg_t::init_size(unsigned long) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E7CD54: zmq::v2_decoder_t::one_byte_size_ready() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E7D2C8: zmq::decoder_base_t::decode(unsigned char const*, unsigned long, unsigned long&) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E72988: zmq::stream_engine_t::in_event() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E52BAD: zmq::epoll_t::loop() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E771DF: thread_routine (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x5DDE164: start_thread (pthread_create.c:309)
==10464==    by 0x56B64DC: clone (clone.S:111)
==10464==
==10464==
==10464==
==10464== Possible data race during read of size 2 at 0x6317648 by thread #1
==10464== Locks held: none
==10464==    at 0x4C34DE0: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x5159DAF: char* std::string::_S_construct(char const*, char const*, std::allocator const&, std::forward_iterator_tag) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==10464==    by 0x5159E2B: std::basic_string, std::allocator >::basic_string(char const*, unsigned long, std::allocator const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==10464==    by 0x401B98: main (main.cpp:37)
==10464==
==10464== Address 0x6317648 is 40 bytes inside a block of size 80 alloc'd
==10464==    at 0x4C2BF00: malloc (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x4E57E70: zmq::msg_t::init_size(unsigned long) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E7CD54: zmq::v2_decoder_t::one_byte_size_ready() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E7D2C8: zmq::decoder_base_t::decode(unsigned char const*, unsigned long, unsigned long&) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E72988: zmq::stream_engine_t::in_event() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E52BAD: zmq::epoll_t::loop() (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4E771DF: thread_routine (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x5DDE164: start_thread (pthread_create.c:309)
==10464==    by 0x56B64DC: clone (clone.S:111)
==10464==
==10464==
==10464==
==10464== Possible data race during read of size 2 at 0x6317658 by thread #1
==10464== Locks held: none
==10464==    at 0x4C34DEE: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==    by 0x5159DAF: char* std::string::_S_construct(char const*, char const
Re: [zeromq-dev] libzmq and helgrind
... then, having realized that it linked against the Ubuntu installed default version (libzmq3), I re-ran this against libzmq4 and the current libzmq 'trunk'. I get no races/issues in libzmq3 or libzmq4, but 4.1 definitely seems to have a problem, in that this test program is triggering GCC's stack smashing detector, so it seems like a 4.1-specific issue maybe?

If I get some time tomorrow, I will try to figure out why (it's not obvious from a libzmq call in the backtrace).

On Mon, Aug 18, 2014 at 9:53 PM, Thomas Rodgers wrote:

> Hmm, I just ran this on my *nix box (Ubuntu 14.04, gcc 4.8) with helgrind and I get no races in zmq::msg_t::close() reported, nor do I see anything in the implementation of zmq::msg_t::close() that's likely racy in practice.
>
> On Monday, August 18, 2014, Michi Henning wrote:
>
>> Hi Thomas,
>>
>> thanks for the quick answer!
>>
>> > If the message size exceeds the 'vsm' message length limit, it becomes an 'lmsg'. This type is indeed heap allocated, and when the enclosing message type is copied, the copies share the reference to the underlying heap allocation (and any potential data races that come along with that). The lifetime is controlled by an atomic refcount.
>> > zmq::msg_t::close() should not create a race as it relies on the atomic refcount dropping to zero before freeing the heap block.
>> >
>> > Access to the actual message data() could potentially create a data race, but the receiving side should never have access to a message that is still being written by a background thread.
>>
>> OK, so that sounds like the races on data() and memcpy() are bogus. What about the close() race though? I wasn't expecting to see that one.
>>
>> Cheers,
>>
>> Michi.
Re: [zeromq-dev] libzmq and helgrind
Hmm, I just ran this on my *nix box (Ubuntu 14.04, gcc 4.8) with helgrind and I get no races in zmq::msg_t::close() reported, nor do I see anything in the implementation of zmq::msg_t::close() that's likely racy in practice.

On Monday, August 18, 2014, Michi Henning wrote:

> Hi Thomas,
>
> thanks for the quick answer!
>
> > If the message size exceeds the 'vsm' message length limit, it becomes an 'lmsg'. This type is indeed heap allocated, and when the enclosing message type is copied, the copies share the reference to the underlying heap allocation (and any potential data races that come along with that). The lifetime is controlled by an atomic refcount.
> > zmq::msg_t::close() should not create a race as it relies on the atomic refcount dropping to zero before freeing the heap block.
> >
> > Access to the actual message data() could potentially create a data race, but the receiving side should never have access to a message that is still being written by a background thread.
>
> OK, so that sounds like the races on data() and memcpy() are bogus. What about the close() race though? I wasn't expecting to see that one.
>
> Cheers,
>
> Michi.
Re: [zeromq-dev] libzmq and helgrind
Hi Thomas,

thanks for the quick answer!

> If the message size exceeds the 'vsm' message length limit, it becomes an 'lmsg'. This type is indeed heap allocated, and when the enclosing message type is copied, the copies share the reference to the underlying heap allocation (and any potential data races that come along with that). The lifetime is controlled by an atomic refcount.
> zmq::msg_t::close() should not create a race as it relies on the atomic refcount dropping to zero before freeing the heap block.
>
> Access to the actual message data() could potentially create a data race, but the receiving side should never have access to a message that is still being written by a background thread.

OK, so that sounds like the races on data() and memcpy() are bogus. What about the close() race though? I wasn't expecting to see that one.

Cheers,

Michi.
Re: [zeromq-dev] libzmq and helgrind
If the message size exceeds the 'vsm' message length limit, it becomes an 'lmsg'. This type is indeed heap allocated, and when the enclosing message type is copied, the copies share the reference to the underlying heap allocation (and any potential data races that come along with that). The lifetime is controlled by an atomic refcount.

zmq::msg_t::close() should not create a race as it relies on the atomic refcount dropping to zero before freeing the heap block.

Access to the actual message data() could potentially create a data race, but the receiving side should never have access to a message that is still being written by a background thread.

On Sun, Aug 17, 2014 at 8:28 PM, Michi Henning wrote:

> Hi,
>
> I'm trying to put together a helgrind suppression file for use with zmq programs. There is lots of noise coming from helgrind without suppressions, due to the lock-free stuff going on in zmq.
>
> I have a bunch of suppressions at the moment that get rid of some of the more common errors. While I was working on this, I came across a few that are puzzling, and I'm looking for advice whether these are false positives or real. I've attached a trivial program that communicates between two threads with REQ/REP, each thread using its own context.
>
> The server thread just echoes back what it receives, and the client thread checks that what it received matches what it sent.
>
> This program works with zero errors when I run it with the attached suppression file:
>
> $ valgrind --tool=helgrind --suppressions=suppress ./a.out
> ...
> ==4671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 790 from 483)
>
> The client thread sends a string of 29 bytes:
>
> const int len = 29; // Noise happens if len >= 30
> const std::string payload(len, 'a');
> ...
> zmq::message_t request(payload.size());
> memcpy((void *)request.data(), payload.c_str(), payload.size());
> socket.send(request);
>
> Now, as soon as the length of the string is 30 bytes or more, I get a large number of potential races reported by helgrind. It looks like the string contents spill over onto the heap at that point, so the zmq code takes a different path.
>
> I'm seeing races in zmq::msg_t::data() and memcpy(), as well as a race in zmq::msg_t::close().
>
> My question is whether these are false positives or not. I'd much appreciate any input you might have. (I'm not familiar with the zmq code base, so I'm hoping that someone who knows the code might be able to help.)
>
> I've attached the program and the suppression file I used.
>
> Compile with (gcc 4.9):
>
> c++ -g -std=c++11 main.cpp -lzmq -lpthread
>
> and run as above.
>
> Thanks,
>
> Michi.
[zeromq-dev] libzmq and helgrind
Hi,

I'm trying to put together a helgrind suppression file for use with zmq programs. There is lots of noise coming from helgrind without suppressions, due to the lock-free stuff going on in zmq.

I have a bunch of suppressions at the moment that get rid of some of the more common errors. While I was working on this, I came across a few that are puzzling, and I'm looking for advice whether these are false positives or real. I've attached a trivial program that communicates between two threads with REQ/REP, each thread using its own context.

The server thread just echoes back what it receives, and the client thread checks that what it received matches what it sent.

This program works with zero errors when I run it with the attached suppression file:

$ valgrind --tool=helgrind --suppressions=suppress ./a.out
...
==4671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 790 from 483)

The client thread sends a string of 29 bytes:

const int len = 29; // Noise happens if len >= 30
const std::string payload(len, 'a');
...
zmq::message_t request(payload.size());
memcpy((void *)request.data(), payload.c_str(), payload.size());
socket.send(request);

Now, as soon as the length of the string is 30 bytes or more, I get a large number of potential races reported by helgrind. It looks like the string contents spill over onto the heap at that point, so the zmq code takes a different path.

I'm seeing races in zmq::msg_t::data() and memcpy(), as well as a race in zmq::msg_t::close().

My question is whether these are false positives or not. I'd much appreciate any input you might have. (I'm not familiar with the zmq code base, so I'm hoping that someone who knows the code might be able to help.)

I've attached the program and the suppression file I used.

Compile with (gcc 4.9):

c++ -g -std=c++11 main.cpp -lzmq -lpthread

and run as above.

Thanks,

Michi.
#include <zmq.hpp>
#include <cassert>
#include <cstring>
#include <string>
#include <thread>

const int linger_time = 2000;

void echo_thread()
{
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_REP);
    socket.setsockopt(ZMQ_LINGER, &linger_time, sizeof(linger_time));
    socket.bind("ipc://mysocket");
    zmq::message_t msg;
    socket.recv(&msg);
    socket.send(msg);
}

const int len = 29; // Noise happens if len >= 30
const std::string payload(len, 'a');

int main()
{
    std::thread t{echo_thread};
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_REQ);
    socket.setsockopt(ZMQ_LINGER, &linger_time, sizeof(linger_time));
    socket.connect("ipc://mysocket");
    zmq::message_t request(payload.size());
    memcpy((void *)request.data(), payload.c_str(), payload.size());
    socket.send(request);
    zmq::message_t reply;
    socket.recv(&reply);
    std::string r((const char*)reply.data(), reply.size());
    assert(r == payload);
    t.join();
}

{
   zmq_epoll_race
   Helgrind:Race
   ...
   fun:_ZN3zmq7epoll_t4loopEv
   ...
}
{
   zmq_connect_race
   Helgrind:Race
   fun:_ZNSt6localeC1Ev
   fun:_ZN3zmq13ipc_address_t9to_stringERSs
   fun:_ZNK3zmq9address_t9to_stringERSs
   fun:_ZN3zmq13socket_base_t7connectEPKc
   ...
}
{
   zmq_bind_race
   Helgrind:Race
   ...
   fun:_ZN3zmq14ipc_listener_t11set_addressEPKc
   fun:_ZN3zmq13socket_base_t4bindEPKc
   fun:_ZN3zmq8socket_t4bindEPKc
   ...
}
{
   zmq_recv_race
   Helgrind:Race
   ...
   fun:_ZL9s_recvmsgPN3zmq13socket_base_tEP9zmq_msg_ti
   ...
}
{
   zmq_send_race
   Helgrind:Race
   ...
   fun:_ZL9s_sendmsgPN3zmq13socket_base_tEP9zmq_msg_ti
   ...
}
{
   zmq_term_race
   Helgrind:Race
   ...
   fun:zmq_ctx_term
   ...
}