Re: [zeromq-dev] libzmq and helgrind

2014-08-20 Thread Michi Henning

On 20 Aug 2014, at 1:24, Thomas Rodgers wrote:

> Ignore the 4.1 stack smashing issue; that was a brain fart on my part...
> 
> I ran this (again Ubuntu 14.04, gcc 4.8), and this time actually changed
> len to more than 29 bytes (sorry, it was late, I had the stupid), and got
> the msg_t::close() race reported by helgrind.
> 
> I think these are bogus reports. There is a lock-free queue implementation,
> ypipe_t (it uses CAS, which, if it were truly broken, would have reared its
> head well before now), that sits between the context's thread (which is
> running the decoder, where the msg_t::init_size() call happens) and your
> receiving thread (where msg_t::close() happens). The queue imposes a strict
> 'happens before' relationship: the receiving side will not call close()
> before the decoder has fully placed the message on the pipe (it does this
> by shallow copy, then calls msg_t::init() to reset its local copy before
> handling the next message).

Hi Thomas,

thanks for taking the time to look at this, I'm much obliged!

I'll add the relevant suppression to my list. I'll keep nibbling away at this 
as I can find the time. I suspect that what I come up with will be incomplete, 
but it'll at least be a start. I think it would be awesome to have a solid 
helgrind suppression file for zmq in the distribution. I'm sure I'm not the 
only one who would appreciate it. I'll be happy to contribute what I can come 
up with but, as I said, I'm afraid it'll be incomplete because I'm using only a 
subset of zmq.
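
For what it's worth, the entry I'm adding for the close() race looks roughly
like this (I reconstructed the mangled name for zmq::msg_t::close() by hand
rather than copying it from a helgrind run, so double-check it against your
own output before relying on it):

{
   zmq_msg_close_race
   Helgrind:Race
   ...
   fun:_ZN3zmq5msg_t5closeEv
   ...
}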

Cheers,

Michi.



Re: [zeromq-dev] libzmq and helgrind

2014-08-19 Thread Thomas Rodgers
Ignore the 4.1 stack smashing issue; that was a brain fart on my part...

I ran this (again Ubuntu 14.04, gcc 4.8), and this time actually changed
len to more than 29 bytes (sorry, it was late, I had the stupid), and got
the msg_t::close() race reported by helgrind.

I think these are bogus reports. There is a lock-free queue implementation,
ypipe_t (it uses CAS, which, if it were truly broken, would have reared its
head well before now), that sits between the context's thread (which is
running the decoder, where the msg_t::init_size() call happens) and your
receiving thread (where msg_t::close() happens). The queue imposes a strict
'happens before' relationship: the receiving side will not call close()
before the decoder has fully placed the message on the pipe (it does this by
shallow copy, then calls msg_t::init() to reset its local copy before
handling the next message).
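
To make the ordering argument concrete, here is a minimal sketch of the
release/acquire shape that such a queue relies on (this is not libzmq's
actual ypipe_t code, just the pattern; all the names are mine):

#include <atomic>
#include <cstddef>

struct msg { std::size_t size; char* data; };

std::atomic<msg*> slot{nullptr};

// Decoder thread: fully initialise the message, then publish it.
// Every write to *m happens-before the release store below.
void publish(msg* m)
{
    slot.store(m, std::memory_order_release);
}

// Receiving thread: the acquire load pairs with the release store,
// so the reader can never observe *m half-written before it calls
// close() on its copy.
msg* receive()
{
    msg* m;
    while ((m = slot.exchange(nullptr, std::memory_order_acquire)) == nullptr)
        ;   // spin until the decoder has published
    return m;
}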



On Tue, Aug 19, 2014 at 2:03 AM, Michi Henning wrote:

>
> On 19/08/14 13:28, Thomas Rodgers wrote:
>
>> ... then, having realized that it linked against the Ubuntu-installed
>> default version (libzmq3), I re-ran this against libzmq4 and the current
>> libzmq 'trunk'.  I get no races/issues in libzmq3 or libzmq4, but 4.1
>> definitely seems to have a problem, in that this test program is
>> triggering GCC's stack smashing detector, so it seems like a 4.1-specific
>> issue maybe?
>>
>> If I get some time tomorrow, I will try to figure out why (nothing
>> obvious from a libzmq call in the backtrace).
>>
>
> Thanks heaps for that, I truly appreciate it!
>
> I'm running with version 4.0.4 here from the Utopic archives, with gcc 4.9.
>
> I've attached the output I'm getting from helgrind, just in case. I assume
> that yours will be pretty much identical. That was with a string length of
> 40. Note that I sometimes get a clean run, especially with shorter strings.
> I just tried with a length of 30 and, every now and then, helgrind reports
> zero errors (maybe one in ten tries).
>
> Cheers,
>
> Michi.
>


Re: [zeromq-dev] libzmq and helgrind

2014-08-19 Thread Michi Henning


On 19/08/14 13:28, Thomas Rodgers wrote:
> ... then, having realized that it linked against the Ubuntu-installed
> default version (libzmq3), I re-ran this against libzmq4 and the current
> libzmq 'trunk'.  I get no races/issues in libzmq3 or libzmq4, but 4.1
> definitely seems to have a problem, in that this test program is
> triggering GCC's stack smashing detector, so it seems like a
> 4.1-specific issue maybe?
>
> If I get some time tomorrow, I will try to figure out why (nothing
> obvious from a libzmq call in the backtrace).


Thanks heaps for that, I truly appreciate it!

I'm running with version 4.0.4 here from the Utopic archives, with gcc 4.9.

I've attached the output I'm getting from helgrind, just in case. I 
assume that yours will be pretty much identical. That was with a string 
length of 40. Note that I sometimes get a clean run, especially with 
shorter strings. I just tried with a length of 30 and, every now and 
then, helgrind reports zero errors (maybe one in ten tries).


Cheers,

Michi.
==10464== Helgrind, a thread error detector
==10464== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
==10464== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==10464== Command: ./a.out
==10464== 
==10464== ---Thread-Announcement--
==10464== 
==10464== Thread #1 is the program's root thread
==10464== 
==10464== 
==10464== 
==10464== Possible data race during read of size 8 at 0x6317620 by thread #1
==10464== Locks held: none
==10464==at 0x4E58194: zmq::msg_t::data() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x401FC9: zmq::message_t::data() (zmq.hpp:220)
==10464==by 0x401B7C: main (main.cpp:37)
==10464== 
==10464== Address 0x6317620 is 0 bytes inside a block of size 80 alloc'd
==10464==at 0x4C2BF00: malloc (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x4E57E70: zmq::msg_t::init_size(unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E7CD54: zmq::v2_decoder_t::one_byte_size_ready() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E7D2C8:
zmq::decoder_base_t<zmq::v2_decoder_t>::decode(unsigned char const*, unsigned
long, unsigned long&) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E72988: zmq::stream_engine_t::in_event() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E52BAD: zmq::epoll_t::loop() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E771DF: thread_routine (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4C30E26: ??? (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x5DDE164: start_thread (pthread_create.c:309)
==10464==by 0x56B64DC: clone (clone.S:111)
==10464== 
==10464== 
==10464== 
==10464== Possible data race during read of size 2 at 0x6317648 by thread #1
==10464== Locks held: none
==10464==at 0x4C34DE0: memcpy@@GLIBC_2.14 (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x5159DAF: char* std::string::_S_construct<char const*>(char
const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)
(in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==10464==by 0x5159E2B: std::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string(char const*, unsigned long,
std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==10464==by 0x401B98: main (main.cpp:37)
==10464== 
==10464== Address 0x6317648 is 40 bytes inside a block of size 80 alloc'd
==10464==at 0x4C2BF00: malloc (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x4E57E70: zmq::msg_t::init_size(unsigned long) (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E7CD54: zmq::v2_decoder_t::one_byte_size_ready() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E7D2C8:
zmq::decoder_base_t<zmq::v2_decoder_t>::decode(unsigned char const*, unsigned
long, unsigned long&) (in /usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E72988: zmq::stream_engine_t::in_event() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E52BAD: zmq::epoll_t::loop() (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4E771DF: thread_routine (in 
/usr/lib/x86_64-linux-gnu/libzmq.so.3.1.0)
==10464==by 0x4C30E26: ??? (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x5DDE164: start_thread (pthread_create.c:309)
==10464==by 0x56B64DC: clone (clone.S:111)
==10464== 
==10464== 
==10464== 
==10464== Possible data race during read of size 2 at 0x6317658 by thread #1
==10464== Locks held: none
==10464==at 0x4C34DEE: memcpy@@GLIBC_2.14 (in 
/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==10464==by 0x5159DAF: char* std::string::_S_construct<char const*>(char
const*, char const

Re: [zeromq-dev] libzmq and helgrind

2014-08-18 Thread Thomas Rodgers
... then, having realized that it linked against the Ubuntu-installed
default version (libzmq3), I re-ran this against libzmq4 and the current
libzmq 'trunk'.  I get no races/issues in libzmq3 or libzmq4, but 4.1
definitely seems to have a problem, in that this test program is triggering
GCC's stack smashing detector, so it seems like a 4.1-specific issue maybe?

If I get some time tomorrow, I will try to figure out why (nothing obvious
from a libzmq call in the backtrace).


On Mon, Aug 18, 2014 at 9:53 PM, Thomas Rodgers wrote:

> Hmm, I just ran this on my *nix box (Ubuntu 14.04, gcc 4.8) with helgrind
> and I get no races in zmq::msg_t::close() reported, nor do I see anything
> in the implementation of zmq::msg_t::close() that's likely racy in
> practice.
>
>
On Monday, August 18, 2014, Michi Henning wrote:
>
>> Hi Thomas,
>>
>> thanks for the quick answer!
>>
>> > If the message size exceeds the 'vsm' message length limit, it becomes
>> an 'lmsg'.  This type is indeed heap allocated, and when the enclosing
>> message type is copied, the copies share the reference to the underlying
>> heap allocation (and any potential data races that come along with that).
>> The lifetime is controlled by an atomic refcount.
>> >  zmq::msg_t::close() should not create a race as it relies on the
>> atomic refcount dropping to zero before freeing the heap block.
>> >
>> > Access to the actual message data() could potentially create a data
>> race, but the receiving side should never have access to a message that is
>> still being written by a background thread.
>>
>> OK, so that sounds like the races on data() and memcpy() are bogus. What
>> about the close() race though? I wasn't expecting to see that one.
>>
>> Cheers,
>>
>> Michi.


Re: [zeromq-dev] libzmq and helgrind

2014-08-18 Thread Thomas Rodgers
Hmm, I just ran this on my *nix box (Ubuntu 14.04, gcc 4.8) with helgrind
and I get no races in zmq::msg_t::close() reported, nor do I see anything
in the implementation of zmq::msg_t::close() that's likely racy in
practice.

On Monday, August 18, 2014, Michi Henning wrote:

> Hi Thomas,
>
> thanks for the quick answer!
>
> > If the message size exceeds the 'vsm' message length limit, it becomes
> an 'lmsg'.  This type is indeed heap allocated, and when the enclosing
> message type is copied, the copies share the reference to the underlying
> heap allocation (and any potential data races that come along with that).
> The lifetime is controlled by an atomic refcount.
> >  zmq::msg_t::close() should not create a race as it relies on the atomic
> refcount dropping to zero before freeing the heap block.
> >
> > Access to the actual message data() could potentially create a data
> race, but the receiving side should never have access to a message that is
> still being written by a background thread.
>
> OK, so that sounds like the races on data() and memcpy() are bogus. What
> about the close() race though? I wasn't expecting to see that one.
>
> Cheers,
>
> Michi.


Re: [zeromq-dev] libzmq and helgrind

2014-08-18 Thread Michi Henning
Hi Thomas,

thanks for the quick answer!

> If the message size exceeds the 'vsm' message length limit, it becomes an 
> 'lmsg'.  This type is indeed heap allocated, and when the enclosing message 
> type is copied, the copies share the reference to the underlying heap 
> allocation (and any potential data races that come along with that).  The 
> lifetime is controlled by an atomic refcount.
>  zmq::msg_t::close() should not create a race as it relies on the atomic 
> refcount dropping to zero before freeing the heap block.  
> 
> Access to the actual message data() could potentially create a data race, but 
> the receiving side should never have access to a message that is still being 
> written by a background thread.

OK, so that sounds like the races on data() and memcpy() are bogus. What about 
the close() race though? I wasn't expecting to see that one.

Cheers,

Michi.


Re: [zeromq-dev] libzmq and helgrind

2014-08-18 Thread Thomas Rodgers
If the message size exceeds the 'vsm' message length limit, it becomes an
'lmsg'.  This type is indeed heap allocated, and when the enclosing message
type is copied, the copies share the reference to the underlying heap
allocation (and any potential data races that come along with that).  The
lifetime is controlled by an atomic refcount.  zmq::msg_t::close() should
not create a race as it relies on the atomic refcount dropping to zero
before freeing the heap block.

Access to the actual message data() could potentially create a data race,
but the receiving side should never have access to a message that is still
being written by a background thread.
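
The close() path is the usual atomic-refcount release idiom; a minimal
sketch of that idiom follows (this is not libzmq's actual
atomic_counter_t/msg_t code, just the general pattern; the names are mine):

#include <atomic>
#include <cstdlib>

struct shared_buf
{
    std::atomic<long> refs{1};   // the creator holds the first reference
    void* data;
};

// Taken whenever a message copy is made (shared heap block).
void add_ref(shared_buf* b)
{
    b->refs.fetch_add(1, std::memory_order_relaxed);
}

// close(): only the caller that drops the count to zero frees the block,
// so concurrent close() calls on different copies cannot double-free.
void release(shared_buf* b)
{
    if (b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
    {
        std::free(b->data);
        delete b;
    }
}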




On Sun, Aug 17, 2014 at 8:28 PM, Michi Henning wrote:

> Hi,
>
> I'm trying to put together a helgrind suppression file for use with zmq
> programs. There is lots of noise coming from helgrind without suppressions
> due to the lock-free stuff going on in zmq.
>
> I have a bunch of suppressions at the moment that get rid of some of the
> more common errors. While I was working on this, I came across a few that
> are puzzling, and I'm looking for advice on whether these are false
> positives or real. I've attached a trivial program that communicates
> between two threads with REQ/REP, each thread using its own context.
>
> The server thread just echoes back what it receives, and the client thread
> checks that what it received matches what it sent.
>
> This program works with zero errors when I run it with the attached
> suppression file:
>
> $ valgrind --tool=helgrind --suppressions=suppress ./a.out
> ...
> ==4671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 790 from 483)
>
> The client thread sends a string of 29 bytes:
>
> const int len = 29;   // Noise happens if len >= 30
> const std::string payload(len, 'a');
> ...
> zmq::message_t request(payload.size());
> memcpy((void *)request.data(), payload.c_str(), payload.size());
> socket.send(request);
>
> Now, as soon as the length of the string is 30 bytes or more, I get a
> large number of potential races reported by helgrind. It looks like the
> string contents spill over onto the heap at that point, so the zmq code
> takes a different path.
>
> I'm seeing races in zmq::msg_t::data() and memcpy(), as well as a race in
> zmq::msg_t::close().
>
> My question is whether these are false positives or not. I'd much
> appreciate any input you might have. (I'm not familiar with the zmq code
> base, so I'm hoping that someone who knows the code might be able to help.)
>
> I've attached the program and the suppression file I used.
>
> Compile with (gcc 4.9):
>
> c++ -g -std=c++11 main.cpp -lzmq -lpthread
>
> and run as above.
>
> Thanks,
>
> Michi.
>
>
>


[zeromq-dev] libzmq and helgrind

2014-08-17 Thread Michi Henning

Hi,

I'm trying to put together a helgrind suppression file for use with zmq 
programs. There is lots of noise coming from helgrind without 
suppressions due to the lock-free stuff going on in zmq.


I have a bunch of suppressions at the moment that get rid of some of the
more common errors. While I was working on this, I came across a few
that are puzzling, and I'm looking for advice on whether these are false
positives or real. I've attached a trivial program that communicates
between two threads with REQ/REP, each thread using its own context.


The server thread just echoes back what it receives, and the client 
thread checks that what it received matches what it sent.


This program works with zero errors when I run it with the attached 
suppression file:


$ valgrind --tool=helgrind --suppressions=suppress ./a.out
...
==4671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 790 from 483)

The client thread sends a string of 29 bytes:

const int len = 29;   // Noise happens if len >= 30
const std::string payload(len, 'a');
...
zmq::message_t request(payload.size());
memcpy((void *)request.data(), payload.c_str(), payload.size());
socket.send(request);

Now, as soon as the length of the string is 30 bytes or more, I get a 
large number of potential races reported by helgrind. It looks like the 
string contents spill over onto the heap at that point, so the zmq code 
takes a different path.


I'm seeing races in zmq::msg_t::data() and memcpy(), as well as a race 
in zmq::msg_t::close().


My question is whether these are false positives or not. I'd much 
appreciate any input you might have. (I'm not familiar with the zmq code 
base, so I'm hoping that someone who knows the code might be able to help.)


I've attached the program and the suppression file I used.

Compile with (gcc 4.9):

c++ -g -std=c++11 main.cpp -lzmq -lpthread

and run as above.

Thanks,

Michi.


#include <zmq.hpp>
#include <cassert>
#include <cstring>
#include <string>
#include <thread>

const int linger_time = 2000;

// Server thread: echo a single message back to the client.
void echo_thread()
{
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_REP);
    socket.setsockopt(ZMQ_LINGER, &linger_time, sizeof(linger_time));
    socket.bind("ipc://mysocket");

    zmq::message_t msg;
    socket.recv(&msg);
    socket.send(msg);
}

const int len = 29;   // Noise happens if len >= 30
const std::string payload(len, 'a');

int main()
{
    std::thread t{echo_thread};

    // Client: separate context, REQ socket.
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_REQ);
    socket.setsockopt(ZMQ_LINGER, &linger_time, sizeof(linger_time));
    socket.connect("ipc://mysocket");

    zmq::message_t request(payload.size());
    memcpy((void *)request.data(), payload.c_str(), payload.size());
    socket.send(request);

    // Check that the echoed reply matches what was sent.
    zmq::message_t reply;
    socket.recv(&reply);
    std::string r((const char*)reply.data(), reply.size());
    assert(r == payload);

    t.join();
}
{
   zmq_epoll_race
   Helgrind:Race
   ...
   fun:_ZN3zmq7epoll_t4loopEv
   ...
}

{
   zmq_connect_race
   Helgrind:Race
   fun:_ZNSt6localeC1Ev
   fun:_ZN3zmq13ipc_address_t9to_stringERSs
   fun:_ZNK3zmq9address_t9to_stringERSs
   fun:_ZN3zmq13socket_base_t7connectEPKc
   ...
}

{
   zmq_bind_race
   Helgrind:Race
   ...
   fun:_ZN3zmq14ipc_listener_t11set_addressEPKc
   fun:_ZN3zmq13socket_base_t4bindEPKc
   fun:_ZN3zmq8socket_t4bindEPKc
   ...
}

{
   zmq_recv_race
   Helgrind:Race
   ...
   fun:_ZL9s_recvmsgPN3zmq13socket_base_tEP9zmq_msg_ti
   ...
}

{
   zmq_send_race
   Helgrind:Race
   ...
   fun:_ZL9s_sendmsgPN3zmq13socket_base_tEP9zmq_msg_ti
   ...
}

{
   zmq_term_race
   Helgrind:Race
   ...
   fun:zmq_ctx_term
   ...
}