[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927799#action_12927799
 ] 

Ivan Kelly commented on ZOOKEEPER-916:
--------------------------------------

The root cause is that startReceiving is called from both startDelivery and the 
connect callback. This is correct
because there are scenarios where it may be needed in both. The checking of the 
receiving flag is the correct way to
resolve this. 

However, this double call of startReceiving really happens *every* time you use 
the c++ client to receive on a
subscription, so it wasn't clear why we've only seen the problem now. 

Turns out that it only triggers when a partial message is received, so we only 
saw it when the messages were very big.
For example, if the message is 100 bytes, the 100 bytes will arrive to the c++ 
client at the same time
sizeReadCallbackHandler will read the size out of the buffer and since the 
whole message is already in the buffer will
call messageReadCallbackHandler directly instead of calling into asio to read 
more from the buffer. After
messageReadCallbackHandler has finished, there will be 0 bytes in the buffer, 
or another whole message in the buffer,
so the second invocation of sizeReadCallbackHandler will behave correctly. 

However, if the message is bigger than the MTU, then the first 
sizeReadCallbackHandler will run, which reads the
message size from the buffer. It then sees that the remaining bytes in the 
buffer are fewer than the size of the
message, so makes a call to async_read to get more, and the function end. At 
this point the second
sizeReadCallbackHandler runs, sees junk and craps out. 


> Problem receiving messages from subscribed channels in c++ client 
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-916
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: contrib-hedwig
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: ZOOKEEPER-916.patch
>
>
> We see this bug with receiving messages from a subscribed channel.  This 
> problem seems to happen with larger messages.  The flow is to first read at 
> least 4 bytes from the socket channel. Extract the first 4 bytes to get the 
> message size.  If we've read enough data into the buffer already, we're done 
> so invoke the messageReadCallbackHandler passing the channel and message 
> size.  If not, then do an async read for at least the remaining amount of 
> bytes in the message from the socket channel.  When done, invoke the 
> messageReadCallbackHandler.
> The problem seems that when the second async read is done, the same 
> sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. 
>  The result is that we then try to read the first 4 bytes again from the 
> buffer.  This will get a random message size and screw things up.  I'm not 
> sure if it's an incorrect use of the boost asio async_read function or we're 
> doing the boost bind to the callback function incorrectly.
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
> size: 512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, 
> currently in buffer 508 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from 
> channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message 
> size: 599 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, 
> currently in buffer 595 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - 
> DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 
> 134286765 from channel(0x80b7a18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to