I made one such fix to TBufferedTransport (see https://github.com/apache/thrift/pull/973).
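
For readers unfamiliar with the bufio behaviour mentioned in the quoted message below, here is a minimal sketch of it. It is not the code from the pull request; `failingWriter` and `demo` are hypothetical names invented for illustration. It only shows that a `bufio.Writer` keeps returning its first error until `Reset` is called.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
)

// failingWriter is a hypothetical stand-in for a broken connection:
// every Write fails.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
	return 0, errors.New("connection reset")
}

// demo returns the error from the first Flush, the error from a later
// write on the same bufio.Writer, the error after Reset, and the bytes
// that reached the healthy destination.
func demo() (flushErr, stickyErr, afterReset error, out string) {
	w := bufio.NewWriterSize(failingWriter{}, 16)

	w.WriteString("hello")
	flushErr = w.Flush() // the underlying Write fails here

	// The writer now remembers the error; later calls return it without
	// touching the underlying writer again.
	_, stickyErr = w.WriteString("again")

	// Reset clears the stored error and points the writer at a new destination.
	var good bytes.Buffer
	w.Reset(&good)
	w.WriteString("ok")
	afterReset = w.Flush()

	return flushErr, stickyErr, afterReset, good.String()
}

func main() {
	flushErr, stickyErr, afterReset, out := demo()
	fmt.Println("flush:", flushErr)
	fmt.Println("next write:", stickyErr)
	fmt.Println("after reset:", afterReset, out)
}
```

Any wrapper that holds a `bufio.Reader` or `bufio.Writer` across calls has to reset or recreate it after a failure, otherwise later calls keep failing even if the connection is healthy again.
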
The issue in TFramedTransport is that a single buffer is used for Flush() and readFrameHeader() (https://github.com/apache/thrift/blob/master/lib/go/thrift/framed_transport.go#L37), which causes problems for concurrent access.

On Tue, Apr 12, 2016 at 3:03 AM Murat Knecht <[email protected]> wrote:

> Thanks for sharing Tyler!
>
> We're using framed transport, multiplexed over binary protocol. At the
> moment, we're on a commit from Nov 23rd.
>
> https://git1-us-west.apache.org/repos/asf/thrift.git/lib/go/thrift?p=thrift.git;a=log;h=ef3cf819e120cc46ef8e1b35baa07eae3a39126a
>
> The reader/writer problem: Did you find a way to recognize the error
> occurrences, so you knew when to recreate the protocol/transport?
> Or is the only guaranteed safe way to create them before each usage?
>
> I assume this also means two writes at the same time might be a problem?
>
> Am I wrong in assuming that these are bugs in Go's implementation of
> Thrift? Because it would seem to me that concurrent access to Thrift
> clients is something that the library should take care of.
>
> On 04/11/2016 10:11 PM, Tyler Treat wrote:
> > A few things: what version of the Thrift Go lib are you using (commit SHA)?
> > Also, what protocol and transports?
> >
> > I've seen similar issues before with Go. One was due to the fact that the
> > Go lib uses bufio.Readers and bufio.Writers in a few places which, if an
> > error occurs when making calls on them, puts them in a bad state until they
> > are reset (or a new transport/protocol is created).
> >
> > The other issue I've seen is due to the fact that the framed transport uses
> > the same byte buffer for reads and writes. If, for any reason, reads and
> > writes are happening concurrently, this will cause very subtle errors (e.g.
> > incorrect frame size).
> >
> > On Mon, Apr 11, 2016 at 5:38 AM Murat Knecht <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> we've a few Golang services connected with Thrift and are running into a
> >> peculiar problem every week or so: Calls to one particular service would
> >> fail with:
> >>
> >>     Not enough frame size 0 to read 4 bytes
> >>
> >> or
> >>
> >>     out of sequence response
> >>     Incorrect frame size (251658252)
> >>
> >> The weird thing is that most API endpoints of that service are working
> >> fine, can be called and will respond properly. Only two methods are,
> >> when they degrade, down for good. Until we restart the process.
> >>
> >> The one thing that makes these two API endpoints special (apart from the
> >> errors): The service calls another Thrift-powered service to retrieve
> >> data — which is running in the same process. So, we have two services
> >> exposing Thrift endpoints on different ports, but running in the same
> >> process (for good, old historic reasons), with one of those services
> >> calling the other.
> >>
> >> Does that ring a bell with anyone familiar with the Thrift-Go
> >> implementation? Is this obviously a bad idea? Is there some shared state
> >> that would explain why sometimes the framebuffers leak memory / degrade?
> >> This happens sporadically (usually Friday night, for the fun of it) so I
> >> won't be able to provide a minimal code sample to reproduce this.
> >>
> >> Any thoughts / ideas?
> >>
> >> Thanks,
> >> Murat
