[
https://issues.apache.org/jira/browse/THRIFT-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Wilson-Brown updated THRIFT-748:
------------------------------------
Description:
If a Thrift C++ client opens a TSocket, writes some data, then calls fork(),
the child process can terminate the parent process's connection by deleting
its copy of the parent's TSocket.
In particular, the default setting of lingerOn_ = 1 causes an RST to be sent by
close(socket_) in TSocket->close().
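For reference, a minimal sketch of the plain-socket setting that corresponds to
lingerOn_ = 1 with a linger interval of 0. It assumes TSocket passes both values
straight through to SO_LINGER, which has not been checked against the Thrift
source here. The helper names set_linger and get_linger are hypothetical and are
not part of Thrift.

```cpp
#include <cassert>
#include <sys/socket.h>

// Hypothetical helper (not a Thrift API): apply an SO_LINGER setting
// to an open socket descriptor. Returns true on success.
bool set_linger(int fd, bool on, int seconds) {
    assert(seconds >= 0);
    struct linger l;
    l.l_onoff = on ? 1 : 0;   // 1 = linger enabled
    l.l_linger = seconds;     // linger interval in seconds
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == 0;
}

// Hypothetical helper: read the current SO_LINGER setting back.
bool get_linger(int fd, struct linger* out) {
    socklen_t len = sizeof(*out);
    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len) == 0;
}
```

On most stacks, set_linger(fd, true, 0) on a connected TCP socket requests an
abortive close (unsent data discarded, RST sent), while set_linger(fd, false, 0)
restores the usual socket default.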
Discussion:
This behaviour is identical to the behaviour of unix sockets when SO_LINGER is
set (implementations vary).
However, the SO_LINGER default for sockets is off, not on, so the TSocket
default produces unexpected behaviour.
This design choice makes it difficult to write a Thrift client that forks
other clients in C++, as the first process to call TSocket->close() terminates
all copies of the connection. Every process has to call
TSocket->setLinger(0,0) or (1,timeout) before deleting the TSocket, closing the
TSocket, or exiting. (This workaround only succeeds with the fix suggested in
[#THRIFT-747].)
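The same workaround can be sketched at the plain-socket level: the child turns
linger off, which is assumed here to be the equivalent of setLinger(0,0),
before it closes its inherited copy of the descriptor. The function name
close_copy_in_child is hypothetical, and this illustrates the idea only. It is
not the attached test code and has not been confirmed on Cygwin.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a Thrift API): fork a child that gives up its
// inherited copy of `fd` without requesting a hard reset.
// Returns the child's exit status, or -1 on error.
int close_copy_in_child(int fd) {
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: disable linger first, then close the inherited descriptor.
        struct linger off;
        off.l_onoff = 0;
        off.l_linger = 0;
        int rc = setsockopt(fd, SOL_SOCKET, SO_LINGER, &off, sizeof(off));
        close(fd);
        _exit(rc == 0 ? 0 : 1);
    }
    // Parent: wait for the child; the parent's descriptor stays open.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that on typical POSIX systems the parent and child descriptors refer to the
same underlying socket, so the option change made in the child is also visible
to the parent. A parent that wants a different linger setting would need to set
it again before its own close.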
However, the design choice also prevents deadlock/slowdown issues where a
forked process holds open a copy of the parent's Thrift connections. It also
makes close non-blocking, which is ideal in a destructor.
The design choice may also be an attempt to implement the "block until sent,
then close" behaviour described in
http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
However, the default linger interval of 0 turns the linger setting into a hard
reset, and in the absence of linger the kernel can usually send small Thrift
messages by itself.
Options:
* Change the default lingerOn to 0 and rely on the kernel to resend a limited
  number of times
* Change the default lingerVal to > 0
  - a large value like INT_MAX would match the default 'no timeout' behaviour
    of connect, send, and recv
TODO:
* Confirm issue on Linux - see attached test code
* Decide if a change to the defaults is needed
* Document workaround after resolution of [#THRIFT-747] - call
TSocket->setLinger(0,0) or (1,timeout) if forking
was:
If a Thrift C++ Client opens a TSocket, writes some data, then calls fork(),
the child process can terminate the parent processes' connection by deleting
its copy of the parent TSocket.
In particular,
the default setting of lingerOn_ = 1 causes a RST to be sent in close(socket_)
in TSocket->close()
Discussion:
This behaviour is identical to the behaviour of unix sockets when SO_LINGER is
set (implementations vary).
However, the SO_LINGER default for sockets is off not on. This provides
unexpected behaviour in TSocket.
This design choice makes it really difficult to program a Thrift client that
forks other clients in C++, as the first process to call TSocket->close()
terminates all copies of the connection. The processes all have to call
TSocket->setLinger(0,0) before deleting the TSocket, closing the TSocket, or
exiting. (This workaround only succeeds with the suggested fix in [#THRIFT-747]
).
However, the design choice also prevents deadlock/slowdown issues where a
forked process holds open a copy of the parent's Thrift connections. It also
makes close non-blocking, which is ideal in a destructor.
Options:
Do we want to change the default? What is linger useful for?
TODO:
* Confirm issue on Linux - see attached test code
* Decide if a code change is needed
* Document workaround after resolution of [#THRIFT-747] - call
TSocket->setLinger(0,0) if forking
Added notes about article at
http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
describing reliable TCP communication
> C++ TSocket default linger setting breaks forked parent process
> ---------------------------------------------------------------
>
> Key: THRIFT-748
> URL: https://issues.apache.org/jira/browse/THRIFT-748
> Project: Thrift
> Issue Type: Bug
> Components: Library (C++)
> Affects Versions: 0.2, 0.3
> Environment: Cygwin 1.7.1 on Windows XP SP3, Thrift 0.2.0 & r760184 &
> Trunk
> Reporter: Tim Wilson-Brown
> Priority: Trivial
> Attachments: thrift_linger_example.cpp
>
> Original Estimate: 72h
> Remaining Estimate: 72h
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.