Re: how to handle network downtime gracefully?

Randy Abernethy Mon, 03 Jul 2017 09:13:38 -0700

Hi Mario,

The simplest form of error recovery in RPC (though not necessarily the most
efficient) is to disconnect and reconnect. A reasonable starting place is to
write call code that operates within a protected block (e.g. a "try" block).
When a non-application error is thrown, the catch block optionally
disconnects (you may already be disconnected) and then attempts to reconnect
and/or retry the call. This is a simple but reliable approach, and once it
is working you can optimize as needed.
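
To make that concrete, here is a rough sketch in Python (the same shape
carries over to C++ or Java). Everything named here is an assumption for
illustration: TransportError stands in for whatever transport-level
exception your binding raises (TTransportException in Thrift), and connect,
call and close are hypothetical callables you would supply.

```python
import time


class TransportError(Exception):
    """Hypothetical stand-in for a transport-level (non-application) failure."""


def call_with_reconnect(connect, call, close=None,
                        max_attempts=5, delay=0.5, sleep=time.sleep):
    """Run call(client), reconnecting and retrying on transport errors.

    connect() returns a fresh client, call(client) performs one RPC, and
    close(client), if given, tears down a broken connection. Application
    exceptions are not caught here and propagate to the caller.
    """
    client = None
    last_error = None
    for attempt in range(max_attempts):
        try:
            if client is None:
                client = connect()  # may itself fail while the network is down
            return call(client)
        except TransportError as err:
            last_error = err
            if client is not None and close is not None:
                try:
                    close(client)  # optional: we may already be disconnected
                except TransportError:
                    pass
            client = None  # force a reconnect on the next attempt
            if attempt + 1 < max_attempts:
                sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error
```

The backoff and attempt count are arbitrary starting values. Retrying a call
blindly is only safe if the remote function can be repeated without harm.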

It is worth pointing out that RPC (of any kind) is not well suited to large
file transfer. RPC (Remote Procedure Call) is designed to let you invoke
remote functions and retrieve their results. The function call is an atomic
thing: it either completely succeeds or completely fails. "Procedure Call"
also implies some manageable size block of arguments and return values in
most world views. This means that all of the many small and large
architectural decisions made when creating Thrift were predicated on
reasonably sized inputs and outputs (< 1MB ish).

If you try to transfer a file by passing its data as an argument to a
server and the operation fails, you make no progress. It may make sense to
use RPC directly as a file transfer scheme for small files, where retrying
the entire transfer might be reasonable. For large files, though, it is
better to create an application level protocol where you pass modest sized
chunks of the file (in the 1MB range, say). This way, if a chunk fails you
only re-transmit the chunk rather than the entire file. Also, transferring
really large files (1GB+) in one go can overflow (or overtax) buffers on
the client, but particularly on the server. Using chunks avoids this issue.
You can easily write a library wrapper for your chunked transfer that
allows clients to make a single call to transfer a large file, with many RPC
transfers happening behind the scenes.
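
A wrapper along those lines might look roughly like the sketch below
(Python again, purely illustrative). send_chunk(offset, data) is a
hypothetical callable wrapping one RPC that you would define in your own
IDL, and ConnectionError stands in for whatever transport exception your
binding raises.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1MB, in line with the chunk size suggested above


def send_file_chunked(source: io.BufferedIOBase, send_chunk,
                      chunk_size=CHUNK_SIZE, max_retries=3):
    """Send a binary stream as a series of chunk-sized RPC calls.

    send_chunk(offset, data) is a hypothetical function performing one RPC.
    A failed chunk is retried on its own; earlier chunks are never re-sent.
    Returns the total number of bytes sent.
    """
    offset = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of stream
            return offset
        for attempt in range(max_retries):
            try:
                send_chunk(offset, chunk)
                break  # this chunk arrived, move on to the next one
            except ConnectionError:
                if attempt + 1 == max_retries:
                    raise  # give up; the caller can resume from this offset
        offset += len(chunk)
```

Passing the offset with each chunk lets the server write the data at the
right position even if it sees the same chunk twice after a retry.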

There are lots of ways to skin a cat of course. Just some thoughts.

Very best,
Randy

On Mon, Jul 3, 2017 at 7:51 AM, Mario Emmenlauer <ma...@emmenlauer.de>
wrote:

>
> How can I gracefully handle network problems? In grpc, I used to
> create the full interface even if the network was down, and later
> when I try to call RPC methods, grpc would hang until it could
> connect. That was quite simple, when the network came back the RPC
> succeeded eventually.
>
>
> What is the most graceful way to handle an unreliable network
> connection in thrift?
>
>
> Background:
> I'm building a cross platform API with Java server and C++ client
> in thrift. I use the binary protocol to send large files. I use two
> transport channels, one that uses SSL to send the login credentials,
> and a second one that may later be used to send large datasets (after
> the login succeeded).
>
> Currently I create the full interface. But if the network is down,
> I get an exception somewhere after creating the secure socket, with
> error "No more data to read".
>
> All the best,
>
>     Mario Emmenlauer
>
>
> --
> BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
> Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
> D-81669 München                          http://www.biodataanalysis.de/
>
