Hi,
I am proposing an additional to the thrift implementation that enable per call
tracing.
Objective:
Currently, a multi thread thrift service has no tracing capability built in. A
per call tracing works as following:
Assume that we have a thrift service MyService. With a php client, it looks
like
$client = new MyService(...);
// a normal call
$ret = $client->foo();
// enable tracing for the next call
Ttracer::triggerLevel(Ttracer::LEVEL_INFO);
// call again
$ret = $client->foo();
// get trace data
$data = Ttracer::getData();
// dump data
foreach ($data as $d) {
echo $d[0] . “: “ . $d[1] . “\n”;
}
// where $d[0] is a timestamp of a float and precision in usc, and $d[1] is a
string.
On the server side, in the service handler, we sprinkle the following calls
(c++ server):
TTracer::trace(Ttracer::LEVEL_ERROR, “max iteration reached. Bail out.”);
Why we need it:
1. For runtime debugging, profiling.
2. For monitoring.
Background:
While each service are free to add additional parameters to encapsulating the
addtion of tracer flag, and tracer data, The proposed scheme will:
1. completely backward compatible. Old server with new client, new server
with old client are all interoperable.
2. No need to modify existing thrift definition, or change new definition to
accommodate tracing.
3. only change at the protocol level.
4. Almost no overhead.
The proposal takes advantages of the following facts:
1. Thrift doesn’t defined how to encapsulate a function call, specifically,
how function parameters and function results are encapsulated. Thrift
implementation uses struct for the function argument and function result, i.e.
All function arguments are grouped together in generated structs:
$(SERVICE)_$(FUNCNAME)_args, $(SERVICE)_$(FUNCNAME)_pargs for function
arguments, and $(SERVICE)_$(FUNCNAME)_result for function results.
2. a thrift generated struct’s read function skip any field who’s field id it
doesn’t understand so a field injection is safe with services built with older
thrift generator.
3. Each service request is handled by one thread.
The proposal calls for the creation of a Ttracer class that use Thread Local
Storage (TLS) to store tracer flag and tracer data. Tracer flag is injected
into a function argument if tracing is requested on the client side. On the
server side, that trace flag is extracted in struct reader by noticing a well
known field id, 0x1001. Server tracing is enabled. Trace calls are executed
and data collected. When writing function result back to client, if tracing is
enabled, and there is trace data, then trace data is injected into function
result struct using another well known field id: 0x1002.
Proposal:
1. reserve field id larger than 0x1000 for internal use. Field id is of type
i16. It is reasonable to assume that there will be no reasonable function that
has more than 4096 parameters.
2. reserve 0x1001 field id for setting trace level, which is of type i32.
3. reserve 0x1002 field id for trace data, which is of type, in c++ parlance,
vector<pair<double, string> >.
4. in struct reader, add a case statement such that when 0x1001 is
encountered extract trace level on the server side and set TLS storage trace
level.
5. in struct reader, add a case statement such that when 0x1002 is
encountered extract trace data on the client side and store it in a TLS
variable.
6. in struct writer, if the strut is a function result then injecting trace
data into the function result with field id 0x1002
7. in struct writer, if the struct is a function argument then inject trace
level into the function parameter with field id 0x1001.
Overhead:
The overhead of this change should be minimum to none when tracing is not
enabled. The injection and extraction only happens when enabled. Testing for
whether or not tracing is enabled is an int var comparison.
I have completed coding on cpp parser, generator, client, server
implementation, php generator and client (there is no php server) and am
pending reviews.
Please let me know your thoughts.
Thanks,
-john song