Hi Tian Qiang,
Thanks for the detailed explanation. I have deployed the latest code of
the 0.96 branch with HBASE-11277 applied. I will keep monitoring to see
whether the problem persists and whether HBASE-11306 is necessary.
On 2014/7/15 11:06, Qiang Tian wrote:
Hi, below are more details.
The read0 stack trace you see means the reader is trying to read something
from a client RPC call. Andrew's test shows it is reading RPC request data
(reasonable, since the other metadata is quite small). Although the client
follows a request-receive style, when multiple clients share the
connection (the default case), the synchronization window when writing to
the same channel is quite small. If the request data is large, there might
be a sudden rush to the transport layer, which might keep the RPC server
from receiving the data in time due to congestion control. Without
HBASE-11277, the reader gets a 0-byte read again and again.
With HBASE-11277 the problem should be resolved, since we get back to
completely non-blocking IO, but it is still worth investigating non-shared
connections under high workload (HBASE-11306).
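
To illustrate the 0-byte read mentioned above, here is a minimal Java NIO
sketch. It is not HBase code: the class name ZeroByteReadDemo is made up,
and a java.nio.channels.Pipe stands in for the client socket as an
assumption to keep the example self-contained. It only shows that a read on
a non-blocking channel returns 0 when no data has arrived yet, which is why
a reader that retries in a loop can spin instead of making progress.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class ZeroByteReadDemo {

    /**
     * Returns a two-element array: the result of a non-blocking read before
     * any data was written, and the result of a read after a write.
     */
    public static int[] demo() {
        try {
            Pipe pipe = Pipe.open();
            try {
                // Hypothetical stand-in for the server side of an RPC connection.
                pipe.source().configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(16);

                // Nothing has been written yet, so a non-blocking read
                // returns 0 instead of waiting for data.
                int before = pipe.source().read(buf);

                // Simulate the client finally delivering some request bytes.
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                int after = pipe.source().read(buf);

                return new int[] {before, after};
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("before=" + r[0] + " after=" + r[1]);
    }
}
```

In a selector-based server, the usual reaction to a 0-byte read is to go
back to the selector and wait for the next readiness event rather than
retrying immediately.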