Andrew Kyle Purtell created HBASE-25292:
-------------------------------------------

             Summary: Update InetSocketAddress usage discipline
                 Key: HBASE-25292
                 URL: https://issues.apache.org/jira/browse/HBASE-25292
             Project: HBase
          Issue Type: Bug
          Components: Client, HFile
            Reporter: Andrew Kyle Purtell


We sometimes cache InetSocketAddress in data structures in an attempt to 
optimize away potential nameservice (DNS) lookups. This is, in general, an 
anti-pattern, because once an InetSocketAddress is resolved, resolution is 
never attempted again. The ideal pattern for connect() is ISA instantiation 
just before the connect() call, with no reuse of the ISA instance. For bind() 
we presume the local identity won't change while the process is live so usage 
and caching can be relaxed in that case.

If I can restate my proposal for a usage convention for InetSocketAddress, it 
would be this: Network identities should be bound late. This means addresses 
should be resolved at the last possible moment. Also, network identity mappings 
can change, so our code should not inappropriately cache them; otherwise we 
might miss a change and fail to operate normally.

I have reviewed the code for InetSocketAddress usage and in my opinion 
sometimes we are caching ISA acceptably, and in other cases we are not.

Correct cases:
 * We cache ISA for RPC connections, so we don't potentially do a lookup for 
every Call. However, we resolve the address earlier than we need to. The code 
can be improved by moving resolution to just before where we connect().

Incorrect cases that can be fixed:
 * RPC stubs. Remote clients may be recycled and replaced with new instances 
where the network identities (DNS name to IP address mapping) have changed--. 
HBASE-14544 attempts to work around DNS instability in data centers of years 
past in a way that, in my opinion, is the wrong thing to do in the modern era. 
This is just a technical opinion and not critical to the rest of the proposal. 
That said, I intend to propose a revert of HBASE-14544. Reverting this 
simplifies some code a bit. (If this part of the proposal is controversial it 
can be dropped.) When looking up the IP address of the remote host when 
creating a stub key we also make a key even if the resolution fails. This is 
the wrong thing to do. If we can't resolve the remote address, we can't contact 
the server. Making a stub that can't communicate is pointless. Throw an 
exception instead.
 * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
have to make up a list right away and cache them forever. We can use Address to 
record the list of favored nodes and convert from Address to InetSocketAddress 
on demand (when we go to create the HFile). This will allow us to resolve 
datanode hostnames just before they are needed. In public cloud, kubernetes, 
and or some private datacenter service deployment options, datanode servers may 
have their network identities (DNS name -> IP address mapping) changed over 
time. We can and should avoid inappropriate caching that may cause us to 
indefinitely use an incorrect address when contacting a favored node. 
 * Sometimes we use ISA when Address is just as good. For example, the dead 
servers list. If we are going to pay some attention to ISA usage discipline, 
let's remove the cases where we use ISA as a host and port pair but do not need 
to do so. Address works just as well and doesn't present an opportunity for 
misuse. Another example would be the RPC client concurrentCounterCache.

Incorrect cases that cannot be fixed:
 * hbase-external-blockcache: We have to resolve all of the memcached locations 
up front because the memcached client constructor requires ISA instances. So we 
have to hope that the network identities (DNS name -> IP address mapping) does 
not change for any in the list. This is beyond our control.

While in this area it is trivial to add new client connect metrics for number 
of potential nameservice lookups (whenever we instantiate an ISA) and number of 
failed nameservice lookups (if the instantiated ISA is unresolved).

While in this area I also noticed we often directly access a field in 
ConnectionId where there is also a getter, so good practice is to use the 
getter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to