Alexey Serbin created KUDU-2679:
-----------------------------------

             Summary: In some scenarios, a Spark Kudu application can be devoid 
of fresh authn tokens
                 Key: KUDU-2679
                 URL: https://issues.apache.org/jira/browse/KUDU-2679
             Project: Kudu
          Issue Type: Bug
          Components: client, security, spark
    Affects Versions: 1.7.1, 1.8.0, 1.7.0, 1.6.0, 1.5.0, 1.4.0, 1.3.1, 1.3.0
            Reporter: Alexey Serbin


When running in {{cluster}} mode, tasks that run as part of a Spark Kudu client 
application can fail to get new (i.e. non-expired) authentication tokens even 
if the tasks themselves run for a very short time.  Essentially, if the driver 
runs longer than the authn token expiration interval and has a particular 
pattern of making RPC calls to Kudu masters and tablet servers, all tasks 
scheduled to run after the authn token expiration interval will be supplied 
with expired authn tokens, making every task fail.  The only ways to recover 
are restarting the application or dropping the long-established connections 
from the driver to Kudu masters/tservers.

Below are some details explaining why that can happen.

Let's assume the following holds true for a Spark Kudu application:
* The application is running against a secured Kudu cluster.
* The application is running in the {{cluster}} mode.
* There are no primary authentication credentials on the machines for the user 
under which the Spark executors are running (i.e. {{kinit}} hasn't been run on 
those executor machines for the corresponding user or the Kerberos credentials 
have already expired there).
* The {{--authn_token_validity_seconds}} masters' flag is set to {{X}} seconds 
(default is 60 * 60 * 24 * 7 seconds, i.e. 7 days).
* The {{--rpc_default_keepalive_time_ms}} flag for masters (and tablet servers, 
if they are involved in the communication between the driver process and the 
Kudu backend) is set to {{Y}} milliseconds (default is 65000 ms).
* The application is running for longer than {{X}} seconds.
* The driver process makes requests to Kudu masters at least every {{Y}} 
milliseconds.
* The driver either doesn't make requests to Kudu tablet servers or makes such 
requests at least every {{Y}} milliseconds to each of the involved tablet 
servers.
* The executors are running tasks that keep connections to tablet servers idle 
for longer than {{Y}} milliseconds, or the driver spawns tasks at an executor 
more than {{Y}} milliseconds after the last task completed on that executor.
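
As a rough illustration, the conditions above can be written down as a single 
predicate.  This is only a sketch in Python: the {{Scenario}} class, its field 
names and {{is_affected()}} are hypothetical and are not part of any Kudu or 
Spark API.  Only the two default values come from the description above.

```python
from dataclasses import dataclass

# Defaults quoted in the description above.
DEFAULT_AUTHN_TOKEN_VALIDITY_S = 60 * 60 * 24 * 7  # X: 7 days, in seconds
DEFAULT_KEEPALIVE_MS = 65000                       # Y: in milliseconds


@dataclass
class Scenario:
    """Hypothetical summary of one Spark Kudu application run."""
    secured_cluster: bool
    cluster_mode: bool
    executors_have_kerberos_creds: bool
    app_runtime_s: float
    # Longest pause between driver RPCs on any of its Kudu connections.
    driver_max_request_gap_ms: float
    # Longest idle period of an executor's connections (or gap between tasks).
    executor_max_idle_gap_ms: float
    token_validity_s: float = DEFAULT_AUTHN_TOKEN_VALIDITY_S
    keepalive_ms: float = DEFAULT_KEEPALIVE_MS


def is_affected(s: Scenario) -> bool:
    """True if every condition from the list above holds for the scenario."""
    return (
        s.secured_cluster
        and s.cluster_mode
        and not s.executors_have_kerberos_creds
        and s.app_runtime_s > s.token_validity_s            # runs longer than X
        and s.driver_max_request_gap_ms <= s.keepalive_ms   # driver stays active
        and s.executor_max_idle_gap_ms > s.keepalive_ms     # executors reconnect
    )
```

For example, with the default flag values, an application that runs for 8 days, 
whose driver talks to Kudu every 30 seconds and whose executors get a new task 
every 2 minutes, would match all of the conditions.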

Essentially, that's about a Spark Kudu application where the driver process 
keeps its already opened connections active and the executors need to open new 
connections to Kudu tablet servers (and/or masters).  Also, the executor 
machines don't have Kerberos credentials for the OS user under which the 
executor processes are run.

In such scenarios, the application's tasks spawned after {{X}} seconds from the 
application start will fail because of expired authentication tokens, while the 
driver process will never re-acquire its authn token, keeping the expired token 
in {{KuduContext}} forever.
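
The timeline can be illustrated with a toy simulation.  This is a simplified 
model under stated assumptions, not the actual client logic: the driver gets 
its token at time 0, it re-acquires a token only when it has to negotiate a new 
connection (i.e. after being idle for longer than the keepalive interval) and 
its current token has expired, and every task opens a new connection using 
whatever token the driver holds at that moment.  The {{simulate()}} function 
and its parameters are hypothetical, and all times are in seconds.

```python
def simulate(token_validity_s, keepalive_s, driver_rpc_times, task_start_times):
    """Return {task_start_time: succeeded} under the simplified model."""
    token_expiry = token_validity_s  # token acquired by the driver at t = 0
    last_rpc = 0.0
    events = sorted(
        [(t, "rpc") for t in driver_rpc_times]
        + [(t, "task") for t in task_start_times]
    )
    results = {}
    for t, kind in events:
        if kind == "rpc":
            reconnect = (t - last_rpc) > keepalive_s
            if reconnect and t >= token_expiry:
                # Assumption: negotiating a new connection with an expired
                # token makes the driver fetch a fresh one.
                token_expiry = t + token_validity_s
            last_rpc = t
        else:
            # The task gets the driver's current token and has no Kerberos
            # credentials to fall back on.
            results[t] = t < token_expiry
    return results


# X = 100 s, Y = 10 s; the driver sends an RPC every 5 s and never goes idle.
always_active = simulate(100, 10, range(5, 301, 5), [50, 150, 250])

# Same, but the driver is idle between t = 95 and t = 120 and has to reconnect.
with_idle_gap = simulate(
    100, 10, list(range(5, 100, 5)) + list(range(120, 301, 5)), [50, 150]
)
```

In this model, the always-active driver never refreshes its token, so every 
task started after {{X}} fails, whereas a driver that had to reconnect after 
{{X}} hands out a fresh token to subsequent tasks.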



