Alexey Serbin created KUDU-2679:
-----------------------------------

             Summary: In some scenarios, a Spark Kudu application can be devoid of fresh authn tokens
                 Key: KUDU-2679
                 URL: https://issues.apache.org/jira/browse/KUDU-2679
             Project: Kudu
          Issue Type: Bug
          Components: client, security, spark
    Affects Versions: 1.7.1, 1.8.0, 1.7.0, 1.6.0, 1.5.0, 1.4.0, 1.3.1, 1.3.0
            Reporter: Alexey Serbin

When running in {{cluster}} mode, tasks run as part of a Spark Kudu client application can fail to obtain new (i.e. non-expired) authentication tokens even if they run for a very short time. Essentially, if the driver runs longer than the authn token expiration interval and has a particular pattern of making RPC calls to Kudu masters and tablet servers, all tasks scheduled to run after the authn token expiration interval will be supplied with expired authn tokens, making every task fail. The only ways to recover are restarting the application or dropping long-established connections from the driver to Kudu masters/tservers.

Below are some details explaining why that can happen.

Let's assume the following holds true for a Spark Kudu application:
* The application is running against a secured Kudu cluster.
* The application is running in {{cluster}} mode.
* There are no primary authentication credentials on the machines for the user under which the Spark executors are running (i.e. {{kinit}} hasn't been run on those executor machines for the corresponding user, or the Kerberos credentials have already expired there).
* The {{--authn_token_validity_seconds}} masters' flag is set to {{X}} seconds (default is 60 * 60 * 24 * 7 seconds, i.e. 7 days).
* The {{--rpc_default_keepalive_time_ms}} flag for masters (and tablet servers, if they are involved in the communication between the driver process and the Kudu backend) is set to {{Y}} milliseconds (default is 65000 ms).
* The application is running for longer than {{X}} seconds.
* The driver process makes requests to Kudu masters at least every {{Y}} milliseconds.
* The driver either doesn't make requests to Kudu tablet servers or makes such requests at least every {{Y}} milliseconds to each of the involved tablet servers.
* The executors are running tasks that keep connections to tablet servers idle for longer than {{Y}} milliseconds, or the driver spawns tasks at an executor more than {{Y}} milliseconds after the last task completed on that executor.

Essentially, that's about a Spark Kudu application where the driver process keeps its once-opened connections active and the executors need to open new connections to Kudu tablet servers (and/or masters). Also, the executor machines don't have Kerberos credentials for the OS user under which the executor processes are run.

In such scenarios, the application's tasks spawned after {{X}} seconds from the application start will fail because of expired authentication tokens, while the driver process will never re-acquire its authn token, keeping the expired token in {{KuduContext}} forever.
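
To illustrate the timing argument, here is a toy model in Python. It is not Kudu client code: the function names are hypothetical, and the rule that the driver replaces its token only when it has to open a new connection while holding an expired token is a simplifying assumption drawn from the description above. Only the two defaults for {{X}} and {{Y}} come from the report.

```python
# Toy model of the scenario; NOT Kudu client code. All names are hypothetical.

# Defaults quoted in the report.
DEFAULT_TOKEN_VALIDITY_S = 60 * 60 * 24 * 7  # X: --authn_token_validity_seconds
DEFAULT_KEEPALIVE_MS = 65000                 # Y: --rpc_default_keepalive_time_ms


def driver_token_issued_at(rpc_times_s,
                           validity_s=DEFAULT_TOKEN_VALIDITY_S,
                           keepalive_ms=DEFAULT_KEEPALIVE_MS):
    """Return when (seconds since start) the driver's current token was issued.

    Assumption: the driver gets its first token at t=0 and replaces it only
    when it must open a new connection while holding an expired token.
    """
    issued_at = 0
    last_rpc = 0
    for now in rpc_times_s:
        # The server drops a connection that stayed idle longer than Y.
        connection_dropped = (now - last_rpc) * 1000 > keepalive_ms
        token_expired = now - issued_at >= validity_s
        if connection_dropped and token_expired:
            # The driver has Kerberos credentials, so it can re-authenticate
            # and receive a fresh token.
            issued_at = now
        last_rpc = now
    return issued_at


def task_gets_expired_token(task_start_s, issued_at,
                            validity_s=DEFAULT_TOKEN_VALIDITY_S):
    """True if a task started at task_start_s is handed an expired token."""
    return task_start_s - issued_at >= validity_s


if __name__ == "__main__":
    X = DEFAULT_TOKEN_VALIDITY_S
    # The driver talks to the masters every 60 s (less than Y) for eight days,
    # so its connections are never dropped and its token is never replaced.
    chatty = range(60, 8 * 24 * 3600, 60)
    issued = driver_token_issued_at(chatty)
    print(task_gets_expired_token(X - 1, issued))  # task spawned before X
    print(task_gets_expired_token(X + 1, issued))  # task spawned after X
```

In this model a driver that goes idle for longer than {{Y}} after its token has expired does pick up a fresh token on its next request, which matches the observation that dropping the driver's long-established connections unblocks the application.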