Jungtaek Lim created SPARK-33440:
------------------------------------

             Summary: Spark schedules on updating delegation token with 0 
interval under some token provider implementation
                 Key: SPARK-33440
                 URL: https://issues.apache.org/jira/browse/SPARK-33440
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1, 3.1.0
            Reporter: Jungtaek Lim


We received a report from a customer that, under specific circumstances, Spark 
schedules delegation token updates with an interval of 0, which floods the log 
with messages and sends a massive number of requests to the token handler.

Investigation showed that they have two delegation token identifiers: one of 
them (IDBS3ATokenIdentifier) reports an "issue date" of 0, whereas the other 
(DelegationTokenIdentifier) reports the correct value.

Both provide the expiration time correctly via Token.renew(). Spark assumes 
the issue date is "correct" and therefore calculates the renewal interval as 
(the result of Token.renew() - "issue date").
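
The subtraction above can be sketched as follows. This is a minimal 
illustration in Python, not Spark's code: the function name and the value of 
"now" are assumptions, chosen so that the results match the logged intervals.

```python
# Epoch milliseconds. NOW_MS is an assumed current time, not taken from the report.
NOW_MS = 1_602_570_859_000


def renewal_interval(renew_result_ms: int, issue_date_ms: int) -> int:
    """Interval as described above: result of Token.renew() minus the issue date."""
    return renew_result_ms - issue_date_ms


# Token with a correct issue date: the interval is roughly one day.
hdfs_interval = renewal_interval(NOW_MS + 86_400_048, NOW_MS)

# Token whose issue date is 0: the "interval" is really an absolute timestamp.
s3a_interval = renewal_interval(1_603_175_657_000, 0)

print(hdfs_interval, s3a_interval)
```

With an issue date of 0 the computed interval is simply the expiration 
timestamp itself, which is why it is so much larger than the other one.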

{code}
20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 1603175657000 for token S3ADelegationToken/IDBroker
20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400048 for token HDFS_DELEGATION_TOKEN
{code}

Taking the "minimal" interval is safe at least here. The problem is the next 
step: to calculate the next renewal timestamp, Spark adds that renewal interval 
to the issue date of every token and picks the minimum value. For the token 
whose issue date is 0 this gives 0 + 86400048, hence "86400048" is picked as 
the next renewal timestamp.
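
A small Python sketch of that calculation, under the assumption that it works 
exactly as described in this report (the function name and the value of "now" 
are hypothetical):

```python
# Epoch milliseconds. NOW_MS is an assumed current time, not taken from the report.
NOW_MS = 1_602_570_859_000


def next_renewal_timestamp(issue_dates_ms, intervals_ms):
    """Add the minimal interval to every issue date and take the earliest result."""
    min_interval = min(intervals_ms)
    return min(issue_date + min_interval for issue_date in issue_dates_ms)


# First entry: token with issue date 0. Second entry: token with a correct issue date.
issue_dates = [0, NOW_MS]
intervals = [1_603_175_657_000, 86_400_048]

next_renewal = next_renewal_timestamp(issue_dates, intervals)
print(next_renewal)
```

The token with issue date 0 wins the minimum, so the "next renewal" lands 
about one day after the epoch rather than about one day after now.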

This timestamp is "earlier" than now, so the interval to schedule becomes 
negative (as we subtract now from it). Spark applies a safeguard that picks the 
greater of 0 and the interval, so 0 is picked, and the token update ends up 
being scheduled indefinitely. (Each schedule is one-time, but the calculation 
always yields a negative value, so every run effectively schedules the next one 
immediately.)
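
To illustrate why the clamp alone does not help, here is a minimal Python 
sketch. The function name, the starting time, and the 5 ms step between 
attempts are assumptions for illustration only.

```python
def schedule_delay(next_renewal_ms: int, now_ms: int) -> int:
    """Clamp the delay so that it never goes negative, as described above."""
    return max(0, next_renewal_ms - now_ms)


now_ms = 1_602_570_859_000  # assumed current time in epoch milliseconds
delays = []
for _ in range(3):
    # The next renewal timestamp stays at 86400048 on every recalculation,
    # because the token's issue date is still 0.
    delays.append(schedule_delay(86_400_048, now_ms))
    now_ms += 5  # assume each token update takes a few milliseconds

print(delays)
```

Every attempt yields a delay of 0, so each one-time schedule fires right away 
and immediately produces the next one.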

We should design a better safeguard, rather than only guarding against a 
negative schedule interval.
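
One possible direction, sketched in Python purely as an assumption and not as 
the fix adopted in Spark: treat an issue date that is unset (0 or negative) or 
in the future as untrustworthy and substitute the current time before doing the 
arithmetic. All names below are hypothetical.

```python
def effective_issue_date(issue_date_ms: int, now_ms: int) -> int:
    """Hypothetical rule: fall back to 'now' when the issue date looks invalid."""
    if issue_date_ms <= 0 or issue_date_ms > now_ms:
        return now_ms
    return issue_date_ms


def next_delay(tokens, now_ms: int) -> int:
    """tokens is a list of (issue_date_ms, renew_result_ms) pairs."""
    fixed = [(effective_issue_date(issue, now_ms), renew) for issue, renew in tokens]
    min_interval = min(renew - issue for issue, renew in fixed)
    next_renewal = min(issue + min_interval for issue, _ in fixed)
    return max(0, next_renewal - now_ms)


now_ms = 1_602_570_859_000  # assumed current time in epoch milliseconds
tokens = [
    (0, 1_603_175_657_000),         # issue date not set
    (now_ms, now_ms + 86_400_048),  # issue date set correctly
]
delay = next_delay(tokens, now_ms)
print(delay)
```

Under these assumptions the delay comes out at about one day instead of 0. 
Whether substituting the current time is the right policy, and whether a 
warning should be logged when it happens, is a design decision left open here.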



