Yangze Guo created FLINK-22505:
----------------------------------

             Summary: Limit the precision of Resource
                 Key: FLINK-22505
                 URL: https://issues.apache.org/jira/browse/FLINK-22505
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.13.0
            Reporter: Yangze Guo


In our internal deployment, we found that a high-precision {{CPUResource}} may 
cause the required resource to never be fulfilled. Consider the following 
scenario:
- The {{SlotManager}} receives a slot request with 1.000000000000001 CPU and 
decides to allocate a pending task manager with that resource spec.
- The resource manager starts a task manager and sets the CPU through dynamic 
config. In this step, we cast the {{CPUResource}} to a double value, which is 
where the precision loss happens.
The task manager eventually registers with 1.0 CPU. Because that value differs 
from the requested spec, it cannot be deducted from the pending task managers 
and cannot fulfill the slot request.
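
The mismatch described above can be reproduced outside Flink with plain 
{{BigDecimal}} arithmetic. The following is only a minimal sketch: the class 
and method names are hypothetical and are not Flink APIs, and it assumes the 
CPU value travels through a Java double and is converted back exactly. It uses 
a value with 20 fractional digits so that the nearest double is exactly 1.0.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of Flink.
class CpuPrecisionLossDemo {

    // Assumption: the CPU value is handed to the task manager as a double
    // and turned back into a decimal on registration.
    static BigDecimal registeredCpu(BigDecimal requested) {
        double passedViaConfig = requested.doubleValue(); // precision loss happens here
        return new BigDecimal(passedViaConfig);           // exact value of that double
    }

    // A pending task manager is only matched if the registered spec
    // is numerically equal to the requested spec.
    static boolean matchesPending(BigDecimal requested) {
        return registeredCpu(requested).compareTo(requested) == 0;
    }

    public static void main(String[] args) {
        BigDecimal requested = new BigDecimal("1.00000000000000000001");
        System.out.println("requested:  " + requested);
        System.out.println("registered: " + registeredCpu(requested));
        System.out.println("matches:    " + matchesPending(requested));
    }
}
```

With such a request, the registered value no longer equals the requested one, 
so an equality-based match against the pending task manager fails.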

To solve this issue, we propose limiting the precision of Resource to a safe 
value, e.g. 8, to prevent precision loss when casting to double.
- For {{CPUResource}}, the supported scale for the CPU is 3 in k8s, while in 
Yarn the CPU must be an integer.
- For {{ExternalResource}}, the value will always be treated as an integer.
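
The proposed limit could look roughly like the sketch below. This is an 
illustration under stated assumptions, not the actual change: the class name, 
the {{MAX_SCALE}} constant, and the HALF_UP rounding mode are all hypothetical 
choices, and the round-trip check assumes the receiving side applies the same 
limit after reading the double.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Flink.
class ResourcePrecisionLimit {

    // Assumed safe scale, taken from the "e.g. 8" suggestion above.
    static final int MAX_SCALE = 8;

    // Round the value to at most MAX_SCALE fractional digits.
    // The rounding mode is an assumption; the proposal does not name one.
    static BigDecimal limit(BigDecimal value) {
        return value.setScale(MAX_SCALE, RoundingMode.HALF_UP);
    }

    // Check that a limited value is recovered after passing through a double,
    // assuming the receiver applies the same limit.
    static boolean survivesDoubleRoundTrip(BigDecimal value) {
        BigDecimal limited = limit(value);
        BigDecimal restored = limit(new BigDecimal(limited.doubleValue()));
        return restored.compareTo(limited) == 0;
    }

    public static void main(String[] args) {
        BigDecimal requested = new BigDecimal("1.000000000000001");
        System.out.println("limited:   " + limit(requested));
        System.out.println("roundtrip: " + survivesDoubleRoundTrip(requested));
    }
}
```

A scale of 8 stays well below the roughly 15 significant decimal digits a 
double can carry for values of this magnitude, and it still covers the scale 
of 3 that k8s supports.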



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
