Rui Fan created FLINK-34336: ------------------------------- Summary: AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang sometimes Key: FLINK-34336 URL: https://issues.apache.org/jira/browse/FLINK-34336 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.19.0 Reporter: Rui Fan Assignee: Rui Fan Fix For: 1.19.0
AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may hang in waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};{color} h2. Reason: The job has 2 tasks(vertices), after calling updateJobResourceRequirements. The source parallelism isn't changed (It's parallelism) , and the FlatMapper+Sink is changed from parallelism to parallelism2. So we expect the task number should be parallelism + parallelism2 instead of parallelism2. h2. Why it can be passed for now? Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by default. It means, flink job will rescale job 30 seconds after updateJobResourceRequirements is called. So the running tasks are old parallelism when we call waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};. {color} IIUC, it cannot be guaranteed, and it's unexpected. -- This message was sent by Atlassian Jira (v8.20.10#820010)