Hello all, I have a naive question about how Spark uses the executors in a cluster of machines. Imagine a scenario in which I do not know the input size of my data for execution A, so I configure Spark to use 20 of my 25 nodes, for instance. At the same time, I also launch a second execution B, configured to use 10 nodes.
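To make the setup concrete, this is roughly how I create the two applications (a minimal sketch; the app names and the fixed executor counts are just illustrative, and I am assuming a fixed spark.executor.instances on something like YARN, with A and B submitted as two separate applications):

    import org.apache.spark.sql.SparkSession

    // Execution A: reserve 20 executors up front (one per node in my setup)
    val sparkA = SparkSession.builder()
      .appName("execution-A")
      .config("spark.executor.instances", "20")
      .getOrCreate()

    // Execution B: reserve 10 executors, submitted as a separate application
    val sparkB = SparkSession.builder()
      .appName("execution-B")
      .config("spark.executor.instances", "10")
      .getOrCreate()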
Assume a huge input size for execution A, implying an execution time of 30 minutes, for example, using all of its resources, and a constant execution time of 10 minutes for B. Then both executions together last 40 minutes, since I assume B cannot be launched until 10 resources are completely available, i.e. when A finishes.

Now assume a very small input size for execution A, so that it runs for only 5 minutes and effectively needs only 2 of its 20 planned resources. In that case I would like execution B to be launched right away, so that both executions finish within 10 minutes while consuming only 12 resources. However, since execution A has asked Spark for 20 resources, execution B has to wait until A has finished, and the total execution time becomes 15 minutes. Is this right? If so, how can I handle this kind of scenario? If I am wrong, what would be the correct interpretation?

Thanks in advance,
Best
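The closest thing I have found so far is dynamic allocation; here is a minimal sketch of what I am thinking of (the min/max values are just placeholders, and I am assuming the external shuffle service is available on the cluster):

    // Instead of a fixed spark.executor.instances, let Spark grow and
    // shrink the executor count with the actual workload.
    val spark = SparkSession.builder()
      .appName("execution-A-dynamic")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "20")
      .config("spark.shuffle.service.enabled", "true") // needed so idle executors can be released safely
      .getOrCreate()

Would something like this make A give back the 18 executors it does not need, so that B can start immediately?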