Hi, I'm noticing that a 30-minute job that is initially IO-bound may not be on subsequent runs. Is there some kind of between-job caching, in Spark or in Linux, that outlives a job and might be making subsequent runs faster? If so, is there a way to avoid that caching so I can get a better sense of the worst-case timing?
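If the culprit is the Linux page cache (rather than anything Spark-side), would clearing it between runs be the right way to force a cold start? A sketch of what I have in mind, assuming root access on every worker node (writing 3 drops the page cache plus dentries and inodes):

```shell
#!/bin/sh
# Flush dirty pages to disk first so drop_caches can actually evict them.
sync
# drop_caches requires root, so only attempt the write when running as root.
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
    status="caches dropped"
else
    status="need root; skipping drop_caches"
fi
echo "$status"
```

This would need to run on each worker between jobs; I assume it wouldn't touch anything Spark itself caches (e.g. persisted RDDs), only what the OS is holding.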
(It's also possible that I've simply changed something that made things faster.) Eric