Hi, I'm noticing that a 30-minute job that is initially IO-bound may not be on subsequent runs. Is there some kind of between-job caching, in Spark or in Linux, that outlives a job and might be making subsequent runs faster? If so, is there a way to avoid that caching so I can get a better sense of the worst-case timing?
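If the culprit is the Linux page cache (rather than anything Spark-side), would clearing it between runs be the right way to force a cold start? A sketch of what I have in mind, assuming root access on every worker node (writing 3 drops the page cache plus dentries and inodes):

```shell
#!/bin/sh
# Flush dirty pages to disk first so drop_caches can actually evict them.
sync
# drop_caches requires root, so only attempt the write when running as root.
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
    status="caches dropped"
else
    status="need root; skipping drop_caches"
fi
echo "$status"
```

This would need to run on each worker between jobs; I assume it wouldn't touch anything Spark itself caches (e.g. persisted RDDs), only what the OS is holding.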
(It's also possible that I've simply changed something that made things faster.) Eric